Job Summary
Own the red-teaming and adversarial evaluation pipeline for Reflection’s models, continuously probing for failure modes across security, misuse, and alignment gaps.
Validate that every release meets the lab’s risk thresholds before it ships, serving as a critical gatekeeper for our open-weight releases.
We want you to do the most impactful work of your career with the confidence that you and the people you care about most are supported.
Skills & Requirements
Must-have
Red-teaming and adversarial evaluation
Translate safety findings into guardrails
Validate release risk thresholds
Develop scalable automated safety benchmarks
Research state-of-the-art jailbreaking techniques
Nice-to-have
Motivation to advance the frontier of intelligence
Comfort in a high-agency startup environment
Bias toward action
Key Requirements
Graduate degree in Computer Science, Machine Learning, or a related discipline, or equivalent practical experience in AI safety
Deep technical understanding of LLM safety
Strong software engineering capabilities
Experience building automated evaluation pipelines or large-scale ML systems