Research Scientist, Safety Post Training

Scale Labs

San Francisco, CA, US
On-site

Job Summary

  • The role involves developing post-training methods and interpretability techniques to make frontier AI systems safer and better understood.
  • Candidates will collaborate with policymakers, engineers, and researchers to translate findings into actionable safety standards and benchmarks.
  • The compensation package includes a base salary range of $216,000 to $270,000 USD along with equity and comprehensive benefits.

Salary

Base: $216,000 - $270,000 USD; Equity: Included subject to Board approval; Benefits: Comprehensive health, dental, vision, retirement, stipend, PTO

Skills & Requirements

Must-have

  • Experience with post-training techniques such as RLHF, DPO, and GRPO
  • Track record of published research in generative AI
  • At least three years of experience addressing sophisticated ML problems

Nice-to-have

  • Experience with mechanistic interpretability and probing techniques
  • Familiarity with red-teaming or adversarial evaluation methods
  • Experience studying failure modes like reward hacking or alignment faking

Key Requirements

  • At least three years of experience with sophisticated ML problems
  • Published research track record in machine learning
  • Strong written and verbal communication skills

Work Rights

Not specified
