Research Scientist, Safety Post Training

Scale Labs

San Francisco, CA, US
On-site

Job Summary

  • The role focuses on developing post-training methods and interpretability techniques to make frontier AI systems safer.
  • Candidates will collaborate with policymakers and engineers to translate findings into actionable safety standards and benchmarks.
  • The compensation package includes a base salary ranging from $216,000 to $270,000 USD along with equity and comprehensive benefits.

Salary

Base: $216,000 - $270,000 USD; Equity: Included subject to Board approval; Benefits: Health, dental, vision, retirement, PTO

Skills & Requirements

Must-have

  • Experience with post-training techniques
  • Knowledge of RLHF, DPO, and GRPO methods
  • Published research in generative AI
  • Three years of experience working on ML problems

Nice-to-have

  • Mechanistic interpretability experience
  • Red-teaming or adversarial evaluation skills
  • Understanding of reward hacking failure modes

Key Requirements

  • At least three years of experience with sophisticated ML problems
  • Track record of published machine learning research
  • Strong written and verbal communication skills

Work Rights

Not specified
