Research Scientist, Safety Post Training

Scale Labs

San Francisco, CA, US
Base: $216,000 - $270,000 USD; Equity: included ba...
On-site

Job Summary

  • The role focuses on developing post-training pipelines to study how training choices affect model safety and alignment properties.
  • Candidates will collaborate with policymakers and engineers to translate findings into actionable safety standards and evaluation benchmarks.
  • The compensation package includes a base salary range of $216,000 to $270,000 USD along with equity and comprehensive benefits.

Salary

Base: $216,000 - $270,000 USD; Equity: Included based on Board approval; Benefits: Comprehensive health, dental, vision, retirement, stipend, PTO

Skills & Requirements

Must-have

  • Experience with RLHF, DPO, and GRPO techniques
  • Track record of published ML research
  • Three years of experience working on sophisticated ML problems

Nice-to-have

  • Experience with mechanistic interpretability
  • Familiarity with red-teaming and adversarial evaluation
  • Understanding of reward hacking and sycophancy

Key Requirements

  • At least three years of ML experience
  • Published research in machine learning or generative AI
  • Strong written and verbal communication skills

Work Rights

Not specified
