Research Engineer, RL Infrastructure and Reliability (Knowledge Work)

Anthropic

San Francisco, CA, United States
On-site

Job Summary

  • Anthropic's mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for society.
  • The role focuses on shifting reliability work from reactive to proactive by hardening systems and stress-testing at realistic scale.
  • Researchers can stay focused on their core work while this engineer ensures training and evaluation runs remain stable and well-instrumented.

Salary

Base: $350,000 - $850,000 USD annually; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • Highly experienced Python engineer
  • Operating ML or distributed systems at scale
  • SRE mindset with SLOs and load tests
  • Foundational ML knowledge for evaluation integrity
  • Ability to read research code

Nice-to-have

  • Experience building RL environments or agent harnesses
  • Familiarity with reward modeling and detection of reward hacking
  • Background in chaos engineering and fault injection
  • Experience with data quality pipelines and drift detection
  • Prior experience as reliability owner embedded in research

Key Requirements

  • 5+ years operating ML or distributed systems at scale (Preferred)
  • Bachelor's degree or equivalent experience
  • Years of experience commensurate with internal job level

Work Rights

Not specified

Sponsorship: available
