Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI

Scale AI

San Francisco, CA, USA
$189,600 - $237,000 USD per year
On-site
LLM training in a production environment
Post-training methods (RLHF, RLVR)
GPU cluster architecture and operation
Scale's mission is to accelerate the development of AI applications by being the leading AI data foundry.

Job Summary

  • As an ML Systems Research Engineer, you’ll build the algorithms for our next-generation Agent RL training platform, support large-scale training, and research and integrate state-of-the-art technologies to optimize our ML systems.
  • Compensation packages at Scale for eligible roles include base salary, equity, and benefits, with a base salary range of $189,600 - $237,000 USD for this position in San Francisco and New York.

Salary

$189,600 - $237,000 USD

Skills & Requirements

Must-have

  • LLM training in a production environment
  • Post-training methods (RLHF, RLVR)
  • GPU cluster architecture and operation
  • Multi-node LLM training and inference
  • CUDA, PyTorch, Transformers, FlashAttention

Nice-to-have

  • Passion for system optimization
  • Strong software engineering skills
  • Cross-functional team communication

Key Requirements

  • 1-3 years of LLM training experience
  • Experience with PPO/GRPO algorithms
  • PhD or Master's in Computer Science

Work Rights

Not specified
