Tech Lead Manager - MLRE, ML Systems

Scale AI

San Francisco, CA, USA
$264,800 - $331,000 USD; equity + benefits include...
On-site
Multi-node LLM training and inference
Large-scale distributed ML systems
Post-training methods like RLHF/RLVR
Scale's LLM post-training platform team builds our internal distributed framework for large language model training

Job Summary

  • Scale's LLM post-training platform team builds our internal distributed framework for large language model training.
  • You will work closely with Scale’s ML teams and researchers to build the foundational platform that supports all of our ML research and development work.
  • Compensation packages at Scale for eligible roles include base salary, equity, and benefits.

Salary

$264,800 - $331,000 USD, plus equity and benefits: comprehensive health, dental, and vision coverage; retirement benefits; a learning and development stipend; generous PTO; and a commuter stipend.

Skills & Requirements

Must-have

  • multi-node LLM training and inference
  • large-scale distributed ML systems
  • post-training methods like RLHF/RLVR
  • CUDA, PyTorch, transformers, FlashAttention
  • system optimization

Nice-to-have

  • instruction tuning, RLHF, tool use, reasoning, agents, multimodal
  • cross-functional team environment

Key Requirements

  • Experience with multi-node LLM training and inference
  • Experience with developing large-scale distributed ML systems
  • Strong software engineering skills
  • Strong written and verbal communication skills

Work Rights

Not specified
