ML Research Engineer, ML Systems

Scale

San Francisco, CA, US
On-site

Job Summary

  • The role involves building and optimizing the internal distributed framework for large language model training and inference.
  • Candidates will collaborate with ML teams to accelerate research and enable the development of next-generation models.
  • Compensation includes base salary, equity, and comprehensive benefits such as health coverage and a learning stipend.

Salary

Base: $189,600 - $237,000 USD; Equity: Included based on Board approval; Benefits: Comprehensive health, dental, vision, retirement, PTO, and commuter stipend

Skills & Requirements

Must-have

  • Multi-node LLM training experience
  • Large-scale distributed ML systems
  • Proficiency in CUDA and PyTorch

Nice-to-have

  • Experience with RLHF and instruction tuning
  • Knowledge of multimodal models and agents
  • Strong cross-functional communication skills

Key Requirements

  • Strong software engineering skills
  • Experience with Flash Attention and Transformers
  • Ability to operate in a cross-functional team environment

Work Rights

Not specified