Evals Engineer, Applied AI

Scale AI

San Francisco, CA, USA
On-site
Large Language Models
GenAI Evaluation Suite
LLM-as-a-Judge autorater

Job Summary

  • This high-impact role is critical to our mission of delivering the industry's leading GenAI Evaluation Suite.
  • Partner with Scale’s Operations team and enterprise customers to translate ambiguity into structured evaluation data, guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems.
  • Compensation packages at Scale for eligible roles include base salary, equity, and benefits.

Salary

  • Base: $179,400–$224,250 USD
  • Equity: subject to Board of Directors approval
  • Benefits: comprehensive health, dental, and vision coverage; retirement benefits; learning and development stipend; generous PTO; commuter stipend

Skills & Requirements

Must-have

  • Large Language Models
  • GenAI Evaluation Suite
  • LLM-as-a-Judge autorater
  • human-rated datasets
  • expert rubrics
  • Python and major ML frameworks

Nice-to-have

  • latest literature in AI evaluation
  • integrating novel research ideas
  • dynamic, fast-paced research environment
  • ML research engineering
  • stochastic systems
  • observability

Key Requirements

  • 2+ years of experience in Machine Learning or Applied Research
  • Bachelor’s degree in Computer Science, Electrical Engineering, or related field
  • Hands-on experience with Large Language Models (LLMs) and Generative AI
  • Strong understanding of frontier model evaluation methodologies

Work Rights

Not specified
