On-site
Job Summary
You will design and build evaluation suites for reasoning, code, agents, and vision-language models while creating post-training datasets at scale.
The role offers the opportunity to prototype RLHF/RLAIF training loops and land research directly into customer-facing product features.
Labelbox operates like an early-stage startup: you will take on expanded responsibilities quickly, with clear ownership and career growth tied to your impact.
Salary
Base: $35-$45 USD annual; Equity: Not specified; Benefits: Not specified
Skills & Requirements
Must-have
Ph.D. or Master's in Computer Science
Deep understanding of frontier multimodal models
Proficiency in Python and PyTorch/JAX/TensorFlow
Experience with LLM evaluation and benchmarking
Track record of publishing in top-tier AI conferences
Nice-to-have
Passion for bridging research and application
Exceptional communication and collaboration skills
Ability to work in a high-impact startup environment
Experience with human-AI interaction techniques
Key Requirements
Ph.D. or Master's degree (a degree in progress is acceptable)
Publication record in NeurIPS, ICML, ICLR, ACL, EMNLP, or NAACL
Expertise in constructing and refining high-quality training data