Lead large-scale pretraining experiments for multimodal (image, video, audio) foundation models, shaping training objectives, architectures, data strategies, and systems
Job Summary
Lead large-scale pretraining experiments for multimodal (image, video, audio) foundation models, shaping training objectives, architectures, data strategies, and systems.
Develop and evaluate novel ideas across architecture, optimizers, and training algorithms, contributing across the full stack from low-level optimizations to high-level model design.
This is a Staff/Senior individual contributor (IC) role for someone who has led pretraining at the frontier and wants to do it again, with a direct path from research to products used by millions.
Skills & Requirements
Must-have
Large-scale pretraining experiments
Multimodal foundation models
Distributed training (FSDP, tensor parallelism, pipeline parallelism)
Deep Python and PyTorch proficiency
Familiarity with visual generative models
Nice-to-have
Commitment to research excellence and open science
Interest in expanding human creativity
Low-level systems optimizations
Key Requirements
Led pretraining for a foundation model shipped to production
Experience with multi-node training runs on 500+ GPUs
Publications at top venues or demonstrated production impact
Comfortable reading and modifying low-level training code