Multimodal Generative AI Researcher

Stability AI

Remote

Job Summary

  • Design and fine-tune large-scale VLMs/LLMs for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.
  • Build robust, efficient training and evaluation pipelines, including data curation, distributed training, and scalable fine-tuning.
  • Collaborate across research, engineering, and 3D/graphics teams to bring models from prototype to production.

Skills & Requirements

Must-have

  • training and fine-tuning large VLMs/LLMs
  • multimodal alignment and representation learning
  • vision, language, and 3D reasoning
  • experience with PyTorch, DeepSpeed, and Ray
  • robust, efficient training pipelines

Nice-to-have

  • integrating 3D and graphics pipelines
  • vision-language-action models
  • efficient adaptation methods
  • video and 4D generation trends

Key Requirements

  • PhD or equivalent experience
  • proven track record fine-tuning large VLMs/LLMs
  • strong engineering mindset
  • familiarity with recent trends
  • awareness of 3D-aware multimodal models

Work Rights

Not specified
