Synthetic Data Generation and User Simulation PhD Research Intern — Fall 2026

NVIDIA Corporation

Job Summary

  • This role focuses on investigating how generative models can create high-utility instructional and assessment data for next-generation AI models.
  • The team explores population-grounded user simulation calibrated against real behavioral signatures to yield training signals for SFT and RL environments.
  • Candidates will validate that their synthetic data measurably improves downstream model performance across accuracy, robustness, and safety metrics.

Salary

Hourly rate: $30 - $94 USD (location dependent); Benefits: eligible for intern benefits

Skills & Requirements

Must-have

  • PhD in Computer Science or Machine Learning
  • Deep learning framework experience (PyTorch)
  • Generative modeling and synthetic data generation
  • LLM post-training techniques (SFT/RLHF/DPO)
  • Python programming proficiency

Nice-to-have

  • End-to-end LLM training and evaluation experience
  • User simulation and behavioral modeling background
  • Multilingual and low-resource AI research
  • Open-source contributions to synthetic data generation (SDG) or LLM tools
  • Publications at top-tier AI conferences

Key Requirements

  • Pursuing a PhD in CS, ML, Computational Linguistics, or equivalent
  • Strong research background with publications at top-tier conferences
  • Hands-on experience with Hugging Face, vLLM, and distributed training stacks

Work Rights

Not specified
