Agent RL Infra Engineer

NVIDIA

US, CA, United States

Job Summary

  • This role offers a rare chance to shape how autonomous, self-improving agents learn and evolve across the enterprise.
  • The position involves creating enterprise-ready reinforcement learning capabilities and partnering with agent teams to implement them in practice.
  • NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer.

Matching Summary

This role offers a rare chance to shape how autonomous, self-improving agents learn and evolve across the enterprise.

Salary

Base: 224,000 USD - 356,500 USD; Bonus/Equity: Eligible for equity; Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • Operationalizing reinforcement learning techniques
  • Familiarity with distributed training frameworks
  • MLOps pipeline automation
  • GPU cluster management
  • Python, Go, or Rust proficiency
  • Enterprise-ready RL capability development

Nice-to-have

  • Building RL environments and training recipes
  • Experience with NVIDIA infrastructure
  • Data curation and active learning strategies
  • Continuous learning loops and data flywheel architectures
  • Collaboration with platform and security teams

Key Requirements

  • MS in CS, ML, or a related field, or equivalent experience
  • 10+ years of experience
  • Experience with fine-tuning methods including LoRA and SFT
  • Experience with RL techniques such as DPO, GRPO, PPO, and RLAIF
  • Familiarity with distributed training frameworks such as Megatron, NeMo, and DeepSpeed
  • MLOps skills, including job orchestration and GPU cluster management

Work Rights

Not specified
