Distinguished Engineer, GPU Fleet Operations Automation

NVIDIA Corporation

Work arrangement: Not specified (assumed hybrid based on industry norms).
DGX Cloud strategy
GPU fleet lifecycle management
Auto-remediation strategies
NVIDIA is seeking a Distinguished Engineer for GPU Fleet Operations Automation, focusing on the development of DGX Cloud strategy for GPU lifecycle management. The role emphasizes technical strategy, collaboration across teams, and the delivery of high-impact solutions in cloud infrastructure and automation.

Job Summary

  • You will lead the development of DGX Cloud strategy for GPU fleet lifecycle, health, observability and utilization monitoring, and remediation.
  • You will define and drive the technical strategy across multiple environments (bare metal, cloud service provider, and neoclouds).
  • You will work with NVIDIA leadership cross-organizationally and cross-functionally to deliver accelerated computing infrastructure that enables customers with the highest availability and operational standards.

Matching Summary

Match Score: 85

Salary

Base: 320,000 USD - 488,750 USD; Bonus/Equity: Eligible for equity; Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • DGX Cloud strategy
  • GPU fleet lifecycle management
  • auto-remediation strategies
  • multi-tenant data center architectures
  • cloud-native architectures
  • AI/ML platforms and applications

Nice-to-have

  • AI for issue identification
  • highly available, scaled-out systems
  • scalable processes and extensible systems
  • open source ecosystem collaboration

Key Requirements

  • 15-18+ years overall in technical roles
  • 5-10+ years of lead experience
  • BS/MS or higher or equivalent experience
  • Technical proficiency in multi-tenant data center and cloud-native architectures (bare metal, virtualization, containerization, IaaS, Kubernetes, Slurm)
  • AI/ML platforms and applications

Work Rights

Not specified
