Not specified (assumed hybrid based on industry norms).
DGX Cloud strategy
GPU fleet lifecycle management
Auto-remediation strategies
NVIDIA is seeking a Distinguished Engineer for GPU Fleet Operations Automation, focusing on the development of DGX Cloud strategy for GPU lifecycle management. The role emphasizes technical strategy, collaboration across teams, and the delivery of high-impact solutions in cloud infrastructure and automation.
Job Summary
You will lead the development of DGX Cloud strategy for GPU fleet lifecycle, health, observability and utilization monitoring, and remediation.
You will define and drive the technical strategy across multiple environments (bare metal, cloud service provider, and neoclouds).
You will work with NVIDIA leadership cross-organizationally and cross-functionally to deliver accelerated computing infrastructure that enables customers with the highest availability and operational standards.
Matching Summary
Match Score: 85
Salary
Base: 320,000 USD - 488,750 USD; Bonus/Equity: Eligible for equity; Benefits: Eligible for benefits
Skills & Requirements
Must-have
DGX Cloud strategy
GPU fleet lifecycle management
auto-remediation strategies
multi-tenant data center architectures
cloud-native architectures
AI/ML platforms and applications
Nice-to-have
AI for issue identification
highly available scaled out systems
scalable processes and extensible systems
open source ecosystem collaboration
Key Requirements
15-18+ years overall in technical roles
5-10+ years of lead experience
BS/MS or higher, or equivalent experience
Technical proficiency in multi-tenant data center and cloud-native architectures: bare metal, virtualization, containerization, IaaS, Kubernetes, Slurm