Cluster Deployment Operations Engineer - NVIS

NVIDIA

Job Summary

  • Playing an integral role in NVIDIA’s New Product Introduction (NPI) team, acting as the link between engineering and the NVIS field team for cluster deployment and management solutions.
  • Collaborating closely with engineering and product teams to review and influence design decisions for products centered around large-scale AI Factory deployments.
  • Supporting NVIDIA’s mission by ensuring that NVIDIA and our OEM partners successfully deploy our breakthrough technologies for customers around the world.

Matching Summary

Playing an integral role in NVIDIA’s New Product Introduction (NPI) team, acting as the link between engineering and the NVIS field team for cluster deployment and management solutions.

Salary

  • Base: 224,000 USD - 356,500 USD (Level 5); 272,000 USD - 431,250 USD (Level 6)
  • Equity: Eligible
  • Benefits: Comprehensive package

Skills & Requirements

Must-have

  • HPC/large-scale cluster administration
  • Linux systems engineering
  • Infrastructure automation
  • Provisioning bare-metal clusters
  • Slurm and Kubernetes deployment
  • Python and Bash scripting
  • Cluster telemetry and dashboard tools

Nice-to-have

  • Customer-first attitude
  • Proactive approach to leadership
  • Experience with LLMs
  • Professional Services background
  • Customer-facing deployment experience

Key Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
  • 10+ years of experience in HPC/large-scale cluster administration, Linux systems engineering, infrastructure automation, or data center operations
  • 5+ years of hands-on experience provisioning, managing, and optimizing bare-metal clusters using NVIDIA Base Command Manager (BCM) or similar technology
  • Expert knowledge of Slurm and Kubernetes deployment, management, and usage
  • Proficiency in Python and Bash scripting
  • Hands-on experience with cluster telemetry and dashboard tools
  • Outstanding written and verbal communication skills
  • Customer-first attitude, self-motivation, and proactive leadership

Work Rights

Not specified

Tailored Resume

Cover Letter