Senior Datacenter Technical Program Manager, At-Scale AI Clusters

NVIDIA

Job Summary

  • This TPM will play a crucial role throughout the lifecycle of the latest AI systems at scale, from datacenter design to production support.
  • The role involves leading the integration of new AI clusters with datacenter facilities that have demanding requirements on power, cooling, and instrumentation.
  • Candidates must collaborate with engineering leaders across multiple hardware and software teams to build AI supercomputers and develop reference architectures.

Salary

Base: 168,000 USD - 258,750 USD; Bonus/Equity: Eligible for equity; Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • 8+ years of overall experience
  • High-performance computing systems
  • GPU clusters deployed in on-premises datacenters
  • BS in Applied Science or Engineering

Nice-to-have

  • Understanding of datacenter design and power and cooling technologies
  • Expertise in system monitoring using Prometheus, Grafana, and Splunk
  • Experience with the engineering or academic research community

Key Requirements

  • BS in Applied Science or Engineering
  • 8+ years of overall experience
  • Experience with high-performance computing systems
  • Experience with GPU clusters in on-premises datacenters

Work Rights

Not specified
