Senior Platform and EngOps Engineer - Cluster Operations

NVIDIA

NVIDIA is seeking a Senior Platform and EngOps Engineer for its Cluster Operations team, focused on enhancing the efficiency of large GPU clusters through automation and modern DevOps practices. The ideal candidate will have extensive experience in managing cluster infrastructure and a strong foundation in automation tools.

Job Summary

  • NVIDIA is at the forefront of AI and High-Performance Computing.
  • The role involves managing large GPU clusters and ensuring optimal performance.
  • Candidates will collaborate with dynamic teams across multiple time zones.

Matching Summary

Match Score: 85

Salary

Base: 176,000 USD - 276,000 USD for Level 4; 208,000 USD - 333,500 USD for Level 5; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • Automated tools for GPU clusters
  • Experience with Ansible and Python
  • Troubleshooting cluster failures

Nice-to-have

  • Familiarity with resource scheduling managers
  • Experience with GPU-focused hardware
  • Proficiency in metrics collection infrastructure

Key Requirements

  • BS or MS in Computer Science or related field
  • 8+ years of experience in cluster administration
  • Proficient with Linux fundamentals

Work Rights

Not specified
