Senior Systems Software Engineer, Data Center Infrastructure Management - EngOps

NVIDIA

NVIDIA is seeking a Senior Systems Software Engineer for its Data Center Infrastructure Management team, requiring over five years of relevant experience in managing high-performance computing environments. The role involves troubleshooting cluster issues, managing updates, and collaborating with software development teams to ensure optimal performance.

Job Summary

  • NVIDIA is seeking a highly motivated EngOps Engineer to maintain high-performance, rack-scale management solutions for datacenter environments.
  • The role involves taking ownership of daily cluster failures, troubleshooting them promptly, and managing the rollout of software and firmware updates.
  • Candidates will work directly with Infrastructure Service software development teams to support deployment and debugging of hardware and the Infrastructure Manager.

Matching Summary

Match Score: 75


Salary

Base: $152,000 - $241,500 (Level 3) or $184,000 - $287,500 (Level 4); Bonus/Equity: Eligible for equity; Benefits: Comprehensive benefits package included

Skills & Requirements

Must-have

  • 5+ years deploying and administering clusters
  • Experience with Kubernetes deployment
  • Knowledge of BMC management protocols (Redfish, IPMI)
  • Understanding of server rack network topologies
  • Scripting for automated recovery actions

Nice-to-have

  • Direct experience with DGX systems and Compute Clusters
  • Proficiency in OpenStack and Foreman
  • Hands-on experience with Grafana observability tools
  • Background in GPU-focused hardware and software

Key Requirements

  • BS or MS in Computer Science or related field
  • 5+ years hands-on experience in cluster administration
  • Datacenter or computer architecture experience required

Work Rights

Not specified
