Senior Systems Software Engineer, Data Center Infrastructure Management - Engops

NVIDIA Corporation

Multiple Locations
NVIDIA is seeking a Senior Systems Software Engineer for its Data Center Infrastructure Management team. The role requires over five years of experience managing high-performance clusters and involves troubleshooting, deploying, and updating hardware and software solutions.

Job Summary

  • Join our team of innovative engineers who develop and maintain software facilitating GPU communication, driving groundbreaking solutions in High Performance Computing and Deep Learning.
  • In this role, you will be responsible for maintaining high-performance, rack-scale management solutions for datacenter environments.
  • You will work directly with our Infrastructure Service software development team to support the deployment and debugging of our hardware and Infrastructure Manager.

Salary

  • Base (Level 3): 152,000 USD - 241,500 USD
  • Base (Level 4): 184,000 USD - 287,500 USD
  • Bonus/Equity: Equity eligible
  • Benefits: Benefits eligible

Skills & Requirements

Must-have

  • Maintain rack-scale management solutions
  • Troubleshoot cluster failures
  • Manage software and firmware updates
  • Deploy services in Kubernetes
  • Understand server, rack, and network topologies
  • Understand hardware/firmware/software interactions
  • Automate recovery actions

Nice-to-have

  • Industry standard alerting tools
  • Emergency response practices
  • Observability tools such as Grafana
  • GPU-focused hardware and software
  • OpenStack and Foreman

Key Requirements

  • 5+ years of experience
  • BS or MS in Computer Science or related field
  • Deploying and administering clusters, servers, and switches
  • Deploying services in Kubernetes
  • Datacenter or computer architecture experience
  • Hardware management protocols and interfaces (Redfish, IPMI, BMC)
  • Firmware update automation
  • Configuring and debugging complex data center networks

Work Rights

Not specified
