Manager, Site Reliability Engineering

NVIDIA

Hybrid

Job Summary

  • You will lead a team of site reliability engineers responsible for automating maintenance of 10,000+ hosts and helping customers debug their workflows.
  • Apply AI techniques and data analytics to extract useful signals about machines and jobs, ensuring high availability and resiliency of the data center systems.
  • NVIDIA is widely considered one of the technology world’s most desirable employers, with some of the most brilliant and talented people in the world working for us.

Salary

Base: 208,000 USD - 333,500 USD; Bonus/Equity: Eligible for equity; Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • Programming in Python and scripting languages
  • Large scale cloud infrastructure
  • Service level agreement maintenance
  • Debugging and problem solving
  • Agile processes and methodologies
  • People management experience

Nice-to-have

  • Data center management experience
  • Computer algorithms expertise
  • Cross-time zone collaboration
  • Continuous improvement focus
  • Designing simple reliable systems

Key Requirements

  • 8+ years industry experience
  • 2+ years people management experience
  • BS/MS in Computer Science or equivalent experience

Work Rights

Not specified
