Senior Manager, Site Reliability Engineering

NVIDIA

Job Summary

  • NVIDIA is seeking a Senior Manager to lead Incident, Problem, and Change Management and evolve them into an intelligent, automated operating model.
  • The role involves transforming incident response by applying AI-driven detection, correlation, and guided remediation to reduce resolution times.
  • Candidates will drive the adoption of observability and build automation platforms that reduce manual effort across the outage lifecycle.

Salary

Base: 200,000 USD - 322,000 USD; Bonus/Equity: Eligible for equity; Benefits: Comprehensive benefits package included

Skills & Requirements

Must-have

  • 5+ years leading global IT operations teams
  • 12+ years in Site Reliability Engineering
  • Proven proficiency in Incident and Problem Management
  • Experience applying AI and automation to operations
  • Solid understanding of observability and SLOs

Nice-to-have

  • ITIL knowledge or certification
  • Passion for an automation-first mentality
  • Experience scaling AI-powered platforms
  • Ability to challenge traditional ITSM models

Key Requirements

  • BS, MS, or PhD in Computer Science or related field
  • 5+ years managing global IT operations
  • 12+ years overall in SRE and IT Service Management
  • Equivalent experience accepted in place of the degree requirement

Work Rights

Not specified
