Senior DevOps Engineer, AIOps

NVIDIA

CA, United States

Job Summary

  • Join a team building an AI Data Center AIOps platform that transforms high-volume telemetry into reliable insights and automation for GPU fleets.
  • You will own platform reliability, including SLOs/SLIs and incident response, and collaborate closely with Software and Systems Engineering teams.
  • NVIDIA offers competitive salaries, equity, benefits, and a diverse, supportive environment with opportunities for growth and innovation.

Salary

  • Base: 148,000 USD - 235,750 USD (Level 3); 176,000 USD - 276,000 USD (Level 4)
  • Bonus/Equity: Eligible for equity
  • Benefits: Generous benefits package

Skills & Requirements

Must-have

  • Kubernetes deployments and management
  • SLOs/SLIs and incident response
  • Automation-first approach with scripting
  • Infrastructure-as-code with Terraform and Helm
  • Production distributed systems operation
  • Telemetry-heavy microservices experience

Nice-to-have

  • Strong Linux and networking fundamentals
  • Experience with observability platforms
  • Distributed and streaming systems operations
  • Building automation tools in Python
  • Experience with large-scale Kubernetes clusters
  • Excellent documentation and communication skills

Key Requirements

  • BS/MS in CS/CE or equivalent experience
  • 5+ years operating production distributed systems
  • Proven ownership of observability/AIOps platform reliability
  • Experience with Kubernetes and containerized microservices
  • Strong scripting skills in Python/Bash
  • Experience with CI/CD and infrastructure-as-code
  • Ability to write clear runbooks and documentation

Work Rights

Not specified
