Senior AI and HPC Observability Engineer

NVIDIA

Multiple Locations

Job Summary

  • NVIDIA is a pioneer in accelerated computing, driving breakthroughs in gaming, computer graphics, high-performance computing, and artificial intelligence.
  • You will design and develop high-throughput, reliable telemetry pipelines and modern data infrastructure to power some of the world’s most advanced computing workloads.
  • The role offers a competitive base salary range, equity, and benefits, with a commitment to diversity and equal opportunity employment.

Salary

  • Base (Level 3): 152,000 USD - 241,500 USD
  • Base (Level 4): 184,000 USD - 287,500 USD
  • Bonus/Equity: Eligible for equity
  • Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • distributed systems fundamentals
  • production-grade software development
  • observability architectures (metrics, logs, traces)
  • PromQL and time-series data systems
  • distributed data pipelines (Kafka, Spark, Flink)
  • Kubernetes and cloud-native infrastructure

Nice-to-have

  • OpenTelemetry collectors and instrumentation
  • real-time and batch telemetry pipelines
  • performance tuning and capacity planning
  • AI, GPU, and HPC observability platform design
  • statistical machine learning for anomaly detection
  • integration with AI/ML pipelines and GPU monitoring

Key Requirements

  • Bachelor’s degree or equivalent experience
  • 5+ years backend or distributed systems experience
  • Strong programming skills in Python, Go, or Java
  • Experience with PromQL and time-series data
  • Experience with Kafka, Spark, or Flink
  • Experience with Kubernetes and cloud-native infrastructure

Work Rights

Not specified
