Senior System Architect, Infrastructure Reliability

Nvidia Corporation

Hybrid

Job Summary

  • NVIDIA is seeking a Senior System Architect to develop an automated framework for failure attribution at scale in accelerated computing.
  • The role involves architecting frameworks that capture high-fidelity state across CPU, GPU, and Fabric at the moment of failure.
  • Candidates will work closely with hardware and infrastructure teams to define signals of impending failure for proactive measures.

Matching Summary

Match Score: 85

NVIDIA Corporation is seeking a Senior System Architect to develop automated frameworks for failure attribution in heterogeneous EDA systems, focusing on performance optimization and root cause analysis. The ideal candidate will have extensive experience in distributed systems, CPU architecture, and programming, particularly with C++ and Python.

Salary

Base: 184,000 USD - 287,500 USD; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • Automated root cause analysis pipelines
  • Expert knowledge of CPU architecture
  • Strong C++ and Python programming skills
  • Experience with cluster resource managers

Nice-to-have

  • Expert knowledge of Linux kernel
  • Experience with NVIDIA DCGM and NVML
  • Familiarity with checkpoint/restore technologies

Key Requirements

  • 6+ years in systems programming
  • BS, MS, or PhD in Computer Science or Electrical Engineering

Work Rights

Not specified
