Senior Datacenter Resiliency Architect

NVIDIA

CA, United States
Fully remote

Job Summary

  • NVIDIA is seeking a Resiliency Architect to support the development and validation of GPU hardware and software resiliency features impacting AI and high-performance computing.
  • The role involves collaborating with architects and engineers to develop and execute architecture verification test plans and improve system RAS metrics.
  • Employees are eligible for a competitive base salary, equity, and benefits, and join a team driving innovation in AI computing.

Salary

  • Base: 184,000 USD - 287,500 USD for Level 4; 224,000 USD - 356,500 USD for Level 5
  • Bonus/Equity: Eligible for equity
  • Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • GPU hardware architecture expertise
  • RAS (Reliability, Availability, Serviceability) features
  • Architecture model development
  • Python scripting and automation
  • C/C++ programming proficiency
  • Debugging and analytical skills

Nice-to-have

  • Verilog/SystemVerilog RTL simulation
  • CUDA programming
  • Collaboration with remote teams
  • Machine Learning/Deep Learning concepts

Key Requirements

  • Master’s or PhD in Computer or Electrical Engineering or equivalent experience
  • At least 5 years of relevant experience
  • Work authorization in the United States

Work Rights

Not specified
