Senior Software Engineer, AI Resiliency

Nvidia Corporation

Nvidia is seeking a Senior Software Engineer for its AI Resiliency team to develop software features that enhance the reliability of AI supercomputers. The ideal candidate will have extensive experience in distributed systems, high-performance coding, and a strong familiarity with AI frameworks. This position offers competitive compensation and the opportunity to work in a collaborative, cutting-edge environment.

Job Summary

  • You will lead the development of AI software resiliency for powerful AI supercomputers.
  • Your expertise will help drive down cluster downtime towards zero.
  • You’ll work alongside world-class engineers solving challenges in AI infrastructure.

Matching Summary

Match Score: 75

Salary

Base: 184,000 USD - 287,500 USD; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • C++ and Python programming
  • Distributed systems concepts
  • Fault tolerance techniques

Nice-to-have

  • Experience with CUDA and MPI
  • Knowledge of error mitigation strategies
  • Collaboration with AI researchers

Key Requirements

  • Bachelor’s, Master’s, or PhD in a relevant field
  • 6+ years of relevant experience
  • Familiarity with AI frameworks like PyTorch

Work Rights

Not specified
