Senior Software Engineer, AI Resiliency

NVIDIA

Multiple Locations

Job Summary

  • You will lead the development of AI software resiliency for the most powerful AI supercomputers in the world, ensuring robust and reliable AI systems at scale.
  • The role involves hands-on coding, optimization, fault tolerance, debugging, and collaboration with AI researchers and hardware/software teams to integrate resiliency features.
  • NVIDIA offers a competitive base salary range, equity, benefits, and the opportunity to work on cutting-edge AI infrastructure challenges in a diverse and inclusive environment.

Salary

Base: 184,000 USD - 287,500 USD; Bonus/Equity: Eligible for equity; Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • C++ and Python programming
  • Distributed systems and fault tolerance
  • AI frameworks like PyTorch and JAX/XLA
  • Debugging and profiling tools experience
  • Large-scale AI workload optimization

Nice-to-have

  • CUDA, NCCL, or MPI experience
  • Checkpointing and error mitigation knowledge
  • Experience with HPC or cloud AI workloads
  • Strong systems programming skills
  • Collaborative team environment

Key Requirements

  • Bachelor’s, Master’s or PhD in Computer Science or related field
  • 6+ years of relevant experience
  • Proficiency in C++ and Python
  • Experience with distributed systems and parallel programming
  • Familiarity with AI frameworks
  • Experience with debugging and profiling tools

Work Rights

Not specified
