Senior DGX Cloud AI Infrastructure Software Engineer

NVIDIA

Work arrangement: Not specified (assumed to be flexible or hybrid based on company culture).

Job Summary

  • Joining NVIDIA's DGX Cloud AI Efficiency Team means contributing to the infrastructure that powers our innovative AI research.
  • You'll be instrumental in designing, building, and maintaining AI infrastructure that enables large-scale AI training and inferencing.
  • The role offers the autonomy to work on meaningful projects, with the support and mentorship needed to succeed, and you will contribute to a culture of blameless postmortems, iterative improvement, and risk-taking.

Matching Summary

Match Score: 85

NVIDIA is seeking a Senior DGX Cloud AI Infrastructure Software Engineer to join its DGX Cloud AI Efficiency Team, which focuses on enhancing the infrastructure for AI research. The role involves designing and maintaining AI systems for large-scale training and inference, requiring significant experience in software infrastructure and strong debugging skills.

Salary

Base: 184,000 USD - 356,500 USD; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • large-scale AI training and inferencing
  • software and systems engineering practices
  • high efficiency and availability of AI systems
  • observability platforms for monitoring and logging
  • building and scaling large-scale distributed systems
  • AI training and inferencing infrastructure services
  • quality software engineering practices

Nice-to-have

  • working with large-scale clusters
  • defining and building observability
  • root cause analysis of failures
  • understanding of DL framework internals

Key Requirements

  • 8+ years of experience in developing software infrastructure
  • Bachelor's degree or higher in Computer Science or a related field
  • Strong debugging skills
  • Proficiency in Python, C/C++

Work Rights

Not specified

Tailored Resume

Cover Letter