Distinguished Software Architect - Deep Learning and HPC Communications

NVIDIA Corporation

NVIDIA is seeking a Distinguished Software Architect with expertise in Deep Learning and High-Performance Computing (HPC) communications to enhance its data center platforms. The ideal candidate will have extensive experience in parallel programming, GPU architecture, and high-performance networking.

Job Summary

  • This role involves co-designing next-generation data center platforms to support Deep Learning and HPC applications running at scales of tens of thousands of GPUs.
  • The successful candidate will research new communication technologies, propose innovative hardware and software solutions, and drive their adoption across application verticals.
  • NVIDIA offers a competitive base salary ranging from 320,000 USD to 488,750 USD along with equity and comprehensive benefits for this distinguished position.

Matching Summary

Match Score: 85

Salary

Base: 320,000 USD - 488,750 USD; Bonus/Equity: Eligible for equity; Benefits: Comprehensive benefits package included

Skills & Requirements

Must-have

  • Deep Learning and HPC communications expertise
  • GPU architecture and CUDA programming fluency
  • Parallel programming models (MPI and SHMEM)
  • Communication runtime experience (NCCL, UCX, NVSHMEM)
  • High-performance networking technologies (InfiniBand, Ethernet)
  • C or C++ systems software development proficiency

Nice-to-have

  • History of patents, publications, and conference talks
  • Influential role in industry standards such as MPI
  • Collaboration with diverse internal and external teams
  • Experience developing applications using PyTorch or TensorFlow
  • Strong background in fault tolerance and resiliency

Key Requirements

  • PhD in Computer Science or equivalent; 15+ years of experience
  • Expertise in computer and system architecture
  • Deep understanding of network topologies and performance analysis

Work Rights

Not specified
