Senior HPC and AI Networking Performance Research and Analysis Engineer

NVIDIA

CA, United States
Base: 152,000 USD - 218,500 USD for Level 3, 184,0...
Hybrid
High-performance networking (RDMA, MPI, NCCL)
Performance analysis and profiling
NVIDIA GPUs and CUDA library

Job Summary

  • NVIDIA is seeking a Senior HPC and AI Networking Performance Research and Analysis Engineer to profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning and LLM training, with a focus on collective communication and networking.
  • The role involves benchmarking, profiling, and analyzing performance to identify bottlenecks and optimize networking aspects, while collaborating across hardware and software teams to provide performance insights.
  • NVIDIA offers a diverse and supportive environment with highly competitive salaries, equity, and comprehensive benefits, fostering innovation in AI and accelerated computing.

Salary

Base: 152,000 USD - 218,500 USD for Level 3, 184,000 USD - 287,500 USD for Level 4; Bonus/Equity: Eligible for equity; Benefits: Comprehensive benefits package

Skills & Requirements

Must-have

  • High-performance networking (RDMA, MPI, NCCL)
  • Performance analysis and profiling
  • NVIDIA GPUs and CUDA library
  • Deep learning frameworks (TensorFlow or PyTorch)
  • Programming in Python, Bash, and C
  • Linux OS experience

Nice-to-have

  • Strong analytical and problem-solving skills
  • Good communication and interpersonal skills
  • Ability to learn quickly and independently
  • In-depth system knowledge (Intel/AMD/ARM CPUs, NVIDIA GPUs, HCA, Memory, PCI)
  • Experience with congestion control algorithms
  • Collaborative teamwork

Key Requirements

  • B.Sc. in Computer Science or Software Engineering, or equivalent experience
  • 5+ years of experience in high-performance networking
  • Experience with NVIDIA GPUs and CUDA
  • Experience with deep learning frameworks
  • Experience with networking collective communication libraries and protocols
  • Programming skills in Python, Bash, and C
  • Experience with Linux OS

Work Rights

Not specified
