Senior Systems Engineer – High-Performance AI and Networking Applications

Nvidia Corporation

Work arrangement: Not specified (assumed hybrid if not explicitly stated)

Job Summary

  • Collaborate with networking teams to plan, implement, and evaluate performance benchmarks on NVLink-, NVSwitch-, and InfiniBand-powered infrastructure.
  • Act as a primary resource for fixing networking and hardware integration issues, focusing on scalable multi-node systems.
  • Offer technical mentorship and documentation for internal teams and external partners on standard methodologies in HPC networking deployments.

Matching Summary

Match Score: 85

Nvidia Corporation is seeking a Senior Systems Engineer to join their Deep Learning Frameworks Infrastructure team, focusing on high-performance AI and networking applications. The role involves collaboration on performance benchmarks, troubleshooting integration issues, and providing technical mentorship, requiring extensive experience in AI/HPC infrastructure and networking technologies.

Salary

Base: 184,000 USD - 356,500 USD; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • NVLink, NVSwitch, and InfiniBand infrastructure
  • AI/HPC job schedulers and orchestrators
  • MPI and NCCL for AI/HPC workflows
  • High-Speed Networking (InfiniBand, RDMA, RoCE, EFA)
  • Deep learning frameworks (PyTorch, Megatron-LM, vLLM/SGLang)
  • Multi-node systems performance evaluation

Nice-to-have

  • Datacenter automation
  • Advanced network protocols
  • Distributed storage systems (Lustre, GPFS)
  • Networking and communications libraries (NCCL, NIXL, NVSHMEM, UCX)
  • Cluster management and monitoring tools

Key Requirements

  • 8+ years of proven experience in AI/HPC Infrastructure
  • BS/MS or PhD in Computer Science, Engineering, or related field, or equivalent experience

Work Rights

Not specified
