Senior Software Engineer, Cloud-native Stack – CSP Engagements

NVIDIA Corporation

Job Summary

  • The role involves defining customer workflows and prototyping stack enhancements for advanced multi-rack AI datacenters using NVIDIA GB200 and upcoming GB300 GPUs.
  • Engineers will perform deep-dive debugging of complex scheduling challenges across racks, tenants, and clouds while collaborating on architecture reviews with CSP teams.
  • Candidates must possess strong source-level expertise in Kubernetes internals and Slurm alongside proven experience integrating next-generation accelerators into containerized clusters.

Salary

Base: $184,000–$356,500 USD, depending on level; Equity: eligible; Benefits: eligible

Skills & Requirements

Must-have

  • Kubernetes internals expertise
  • Slurm federation and plugins
  • Multi-rack cluster debugging
  • RDMA/RoCE networking knowledge
  • Go, Rust, C/C++, and Python experience
  • Customer-facing engineering skills

Nice-to-have

  • Upstream contributions to Kubernetes
  • CUDA and deep learning workloads
  • Experience with Blackwell GPUs
  • Strong communication abilities
  • Helm, Ansible, and Terraform proficiency

Key Requirements

  • 6+ years of professional software development experience
  • BS or MS in Computer Engineering or Computer Science
  • Distributed systems background required

Work Rights

Not specified
