Job Summary
This role involves driving upstream-first engineering for open-source inference engines like vLLM and SGLang to ensure outstanding performance on NVIDIA GPUs.
You will optimize inference runtime features for efficiency, latency, and scalability while collaborating closely with internal teams and the broader community.
The position offers a competitive base salary range, equity, and benefits, and emphasizes NVIDIA's commitment to diversity and equal opportunity employment.
Salary
Base: 272,000 USD - 431,250 USD; Bonus/Equity: Eligible for equity; Benefits: Eligible for benefits
Skills & Requirements
Must-have
LLM inference and serving systems
GPU performance engineering
Distributed systems and concurrency
Programming in Rust, C++, Python, and CUDA
Profiling and performance optimization
Multi-GPU and multi-node inference
Upstream open-source contribution
Nice-to-have
Mentoring senior engineers
Building benchmarking and regression infrastructure