Senior Software Engineer - AI Inference

NVIDIA

Base: $152,000 - $241,500 (Level 3) or $184,000 - ...
5+ years of production software experience
Python, C++, and CUDA programming skills
LLM inference stack expertise (vLLM, SGLang)

Job Summary

  • This role involves contributing directly to upstream open-source inference engines like vLLM and SGLang to ensure best-in-class performance on NVIDIA GPUs.
  • Engineers will optimize critical runtime capabilities including batching, scheduling policies, and KV-cache efficiency to improve throughput and reduce latency.
  • The position offers competitive compensation ranging from $152,000 to $287,500 USD depending on level, along with equity and comprehensive benefits.

Salary

  • Base: $152,000 - $241,500 (Level 3) or $184,000 - $287,500 (Level 4)
  • Bonus/Equity: Eligible for equity
  • Benefits: Comprehensive benefits package included

Skills & Requirements

Must-have

  • 5+ years of production software experience
  • Python, C++, and CUDA programming skills
  • LLM inference stack expertise (vLLM, SGLang)
  • GPU profiling and performance optimization
  • Distributed systems and concurrency knowledge

Nice-to-have

  • Open source contributions to vLLM or PyTorch
  • Experience with speculative decoding techniques
  • Background in kernel fusion and memory bandwidth optimization
  • Familiarity with InfiniBand network fabrics
  • Experience building reproducible benchmarking infrastructure

Key Requirements

  • BS/MS in Computer Science or equivalent experience
  • 5+ years of systems engineering fundamentals
  • Strong track record of performance improvements

Work Rights

Not specified
