The role involves leading the end-to-end design and maintenance of a highly available, low-latency GPU-based model serving system supporting millions of QPS
Job Summary
The role involves leading the end-to-end design and maintenance of a highly available, low-latency GPU-based model serving system supporting millions of QPS.
Candidates will develop high-performance feature hydration and processing systems including routing, caching, and batching within cloud-based Kubernetes environments.
Reddit offers comprehensive benefits including healthcare, 401k matching, flexible vacation, and generous paid parental leave.
Matching Summary
The role involves leading the end-to-end design and maintenance of a highly available, low-latency GPU-based model serving system supporting millions of QPS.
Salary
Base: $253,300 - $354,600 USD; Equity: Eligible for restricted stock units; Benefits: Comprehensive healthcare, 401k match, paid time off
Skills & Requirements
Must-have
7+ years ML Engineering experience
Kubernetes orchestration at scale
Cloud AI deployment (AWS/GCP)
Python and Go programming proficiency
GPU model serving and optimization
Real-time ML observability
Nice-to-have
Experience with LLM serving online
Multi-cluster compute environment knowledge
E2E inference performance benchmarking
Strong communication with non-technical stakeholders
Deep intuition for genAI product lifecycle
Key Requirements
7+ years of experience in ML Engineering or AI Platform roles
Proficiency in Python, Go, PyTorch, Triton, vLLM, or Dynamo
Experience operating Kubernetes orchestration systems at scale