On-site
Job Summary
Join the Inference team to ship production features that improve latency, reliability, and cost for model serving on our GPU platform.
Implement well-scoped features and fixes in Python/Go/C++ for model-serving services (e.g., Triton, vLLM, TensorRT-LLM, Ray Serve).
CoreWeave offers a comprehensive benefits program including 100% paid medical, dental, and vision insurance, a 401(k) with employer match, and flexible PTO.
Salary
Base: $92,000 to $135,000; Bonus/Equity: discretionary bonus, equity awards; Benefits: comprehensive benefits program
Skills & Requirements
Must-have
Python/Go/C++ development
model-serving services
GPU platform
containerization and Kubernetes
Linux fundamentals
data structures and algorithms
Nice-to-have
performance experiments
micro-batching, KV cache, streaming
Grafana/Prometheus/OpenTelemetry
entrepreneurial outlook
independent thinking
Key Requirements
BS/MS in CS, EE, or related field, or equivalent practical experience
Git/CI basics
Exposure to containers and Kubernetes
Curiosity about GPU inference concepts
Internship or project deploying a microservice or an ML inference demo
Coursework/research with PyTorch or TensorFlow
Simple CUDA projects
Work Rights
Must be a US person (citizen, permanent resident, refugee, or asylee) or eligible for export authorization