Together AI is building the best inference infrastructure for voice applications, powering real-time voice agents with best-in-class latency and reliability
Job Summary
Together AI is building the best inference infrastructure for voice applications, powering real-time voice agents with best-in-class latency and reliability.
This role involves owning the model serving stack, optimizing GPU utilization, collaborating with model partners, and building evaluation frameworks for voice models.
The company offers competitive compensation, including a US base salary range of $200,000 - $260,000 plus equity and benefits, and fosters a small, high-impact team culture.
Salary
Base: $200,000 - $260,000
Bonus/Equity: equity
Benefits: health insurance and other competitive benefits
Skills & Requirements
Must-have
Model serving and inference optimization
GPU profiling and CUDA optimization
Python and PyTorch proficiency
Experience with LLM serving engines
Real-time voice AI applications
Nice-to-have
Speech and audio ML knowledge
Experience with audio codecs and tokenization
Fine-tuning speech models
Experience working in an early-stage startup environment
Strong product sense
Key Requirements
5+ years of ML engineering experience
Hands-on experience with LLM serving engine internals
Track record of shipping ML systems to production
Bachelor's or Master's degree or equivalent experience