Together AI is building the best inference infrastructure for voice applications, powering production-grade real-time voice agents.
Job Summary
The role involves optimizing model serving layers for voice workloads, using engines like TRT-LLM and SGLang to minimize latency and maximize throughput.
Candidates will collaborate with model partners to integrate state-of-the-art speech models into Together's high-performance GPU infrastructure.
Salary
Base: $200,000 - $260,000
Equity: Startup equity included
Benefits: Health insurance and other competitive benefits
Skills & Requirements
Must-have
5+ years of ML engineering experience
Hands-on LLM serving engine expertise
Python and PyTorch proficiency
GPU profiling and CUDA optimization
Production ML system deployment
Nice-to-have
Experience with ASR and TTS architectures
Familiarity with audio codecs like SNAC
Knowledge of speech model fine-tuning
Strong product sense for developer needs
Ability to work in a fast-paced startup environment
Key Requirements
Bachelor's or Master's degree in Computer Science or related field