Together AI is building the best inference infrastructure for voice applications to power production-grade, real-time voice agents
Job Summary
Together AI is building the best inference infrastructure for voice applications to power production-grade, real-time voice agents.
The role owns the model-serving stack for speech-to-text (STT), text-to-speech (TTS), and speech-to-speech models, optimizing latency and throughput on NVIDIA H100 and B200 GPUs.
Candidates will be expected to drive the technical strategy for next-generation audio-native LLMs and end-to-end speech-to-speech systems ahead of their mainstream adoption.
Skills & Requirements
Must-have
8+ years of ML engineering experience
TensorRT-LLM or SGLang expertise
Python and PyTorch proficiency
GPU optimization and CUDA kernel development
System design at production scale
Real-time voice inference architecture
Nice-to-have
Audio codec tokenization schemes
Speech-to-speech model paradigms
Fine-tuning speech models at scale
Collaboration with model partners
Developer tooling product intuition
Key Requirements
Bachelor's or Master's degree in Computer Science or a related field