Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems)
Tencent Music Entertainment Group
Bellevue, Washington, US
Base: $122,500.00 to $229,700.00 per year; bonus/equity:...
Tencent Music Entertainment Group is seeking a Research Scientist specializing in speech and audio understanding within large models and multimodal systems. The role involves developing advanced multimodal models that integrate audio, text, and vision, along with managing high-quality datasets in this domain.
Job Summary
The role involves building native multimodal model systems that jointly support vision, audio, and text for comprehensive world perception.
Candidates will contribute to developing general-purpose end-to-end large speech models covering multilingual ASR, translation, and synthesis.
Employees are eligible for a sign-on payment, relocation package, restricted stock units, and 15 to 25 days of vacation per year.
Salary
Base: $122,500.00 to $229,700.00 per year; Bonus/Equity: Sign-on payment and restricted stock units available; Benefits: Medical, dental, vision, life, disability, 401(k), and paid leave
Skills & Requirements
Must-have
PhD in Computer Science or related field
Speech and audio signal processing expertise
Deep learning frameworks like PyTorch or TensorFlow