Sr. AI Inference Systems Engineer

Tencent Music Entertainment Group

Palo Alto, California, US

Job Summary

  • The role involves leading the optimization of the full inference pipeline for large models such as LLMs and multimodal systems.
  • Candidates must possess deep expertise in heterogeneous computing and hardware-specific tuning for real-time and batch inference scenarios.
  • Employees are eligible for a sign-on payment, relocation package, restricted stock units, and comprehensive medical and retirement benefits.

Matching Summary

Match Score: 85

Tencent Music Entertainment Group is seeking a Senior AI Inference Systems Engineer in Palo Alto, California, to lead the optimization of inference pipelines for large models and engage in innovative research in heterogeneous computing. The ideal candidate will possess a strong technical background in AI inference optimization, hardware architecture, and distributed systems, complemented by advanced degrees and significant experience in the field.

Salary

  • Base: $120,100.00 to $225,700.00 per year
  • Bonus/Equity: Sign-on payment and restricted stock units available
  • Benefits: Medical, dental, vision, life, disability, 401(k), vacation, holidays, sick leave

Skills & Requirements

Must-have

  • Master's or Ph.D. in Computer Science
  • Proficient in AI accelerator architectures
  • Deep understanding of CUDA and Triton
  • Expertise in multi-level KV Cache management
  • Experience with PyTorch and TensorFlow frameworks

Nice-to-have

  • High-level publications or core patents
  • Experience tuning ultra-large-scale clusters
  • Strong analytical and cross-team collaboration skills

Key Requirements

  • Master's or Ph.D. degree required
  • Significant professional experience in AI inference optimization
  • Mastery of quantization and intelligent routing techniques

Work Rights

Not specified
