AI Inference Engineer

HPC AI TECHNOLOGY PTE. LTD.

Singapore, Singapore
HPC AI Technology Pte. Ltd. is seeking an AI Inference Engineer to build, optimize, and maintain high-performance inference services for large language and multimodal models. The ideal candidate will have a strong background in system software development, particularly in AI systems and high-performance computing, with expertise in GPU optimization and distributed training frameworks.

Job Summary

  • The role focuses on building, optimizing, and maintaining high-performance inference services for large language models serving tens of millions of users.
  • Candidates will perform deep GPU/CUDA kernel optimization and develop custom operators using advanced DSLs like Triton and TVM.
  • The position requires expertise in distributed training frameworks such as Megatron-LM and DeepSpeed to solve system challenges across thousands of GPUs.

Skills & Requirements

Must-have

  • C++ and Python proficiency
  • GPU CUDA kernel optimization
  • Distributed training frameworks
  • Kubernetes resource management
  • LLM inference engine customization

Nice-to-have

  • Publications in top-tier conferences
  • Experience with heterogeneous computing
  • Strong problem-solving skills
  • Deep understanding of AI compilers
  • Excellent communication skills

Key Requirements

  • Bachelor's degree in Computer Science or related field
  • 3+ years of system software development experience
  • 1+ year in AI systems or high-performance computing
  • Hands-on CUDA optimization experience
  • Proficiency in C++ and Python

Work Rights

Not specified
