HPC AI Technology Pte. Ltd. is seeking an AI Inference Engineer to build, optimize, and maintain high-performance inference services for large language and multimodal models. The ideal candidate will have a strong background in system software development, particularly in AI systems and high-performance computing, with expertise in GPU optimization and distributed training frameworks.
Job Summary
The role focuses on building, optimizing, and maintaining high-performance inference services for large language models serving tens of millions of users.
Candidates will perform deep GPU/CUDA kernel optimization and develop custom operators using domain-specific languages and compilers such as Triton and TVM.
The position requires expertise in distributed training frameworks such as Megatron-LM and DeepSpeed to solve system-level challenges at the scale of thousands of GPUs.
Matching Summary
Match Score: 85
Skills & Requirements
Must-have
C++ and Python proficiency
GPU/CUDA kernel optimization
Distributed training frameworks
Kubernetes resource management
LLM inference engine customization
Nice-to-have
Publications in top-tier conferences
Experience with heterogeneous computing
Strong problem-solving skills
Deep understanding of AI compilers
Excellent communication skills
Key Requirements
Bachelor's degree in Computer Science or related field
3+ years of system software development experience
1+ year in AI systems or high-performance computing