Distributed Training & Inference Optimization Engineer (LLM) - GPU Optimization Department (GPUOD)

Rakuten International

Hybrid
GPU-accelerated ML frameworks expertise
Experience with distributed training optimizations
LLM inference optimizations knowledge
Rakuten International is seeking a Distributed Training & Inference Optimization Engineer to improve the performance and efficiency of LLM training and inference workloads on its GPU infrastructure. The role involves optimizing ML frameworks and collaborating with global AI/ML teams to deliver scalable, cost-efficient solutions.

Job Summary

  • The role focuses on maximizing performance and efficiency of LLM workloads on GPU clusters.
  • You will collaborate with global AI/ML teams to tackle high-impact challenges.
  • This position offers the opportunity to research and implement state-of-the-art GPU optimizations.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • GPU-accelerated ML frameworks expertise
  • Experience with distributed training optimizations
  • LLM inference optimizations knowledge

Nice-to-have

  • Familiarity with Kubernetes for GPU workloads
  • Contributions to open-source ML frameworks
  • Experience with inference serving frameworks

Key Requirements

  • 3+ years of hands-on experience in GPU optimization
  • Bachelor’s or higher degree in Computer Science or related field

Work Rights

Not specified
