SDE IV - GPU Engineer

Glance

Bangalore, India
On-site
High-performance inference runtimes
CUDA / Triton / C++ expertise
Distributed inference systems

Job Summary

  • Architect high-performance inference runtimes, kernel dispatchers, and memory planners for large diffusion and transformer workloads.
  • Lead investigations into cross-GPU performance bottlenecks, communication overheads, and scheduling inefficiencies.
  • Establish company-wide GPU optimization standards, tooling, and SLIs.

Skills & Requirements

Must-have

  • High-performance inference runtimes
  • CUDA / Triton / C++ expertise
  • Distributed inference systems
  • NCCL, NVLink, PCIe knowledge
  • GPU scheduling and occupancy

Nice-to-have

  • Compiler-aided optimization experience
  • Stable Diffusion tuning
  • Heterogeneous compute backends
  • Hardware-software co-design

Key Requirements

  • 5+ years in HPC, GPU runtime systems, or ML infrastructure
  • Proven expertise in CUDA / Triton / C++
  • Experience building distributed inference systems
  • Ability to design abstractions
  • Strong knowledge of NCCL, NVLink, PCIe
  • Familiarity with profiling automation

Work Rights

Not specified
