Help build an Always-On, low-overhead GPU profiling service that runs in production, scales across cluster environments, and delivers actionable insights for ML workloads
Job Summary
Help build an Always-On, low-overhead GPU profiling service that runs in production, scales across cluster environments, and delivers actionable insights for ML workloads.
Lead end-to-end feature delivery spanning user-mode components, driver/platform layers, and performance counter/trace providers to make profiling continuously available and reliable.
Establish profiling models that integrate with existing ML/AI workflows to turn low-level signals into actionable insights.
Skills & Requirements
Must-have
System-level C/C++ development
GPU profiling and tracing stacks
CUDA and GPU architecture expertise
Low-overhead, high-reliability implementations
Performance engineering and memory management
Production-quality software delivery
Nice-to-have
Experience with ML ecosystems like PyTorch and JAX
User-mode driver development
Strong interpersonal and communication skills
Debugging highly concurrent systems
Integration within platform security and permissions
Key Requirements
BS or MS degree in Computer Engineering or a related field, or equivalent experience
5+ years of system-level C/C++ development experience
Familiarity with operating system fundamentals and computer architecture