Job Summary
The role involves spearheading high-impact initiatives to design multi-cloud setups that maximize GPU availability for deep learning models.
Candidates will partner with ML researchers to resolve system-level bottlenecks in distributed training jobs and increase GPU utilization.
The team is driven by a mission to build a frictionless development environment that empowers engineers to rapidly innovate on autonomous driving technology.
Skills & Requirements
Must-have
Multi-cloud architecture design
Distributed training optimization
Kubernetes container orchestration
Python or Go programming
AWS cloud services experience
Nice-to-have
ML model profiling expertise
High-performance GPU management
PyTorch or Ray framework knowledge
Agentic AI system development
LLM-driven developer tools
Key Requirements
5+ years of professional software engineering experience