Job Summary
This role serves as the backbone of the AI platform by building high-performance systems that enable researchers to develop perception and planning models.
The engineer will architect mission-critical Kubernetes clusters optimized for heavy GPU/TPU workloads while implementing self-healing infrastructure using autonomous AI agents.
Candidates must be willing to work onsite 5 days a week at the Mountain View, CA office to support the company's autonomous middle-mile logistics operations.
Skills & Requirements
Must-have
Kubernetes cluster management for GPU workloads
Apache Airflow and Kafka pipeline development
Terraform Infrastructure as Code implementation
ArgoCD GitOps workflow automation
NCCL networking optimization for distributed training
Nice-to-have
Experience with LangGraph and CrewAI agents
Familiarity with Triton Inference Server and Ray Serve
Knowledge of 3D Gaussian Splatting techniques
Background in PyTorch Distributed and DeepSpeed
Key Requirements
5+ years of experience in Cloud Infrastructure or MLOps
Deep expertise in Kubernetes, Helm, and container orchestration
Strong background in Apache Airflow, Argo Workflows, MLflow, and Terraform