Job Summary
Serve as the ultimate escalation point for the most complex, business-critical customer issues across Kubernetes, GPU/GradientAI, and AI/ML infrastructure.
Architect enterprise-grade solutions for customers building large-scale AI/ML workloads on DigitalOcean, including multi-cluster Kubernetes deployments and distributed GPU training infrastructure.
Mentor and develop IC1-IC3 engineers through structured coaching, technical reviews, pair troubleshooting sessions, and career development guidance.
Skills & Requirements
Must-have
Kubernetes (K8s) expertise
GPU/GradientAI infrastructure
AI/ML pipelines at scale
Production ML deployment patterns
Cloud architecture design
Advanced Linux system administration
Python programming skills
Nice-to-have
Enterprise-grade solution architecture
Customer-first mentality
Growth mindset
Comfort working in a fast-paced environment
Mentoring junior engineers
Key Requirements
7+ years of progressive experience in technical support, solutions engineering, DevOps, or SRE
5+ years in senior, customer-facing technical roles
Expert-level Kubernetes knowledge
Deep GPU/AI/ML infrastructure expertise
Advanced understanding of production AI/ML pipelines
Extensive experience with major ML frameworks
Expertise in GPU optimization techniques
Deep knowledge of MLOps practices
Experience with large-scale distributed AI/ML workloads