Senior Cloud Support Engineer II - AI/ML

DigitalOcean

Hyderabad, India
On-site

Job Summary

  • Serve as the ultimate escalation point for the most complex, business-critical customer issues across Kubernetes, GPU/GradientAI, and AI/ML infrastructure.
  • Architect enterprise-grade solutions for customers building large-scale AI/ML workloads on DigitalOcean, including multi-cluster Kubernetes deployments and distributed GPU training infrastructure.
  • Mentor and develop IC1-IC3 engineers through structured coaching, technical reviews, pair troubleshooting sessions, and career development guidance.

Skills & Requirements

Must-have

  • Kubernetes (K8s) expertise
  • GPU/GradientAI infrastructure
  • AI/ML pipelines at scale
  • Production ML deployment patterns
  • Cloud architecture design
  • Advanced Linux system administration
  • Python programming skills

Nice-to-have

  • Enterprise-grade solution architecture
  • Customer-first mentality
  • Growth mindset
  • Comfortable working in a fast-paced environment
  • Mentoring junior engineers

Key Requirements

  • 7+ years of progressive experience in technical support, solutions engineering, DevOps, or SRE
  • 5+ years in senior technical customer-facing roles
  • Expert-level Kubernetes knowledge
  • Deep GPU/AI/ML infrastructure expertise
  • Advanced understanding of production AI/ML pipelines
  • Extensive experience with major ML frameworks
  • Expertise in GPU optimization techniques
  • Deep knowledge of MLOps practices
  • Experience with large-scale distributed AI/ML workloads
  • Proven experience designing fault-tolerant, scalable cloud architectures
  • Advanced networking expertise
  • Strong programming skills in Python
  • Experience in at least one additional systems language (Go, Rust, C++, or similar)

Work Rights

Not specified
