Platform Support Engineer (APAC)

Lightning AI

Philippines
Remote

Job Summary

  • Lightning AI is seeking a Platform Support Engineer to act as a technical partner for ML teams running large-scale training and inference workloads.
  • The role involves diagnosing complex failures in distributed systems, Kubernetes scheduling, and GPU orchestration while translating infrastructure issues into actionable guidance.
  • Employees benefit from a comprehensive package including medical coverage, paid time off, professional development support, and a flexible remote work environment.

Salary

Not specified. Benefits: comprehensive medical, dental, and vision coverage, PTO, parental leave, and stipends

Skills & Requirements

Must-have

  • Kubernetes and containerized environments
  • Linux systems knowledge and networking
  • Distributed ML systems and PyTorch
  • GPU orchestration and CUDA debugging
  • Observability tools like Prometheus

Nice-to-have

  • Large-scale model training experience
  • High-performance networking with InfiniBand
  • Bare metal infrastructure operations
  • Python automation and tooling scripts
  • Ray or Kubeflow familiarity

Key Requirements

  • Strong software engineering and systems troubleshooting background
  • Experience operating machine learning workloads in production
  • Based in the Philippines or Singapore
  • Availability for a Thursday–Sunday schedule

Work Rights

Must be based in the Philippines or Singapore
