Sr. ML Platform Engineer (Hybrid)

Falcom

Bangalore, India
Distributed systems engineering
Debugging ML platforms in production
Expertise in Ray, Spark, JupyterHub, SLURM, or Kubernetes
CrowdStrike is a global leader in cybersecurity, protecting modern organizations with an advanced AI-native platform that processes trillions of events daily.

Job Summary

  • The role involves diagnosing complex distributed systems issues and ensuring platform reliability for ML infrastructure processing billions of events daily.
  • CrowdStrike offers market-leading compensation, comprehensive wellness programs, professional development opportunities, and a vibrant office culture with world-class amenities.

Skills & Requirements

Must-have

  • Distributed systems engineering
  • Debugging ML platforms in production
  • Expertise in at least three of Ray, Spark, JupyterHub, SLURM, and Kubernetes
  • Performance profiling and optimization
  • Python debugging and multi-language proficiency
  • Cloud infrastructure experience (AWS, GCP, Azure, or OCI)

Nice-to-have

  • Open-source ML infrastructure contributions
  • Experience with high-throughput inference systems
  • Published debugging guides or tools
  • Chaos engineering and GPU/CUDA debugging
  • On-call and incident management experience
  • Collaborative and mentoring skills

Key Requirements

  • 12+ years in distributed systems engineering
  • 5+ years debugging ML platforms in production
  • Expertise in at least three of Ray, Spark, JupyterHub, SLURM, and Kubernetes

Work Rights

Not specified
