Sr. ML Platform Engineer (Hybrid)

CrowdStrike UK

Job Summary

  • The role involves diagnosing complex distributed-systems issues in CrowdStrike's mission-critical ML infrastructure, which processes billions of events daily.
  • Candidates will partner with ML engineers to resolve workflow issues, conduct post-mortems, and mentor others on debugging techniques.
  • CrowdStrike offers market-leading compensation, comprehensive wellness programs, and a culture that gives employees the flexibility and autonomy to own their careers.

Salary

Not specified. CrowdStrike describes its compensation and equity awards as market-leading; comprehensive physical and mental wellness programs are included.

Skills & Requirements

Must-have

  • 12+ years in distributed systems engineering
  • 5+ years debugging ML platforms in production
  • Deep expertise in Ray, Spark, JupyterHub, SLURM, or K8s
  • Performance profiling and optimization skills
  • Expert Python debugging and Linux/Unix proficiency

Nice-to-have

  • Open-source ML infrastructure contributions
  • Experience with high-throughput inference systems
  • Published debugging guides or tools
  • Chaos engineering and GPU/CUDA debugging experience
  • On-call and incident management experience

Key Requirements

  • 12+ years in distributed systems engineering
  • 5+ years debugging ML platforms in production
  • Expertise in at least three of: Ray, Spark, JupyterHub, SLURM, K8s

Work Rights

Not specified

Tailored Resume

Cover Letter