Carfax Careers - Senior Software Engineer - ML Ops

CARFAX Inc

London, ON, Canada
On-site

Job Summary

  • This role involves owning critical platform components to drive the reliability, performance, and security of AI infrastructure supporting LLM workloads.
  • Candidates will architect scalable Kubernetes solutions with advanced autoscaling strategies tailored for GPU-intensive demands.
  • The position offers a competitive salary, comprehensive benefits, and a flexible four-day work week during the summer.

Salary

Competitive compensation; annual bonus program; generous time-off policies

Skills & Requirements

Must-have

  • 7+ years DevOps or Platform Engineering experience
  • Deep production Kubernetes expertise on EKS/GKE/AKS
  • Cloud infrastructure design on AWS, GCP, or Azure
  • Experience with GPU resource management and scheduling
  • Strong hands-on GitOps and CI/CD pipeline design
  • Production observability platform implementation skills
  • Infrastructure as Code proficiency with Terraform or Helm

Nice-to-have

  • Direct experience with Flyte or Kubeflow workflows
  • Operating JupyterHub or multi-user interactive platforms
  • Familiarity with LLM serving frameworks like vLLM or Triton
  • Experience with FinOps and cost optimization strategies
  • Relevant certifications such as CKA, CKS, or AWS Solutions Architect
  • Background in security vulnerability assessment and remediation

Key Requirements

  • 7+ years of experience in DevOps, Platform Engineering, or MLOps
  • Production experience operating Kubernetes at scale
  • Extensive experience designing workloads on major cloud providers
  • Solid coding ability in Python and/or Go
  • Hands-on experience with vulnerability scanning and cloud security best practices

Work Rights

Not specified