Director, Ai Operations

AstraZeneca

Hybrid
Azure and aws cloud infrastructure
Datadog new relic grafana splunk
Three-tier operations model leadership
This role leads the evolution of a three-tier operations model for the AI/ML platform estate while enforcing operational readiness gates

Job Summary

  • This role leads the evolution of a three-tier operations model for the AI/ML platform estate while enforcing operational readiness gates.
  • The position mandates that every incident results in a runbook, an automation, or a patch to eliminate manual toil.
  • Candidates will partner with product engineering to co-own post-mortems and embed operational requirements into early architecture decisions.

Matching Summary

This role leads the evolution of a three-tier operations model for the AI/ML platform estate while enforcing operational readiness gates.

Skills & Requirements

Must-have

  • Azure and AWS cloud infrastructure
  • Datadog New Relic Grafana Splunk
  • Three-tier operations model leadership
  • SRE team management and development
  • OpenTelemetry distributed tracing logging

Nice-to-have

  • AI/ML operational challenges application
  • LLM inference and model serving experience
  • Pharmaceutical regulated environment familiarity
  • ITIL or SRE certification
  • Conversational runbook design

Key Requirements

  • BSc/MSc/PhD in Computer Science
  • Recent experience building large-scale SRE functions
  • Proven track record in incident management and post-mortems

Work Rights

Not specified

Tailored Resume

Cover Letter