Mistral Cloud - Site Reliability Engineer

Mistral AI

Paris, France
On-site

Job Summary

  • Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure.
  • Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime.
  • Collaborate with software engineers to develop and implement solutions that enable safe and reproducible model-training experiments.

Skills & Requirements

Must-have

  • Scalable, highly available, fault-tolerant infrastructures
  • Production environment troubleshooting
  • Monitoring, alerting, incident response
  • CI/CD, containerization, orchestration
  • Infrastructure-as-code tools
  • Scripting languages (Python, Go, Bash)

Nice-to-have

  • AI/ML environment experience
  • High-performance computing systems
  • Modern AI-oriented solutions
  • Reason with rigor
  • Audacious enough
  • Make our customers succeed

Key Requirements

  • Master’s degree in Computer Science or related field
  • 5+ years of experience in a DevOps/SRE role
  • Experience with bare metal infrastructure
  • Experience with reliability KPIs
  • Hands-on experience with Docker and Kubernetes
  • Familiarity with Terraform or CloudFormation
  • Strong understanding of networking and security

Work Rights

Not specified
