Principal Site Reliability Engineer

Jobgether

Canada
On-site
Cloud infrastructure
AWS/EKS environments
Incident response
Own and enhance the reliability, scalability, and security of a complex cloud infrastructure supporting mission-critical workloads

Job Summary

  • Own and enhance the reliability, scalability, and security of a complex cloud infrastructure supporting mission-critical workloads.
  • Work hands-on across multi-region AWS/EKS environments, partnering with engineering leads, ML and simulation teams, and customer-facing teams to drive operational excellence.
  • Lead incident response, implement automated remediation, and guide cloud architecture decisions while optimizing performance, security, and cost.

Skills & Requirements

Must-have

  • cloud infrastructure
  • AWS/EKS environments
  • incident response
  • automated remediation
  • guiding cloud architecture decisions
  • performance optimization
  • security optimization

Nice-to-have

  • fast-paced environment
  • high-autonomy environment
  • shaping infrastructure strategies
  • customer success impact

Key Requirements

  • Deep technical expertise
  • Strong problem-solving skills
  • End-to-end ownership of large-scale infrastructure projects

Work Rights

Not specified
