Sr. Site Reliability Engineer

Pinterest

Toronto, Ontario, Canada
On-site

Job Summary

  • The Site Reliability Engineering organization is accountable for ensuring overall Pinterest availability and enhancing Engineering teams' capability to design robust systems at scale.
  • This role involves tackling project challenges on EKS, such as implementing Karpenter, while building tools and automation to eliminate toil and reduce operational overhead.
  • Candidates are expected to leverage AI for analyzing incidents and generating remediation plans while maintaining high integrity and accountability for final decisions.

Skills & Requirements

Must-have

  • Strong knowledge of Kubernetes, particularly Amazon EKS
  • 4+ years of programming in Python or Golang
  • Experience with Terraform, Buildkite, and ArgoCD
  • Hands-on experience with AI-assisted tools
  • Ability to write effective LLM prompts

Nice-to-have

  • Collaboration across various engineering teams
  • Deep understanding of system scaling behaviors
  • High integrity and ownership mindset
  • Critical evaluation of AI-generated code
  • Experience with open-source tools

Key Requirements

  • Bachelor's or Master's degree in Computer Science or equivalent
  • 4+ years of experience with Python or Golang
  • Demonstrated ability to verify AI-assisted work through testing and peer review

Work Rights

Not specified
