Job Summary
The Site Reliability Engineering organization is accountable for ensuring overall Pinterest availability and enhancing Engineering teams' capability to design robust systems at scale.
This role involves tackling projects on EKS, such as implementing Karpenter, and building tools and automation to eliminate toil and reduce operational overhead.
Candidates are expected to leverage AI for analyzing incidents and generating remediation plans while maintaining high integrity and accountability for final decisions.
Skills & Requirements
Must-have
Strong knowledge of Kubernetes (EKS)
4+ years of programming in Python or Golang
Experience with Terraform, Buildkite, and ArgoCD
Hands-on experience with AI-assisted tools
Ability to write effective LLM prompts
Nice-to-have
Collaboration across various engineering teams
Deep understanding of system scaling behaviors
High integrity and ownership mindset
Critical evaluation of AI-generated code
Experience with open-source tools
Key Requirements
Bachelor's or Master's degree in Computer Science or equivalent
4+ years of experience with Python or Golang
Demonstrated ability to verify AI-assisted work through testing and peer review