Senior Site Reliability Engineer - Remote

ClickHouse

Remote
ClickHouse, a rapidly growing company recognized in the Forbes Cloud 100, is seeking a Senior Site Reliability Engineer to enhance the reliability and performance of its cloud services. The role involves collaborating across engineering teams to design scalable systems, manage incident response, and improve overall service efficiency in a remote-friendly environment.

Job Summary

  • You will be responsible for building and leading processes to ensure the reliability, availability, scalability, and performance of our cloud infrastructure.
  • You will collaborate with teams such as Control Plane, Data Plane, Core, Security, Support, and Operations, guiding them to design and implement scalable, secure, highly available, and fault-tolerant distributed systems.
  • ClickHouse provides equal employment opportunities to all employees and applicants and prohibits discrimination and harassment of any type based on factors such as race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Matching Summary

Match Score: 75


Skills & Requirements

Must-have

  • Build and lead reliability processes
  • Ensure reliability, availability, scalability, and performance
  • Design and implement scalable, secure, highly available systems
  • Establish and manage SLOs and SLAs
  • Monitoring and alerting for infrastructure components
  • Enhance incident response and post-mortem analysis
  • Leverage software engineering expertise

Nice-to-have

  • Transform how companies use data
  • Partner with the business
  • High level of responsibility and ownership
  • Thrive in a fast-paced environment
  • Passionate about efficiency and scalability

Key Requirements

  • 8 years of experience in SRE
  • Hands-on experience with Go and/or Python
  • Strong knowledge of AWS, Azure, or GCP
  • Experience with Kubernetes or Docker Swarm
  • Experience with Ansible, Terraform, or Puppet
  • Strong production debugging skills

Work Rights

Not specified
