Senior Site Reliability Engineer

PandaDoc

Remote

Job Summary

  • Site Reliability Engineers (SREs) are essential to PandaDoc's success, ensuring customers receive a reliable service with minimal downtime.
  • In this role, you will own and influence the incident management process end-to-end, maintain and evolve the on-prem observability stack, and keep production applications running smoothly by participating in the on-call rotation.
  • PandaDoc empowers more than 67,000 growing organizations to thrive by taking the work out of document workflow.

Skills & Requirements

Must-have

  • incident management processes and tools
  • observability stack and alerting systems
  • Python (Django and AsyncIO) and/or Java (Spring Boot)
  • AWS and Kubernetes
  • relational databases (PostgreSQL) and messaging systems

Nice-to-have

  • act like an owner
  • knowledge sharing on reliability
  • creative virtual team-bonding events

Key Requirements

  • Solid programming experience
  • Experience maintaining an observability tool suite (LGTM: Loki, Grafana, Tempo, Mimir)
  • Experience in development and maintenance of Python services
  • Strong experience with AWS and Kubernetes
  • Solid proficiency with relational databases and messaging systems
  • Experience as an on-call SRE
  • Hands-on troubleshooting of distributed systems
  • Proficiency in English

Work Rights

Not specified
