Senior Site Reliability Engineer

PandaDoc

Remote

Job Summary

  • Site Reliability Engineers (SREs) are essential to PandaDoc's success, ensuring customers receive a reliable service with minimal downtime.
  • The SRE team achieves this by owning the incident management processes and tools, managing the observability stack and alerting systems, and actively contributing to service codebases.
  • We're known for our work-life balance, kind co-workers, and creative virtual team-bonding events.

Skills & Requirements

Must-have

  • Incident management processes
  • Observability stack and alerting
  • Python (Django and asyncio)
  • Java (Spring Boot)
  • AWS and Kubernetes
  • Relational databases (PostgreSQL)
  • Messaging systems (e.g., RabbitMQ, NATS, Kafka)

Nice-to-have

  • Knowledge sharing on reliability
  • Acting like an owner
  • Hands-on troubleshooting
  • Fostering SRE principles

Key Requirements

  • Solid programming experience
  • Experience maintaining an observability tool suite (LGTM: Loki, Grafana, Tempo, Mimir)
  • Experience developing and maintaining Python services
  • Strong experience with AWS and Kubernetes
  • Solid proficiency with relational databases and messaging systems
  • Experience as an on-call SRE
  • Proficiency in English

Work Rights

Not specified
