Staff Site Reliability Engineer, Streaming

Alpaca

Remote
RabbitMQ and Redpanda observability
Kubernetes production experience
Go programming proficiency
As a Site Reliability Engineer (SRE) at Alpaca, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services.

Job Summary

  • As a Site Reliability Engineer (SRE) at Alpaca, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services.
  • Enhance our RabbitMQ and Redpanda observability stack by defining Service Level Objectives (SLOs) and alerts, as well as implementing profiling and logging.
  • Benefits include a competitive salary and stock options, health benefits, a new-hire home-office setup, and a monthly stipend.

Skills & Requirements

Must-have

  • RabbitMQ and Redpanda observability
  • Kubernetes production experience
  • Go programming proficiency
  • Prometheus monitoring
  • Linux operating system
  • Message broker performance troubleshooting

Nice-to-have

  • Trading/fintech domain knowledge
  • Low-latency systems experience
  • Loki and Tempo usage
  • Distributed tracing experience
  • USE method experience
  • perf, bpf, pprof experience

Key Requirements

  • 5+ years SRE/Performance Engineering experience
  • 5+ years message broker experience
  • Experience with SLIs, SLOs, SLAs
  • Significant production Kubernetes experience
  • Proficient with Go
  • Proficient with Prometheus
  • Proficient with Linux

Work Rights

Not specified
