Job Summary
This role focuses on building tools and improving system design to ensure platforms behave predictably under load and failure.
Candidates will define service level objectives that reflect real user impact and continuously assess reliability risks across infrastructure.
The position offers the opportunity to work on systems processing billions of messages daily with a direct impact on the global financial industry.
Salary
Base: $160,000 - $240,000 USD per year
Bonus/Equity: Incentive compensation (exempt roles only)
Benefits: Comprehensive plan including medical, dental, vision, 401(k) match, and paid time off
Skills & Requirements
Must-have
4+ years software engineering experience
Proficiency in Python programming
Experience with distributed systems
Understanding of system reliability and observability
Familiarity with SLOs, SLIs, and SLAs
Nice-to-have
Experience with Grafana or Humio monitoring tools
Familiarity with Kafka and Java technologies
Knowledge of chaos engineering and resilience testing
Experience with Apache Spark or Amazon S3
Contributions to open source projects
Key Requirements
Degree in Computer Science, Engineering, or equivalent practical experience