Site Reliability Engineer | AI Infrastructure

JLL Careers

Tel Aviv, Israel
Hybrid

Job Summary

  • You will own the platform layer for AI agents, including deployment architecture, observability, and production reliability, in a greenfield environment with real constraints.
  • The role involves building monitoring and observability for AI services, tracking output quality, cost per invocation, and model drift, while ensuring sensitive data flows meet audit requirements.
  • This position offers enterprise complexity with startup autonomy, requiring you to write agent code in TypeScript and Python, work with data pipelines, and ship features alongside the team.

Skills & Requirements

Must-have

  • SRE, platform engineering, DevOps, or infrastructure roles
  • Cloud platforms (Azure or AWS)
  • Containerization (Docker, Kubernetes)
  • CI/CD pipelines
  • Infrastructure-as-code (Terraform, CDK, CloudFormation)
  • Monitoring and observability tools
  • Linux, networking, and security fundamentals
  • Incident management experience

Nice-to-have

  • AI/ML infrastructure experience
  • Production code in TypeScript or Python
  • Self-service developer tooling
  • Cost optimization for cloud workloads
  • Enterprise security engineering

Key Requirements

  • 5+ years in SRE, platform engineering, DevOps, or infrastructure
  • Experience owning infrastructure end-to-end
  • Comfortable working independently with broad ownership
  • Strong written and verbal English

Work Rights

Not specified