Senior AI Infrastructure Engineer

Berkeley Research Group (BRG)

Fully remote
6-8 years infrastructure engineering experience
Microsoft Azure AKS networking expertise
Terraform or OpenTofu modular design skills
The role involves designing and scaling an Azure-based platform to process over 100,000 documents daily using state-of-the-art LLMs

Job Summary

  • The role involves designing and scaling an Azure-based platform to process over 100,000 documents daily using state-of-the-art LLMs.
  • Candidates will own the modular Terraform codebase and enforce GitOps practices for both infrastructure and application workloads.
  • The position requires ensuring secure handling of sensitive client data while defining SLOs and AI-cost telemetry across the platform.

Skills & Requirements

Must-have

  • 6-8 years infrastructure engineering experience
  • Microsoft Azure AKS networking expertise
  • Terraform or OpenTofu modular design skills
  • GitOps workflows with ArgoCD or Flux
  • Python or Go programming for GPU optimization

Nice-to-have

  • Azure OpenAI and hosted LLM operations at scale
  • Vector databases and RAG infrastructure knowledge
  • Policy-as-code implementation with OPA or Checkov
  • SOC 2 and ISO 27001 compliance experience
  • Experience with Haystack, LangChain, or LangGraph

Key Requirements

  • 6-8 years in infrastructure or distributed systems
  • Deep hands-on Microsoft Azure experience required
  • Production Kubernetes experience with ArgoCD or Flux

Work Rights

Not specified
