Job Summary
Our mission is to deliver a highly reliable, scalable, and GPU-enabled platform to support AI/ML workloads, while applying intelligent automation (AIOps) to improve platform operations.
As part of this team, you will directly manage the Kubernetes control plane, extend platform capabilities via controllers and operators, and implement automation to detect, predict, and self-heal operational issues.
Candidates must have hands-on experience with on-prem control planes and be able to work on site as needed under a hybrid work model.
Salary
Base: $126,500.00 - $252,000.00; Bonus/Equity: Annual bonuses for non-sales roles, performance-based incentive pay for sales roles; Benefits: Medical, dental, and vision insurance, 401(k) with match, paid parental leave, disability coverage, life insurance, flexible vacation
Skills & Requirements
Must-have
Kubernetes control plane experience
On-premises Kubernetes environments
etcd management
Go programming skills
Kubernetes operators and controllers
Nice-to-have
AI/ML workload enablement
Intelligent automation (AIOps)
Bare-metal infrastructure experience
CNCF or Kubernetes open-source contributions
Key Requirements
5+ years of software engineering experience
3+ years operating Kubernetes in production
Hands-on control plane experience
Experience managing etcd
Strong Go programming skills
Experience building Kubernetes operators/controllers
Deep understanding of Kubernetes internals
Experience debugging large-scale distributed systems
On-prem or self-managed Kubernetes control plane experience required