Job Summary
With your help, we will forge the next generation of compute infrastructure, multiplying the power of the CPU, GPU and DPU for the age of AI.
You will work with a team of passionate and skilled engineers who continuously build better tools to deploy and manage this infrastructure.
Design and develop the AI tools needed to automate maintenance of 35,000+ hosts with a support team of only 12 engineers.
Skills & Requirements
Must-have
On-premises Kubernetes setup
Container management with Docker and containerd
Programming in Python, Golang or Java
Configuration management with Ansible, Chef or Puppet
Large-scale cloud and on-premises infrastructure
CI systems like Jenkins
SQL and NoSQL database experience
Nice-to-have
Experience with DevOps/SRE teams
Strong collaboration skills
Analytics and visualization tools like Kibana and Grafana
Monitoring systems such as Zabbix or Nagios
Ability to design simple reliable systems
Multitasking amid evolving priorities
AI-powered automation development
Key Requirements
10+ years of proven experience
Bachelor's or Master's degree in CS or related field
Experience maintaining large-scale Kubernetes, Slurm and OpenStack deployments
Experience with SQL (MySQL) and NoSQL (Elasticsearch, MongoDB) databases