Senior Cloud Platform Engineer

SambaNova Systems

San Jose, United States
On-site

Job Summary

  • This role serves as the guardian of reliability, performance, and scalability for the company's AI inferencing service.
  • The team invests heavily in automation and robust testing to prevent incidents and minimize on-call fatigue.
  • Candidates will work with cutting-edge technology on a world-class team building one of the most advanced AI stacks in the industry.

Salary

Competitive compensation; equity included; excellent benefits

Skills & Requirements

Must-have

  • 5-8+ years SRE or DevOps experience
  • Python, Go, or Java programming skills
  • Docker containerization and Kubernetes orchestration
  • Monitoring with Prometheus, Grafana, or Datadog
  • Terraform or CloudFormation IaC expertise
  • Experience with AWS, GCP, or Azure cloud environments

Nice-to-have

  • Experience with ML/AI inferencing services
  • Knowledge of NVIDIA GPU workload optimization
  • Familiarity with vLLM, SGLang, or Ray frameworks
  • Experience with hybrid cloud and on-premises infrastructure
  • Understanding of MLOps principles and practices
  • Experience with database tuning and caching systems

Key Requirements

  • Bachelor's degree in Computer Science or related field
  • 5-8+ years of experience operating large-scale, customer-facing services
  • Strong problem-solving skills for distributed systems

Work Rights

Not specified