Staff Engineer, Distributed Storage, HPC & AI Infrastructure

Together AI

Amsterdam, Netherlands
Hybrid (2 days onsite)
Multi-petabyte AI/ML storage systems
Kubernetes-native storage operators
High-performance parallel filesystems

Together AI is seeking a Staff Engineer for Distributed Storage to design and deliver advanced storage systems tailored for AI workloads. The role emphasizes expertise in high-performance storage solutions, Kubernetes, and cost optimization within a multi-petabyte environment.

Job Summary

  • Design and deliver multi-petabyte storage systems purpose-built for the world’s largest AI training and inference workloads.
  • Build Kubernetes-native storage operators and self-service platforms that provide automated provisioning, strict multi-tenancy, performance isolation, and quota enforcement at cluster scale.
  • Implement monitoring, alerting, and SLOs; design disaster recovery (DR) and backup strategies with runbooks; run chaos engineering exercises; and maintain 99.9%+ uptime through proactive, automated remediation.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • Multi-petabyte AI/ML storage systems
  • Kubernetes-native storage operators
  • High-performance parallel filesystems
  • Cost optimization for storage systems
  • Per-node data paths of 10–50 GB/s
  • RDMA, InfiniBand, and 400GbE networking

Nice-to-have

  • GPU Direct Storage (GDS)
  • AI/ML storage patterns
  • Kubernetes operator development
  • Storage snapshots and cloning
  • Backup and disaster recovery
  • Storage encryption

Key Requirements

  • 8+ years in storage engineering
  • 3+ years managing distributed storage at multi-petabyte scale
  • Proven track record deploying and operating high-performance storage for GPU/HPC clusters
  • Deep Kubernetes and cloud-native storage experience
  • Strong coding skills in Go and Python
  • BS/MS in Computer Science, Engineering, or equivalent practical experience
  • History of technical leadership

Work Rights

Not specified