Forward Deployed Engineer (GPU Clusters)

Together AI

San Francisco, United States
Base: $270,000 - $300,000; equity: startup equity ...
On-site
Large-scale GPU infrastructure experience
Kubernetes or SLURM orchestration mastery
InfiniBand and NVLink networking expertise

Job Summary

  • This role serves as a hands-on technical partner to world-leading AI model builders, ensuring complex Proof of Concept requirements are met.
  • The engineer will design rigorous test suites and optimize cluster performance by debugging bottlenecks in InfiniBand fabrics and NVLink topologies.
  • Together AI offers competitive compensation ranging from $270,000 to $300,000 plus equity and benefits for this full-time position.

Salary

Base: $270,000 - $300,000; Equity: Startup equity included; Benefits: Health insurance and remote work flexibility

Skills & Requirements

Must-have

  • Large-Scale GPU Infrastructure experience
  • Kubernetes or SLURM orchestration mastery
  • InfiniBand and NVLink networking expertise
  • Python and shell scripting proficiency
  • Parallel file system knowledge

Nice-to-have

  • Experience with VAST or Weka storage systems
  • Background in high-performance computing benchmarking
  • Ability to influence the hardware roadmap through feedback
  • Comfort in a fast-paced frontier model lab environment

Key Requirements

  • 5+ years in technical roles focused on GPU infrastructure
  • Deep hands-on experience with the Kubernetes GPU Operator
  • Expert knowledge of NCCL collective communication diagnostics

Work Rights

Not specified
