Staff AI Infrastructure Engineer

Biohub

Redwood City, CA, US
Base: $241,000 - $331,000; bonus/equity: not speci...
Hybrid (60% onsite, approximately 3 days a week)
8+ years AI/ML infrastructure experience
Deep expertise in HPC Slurm cluster operations
Strong Linux systems fundamentals, including networking and storage
Biohub is seeking a Staff AI Infrastructure Engineer to enhance its AI Compute Platform, focusing on maintaining and optimizing large-scale GPU clusters to support AI biology research. The role combines advanced technical challenges with a mission-driven environment aimed at curing diseases through scientific discovery.

Job Summary

  • Biohub is building a general-purpose system integrating frontier AI models and biological foundation models to accelerate scientific discovery and cure disease.
  • The team owns the design, operation, and reliability of large-scale multi-GPU AI clusters that power protein language models and genomic foundation models.
  • New hires receive a generous employer match on 401(k) contributions, paid time off to volunteer, funding for family-forming benefits, and relocation support.

Salary

Base: $241,000 - $331,000; Bonus/Equity: Not specified; Benefits: Generous 401(k) match, PTO for volunteering, family-forming funding, relocation support

Skills & Requirements

Must-have

  • 8+ years AI/ML infrastructure experience
  • Deep expertise in HPC Slurm cluster operations
  • Strong Linux systems fundamentals, including networking and storage
  • Hands-on experience with Kubernetes and cloud-native infrastructure
  • Proficiency in Python and Bash automation

Nice-to-have

  • Experience with Go, Rust, or C/C++
  • Knowledge of distributed training with NCCL and PyTorch DDP
  • Familiarity with VAST or WEKA storage solutions
  • Experience with Sunk/CoreWeave patterns
  • Ability to debug complex multi-system failures

Key Requirements

  • 8+ years of AI/ML infrastructure engineering experience
  • Strong Linux systems fundamentals including networking and storage
  • Hands-on experience with Kubernetes and GitOps tooling
  • Experience with HPC workload managers like Slurm
  • Proficiency in Python and Bash scripting

Work Rights

Not specified
