Senior Hpc Cluster Administrator - Deep Learning Frameworks Infrastructure

Topjobstoday

Base: 221,250 pln - 383,500 pln for level 3; bonus...
Deep expertise in linux systems administration
Experience deploying large-scale hpc clusters
Strong scripting skills in python or bash
You will lead the design, deployment, and reliability of large-scale GPU compute clusters

Job Summary

  • You will lead the design, deployment, and reliability of large-scale GPU compute clusters.
  • The role involves collaborating with software, research, and product teams to enhance infrastructure.
  • Join a team committed to a collaborative and inclusive environment.

Matching Summary

You will lead the design, deployment, and reliability of large-scale GPU compute clusters.

Salary

Base: 221,250 PLN - 383,500 PLN for Level 3; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • Deep expertise in Linux systems administration
  • Experience deploying large-scale HPC clusters
  • Strong scripting skills in Python or bash
  • Hands-on experience with Slurm job scheduling
  • Proficiency with Ansible for configuration management

Nice-to-have

  • Experience with NVIDIA GPU infrastructure tools
  • Familiarity with cluster management platforms
  • Background in MLOps tooling or ML platform engineering

Key Requirements

  • BS/MS in CS, EE, CE or equivalent experience
  • 5+ years of experience in HPC or ML training clusters
  • Experience with distributed filesystems and storage architecture

Work Rights

Not specified

Tailored Resume

Cover Letter