Senior HPC Cluster Engineer

Nebius

Remote

Job Summary

  • Nebius is building a full-stack AI cloud platform to support developers and enterprises from data training to production deployment.
  • The role focuses on enhancing core components of the hyperscaler platform with specific attention to GPU computing and InfiniBand networks.
  • Candidates will benefit from competitive compensation, career growth opportunities, and the chance to work on impactful AI projects in an international environment.

Salary

Competitive compensation (amount not specified). Benefits include flexibility and career growth.

Skills & Requirements

Must-have

  • System-level software development experience
  • Linux systems administration and tuning
  • Server architecture and PCIe device knowledge
  • Performance-oriented programming in C/C++ or Go

Nice-to-have

  • GPU end-to-end testing in cluster environments
  • HPC workload optimization experience
  • RDMA, RoCE, and InfiniBand protocol familiarity
  • Software-Defined Networking background
  • QEMU/KVM virtualization management
  • Deep learning framework integration skills
  • MPI and NCCL collective communication libraries

Key Requirements

  • 5+ years professional experience in system-level software development
  • 3+ years hands-on experience with Linux systems
  • Strong proficiency in performance-oriented programming languages

Work Rights

Must be authorized to work in the country of application
