Senior HPC Cluster Engineer

Nebius

Remote
GPU and InfiniBand network optimization
KVM/QEMU stack enhancement
System-level software development

Nebius is seeking a Senior HPC Cluster Engineer to enhance their cloud platform, focusing on GPU computing and InfiniBand networks. The role requires extensive experience in system-level software development and performance optimization within high-performance computing environments.

Job Summary

  • Nebius is leading a new era in cloud computing to serve the global AI economy.
  • The role involves analyzing, troubleshooting, and improving infrastructure to support new hardware, fine-tuning system performance, and automating fault detection and resolution.
  • We offer a competitive salary, a comprehensive benefits package, opportunities for professional growth, and flexible working arrangements.

Matching Summary

Match Score: 75

Skills & Requirements

Must-have

  • GPU and InfiniBand network optimization
  • KVM/QEMU stack enhancement
  • System-level software development
  • Linux system administration and tuning
  • Server architecture and PCIe devices

Nice-to-have

  • AI/ML workload performance analysis
  • Deep learning framework integration
  • Collective communication libraries
  • Software-Defined Networking (SDN)

Key Requirements

  • 5+ years of system-level software development
  • 3+ years of Linux systems experience
  • In-depth understanding of server architecture
  • Proficiency in C/C++, Go, or Python

Work Rights

Not specified
