HPC Cluster Architect

NexGen Cloud

UK
Remote
NexGen Cloud is seeking an HPC Cluster Architect to design and oversee the architecture of large-scale GPU clusters, with a focus on performance and optimization. This remote role requires substantial experience in HPC design, particularly with NVIDIA GPU platforms, and offers competitive benefits and a collaborative work culture.

Job Summary

  • This role owns the full architecture lifecycle, from customer requirements to production-ready GPU deployments, for large-scale dedicated clusters.
  • The successful candidate will act as a technical authority, translating complex design trade-offs into clear decisions, and will engage directly with OEMs and vendors.
  • NexGen Cloud offers real ownership, autonomy, and the opportunity to shape the team culture in a fast-moving environment serving tens of thousands of customers.

Matching Summary

Match Score: 85

Salary

Competitive salary; annual discretionary bonus scheme; 25 days' holiday plus public holidays

Skills & Requirements

Must-have

  • NVIDIA GPU platform expertise (H100, H200, B-series)
  • InfiniBand RDMA network fabric design
  • Full lifecycle HPC cluster deployment
  • Linux systems, PCIe topology, and NUMA alignment
  • Vendor and OEM engagement and hardware validation

Nice-to-have

  • Spectrum-X and next-generation Ethernet fabrics
  • Large-scale cluster deployments (1,000+ GPUs)
  • Air-cooled and liquid-cooled HPC environments
  • Infrastructure-as-code automation experience
  • NCCL and MLPerf performance benchmarking

Key Requirements

  • Proven experience designing GPU-based HPC or AI clusters at scale
  • Deep hands-on knowledge of NVIDIA reference architectures
  • Background in an OEM, hyperscaler, neo-cloud, or enterprise HPC environment
  • Confident technical leadership across Solutions Architecture and Engineering teams

Work Rights

Not specified