Principal Deployment Engineer

Nscale Operations UK Ltd

United States
On-site
7-8+ years of infrastructure engineering experience
Hands-on GPU server deployment (HGX/DGX)
High-speed networking (InfiniBand, RoCE, Ethernet)

Job Summary

  • The company is building next-generation AI infrastructure from the ground up to deliver highly performant and scalable GPU clusters.
  • This role leads the end-to-end bring-up of GPU nodes and racks, including validation of BIOS, firmware, and high-speed network fabrics.
  • Success involves turning ad hoc deployments into repeatable systems that meet strict performance baselines for frontier AI workloads.

Skills & Requirements

Must-have

  • 7-8+ years of infrastructure engineering experience
  • Hands-on GPU server deployment (HGX/DGX)
  • High-speed networking (InfiniBand, RoCE, Ethernet)
  • Strong Linux systems knowledge
  • Distributed systems performance troubleshooting
  • On-site work in data center environments

Nice-to-have

  • AI/ML or HPC environment experience
  • Familiarity with NCCL, CUDA, RDMA
  • Automation skills (Python, Ansible, Terraform)
  • High-density power and cooling expertise
  • Bias toward action and a culture of ownership

Key Requirements

  • 7-8+ years in infrastructure engineering or data center operations
  • Experience deploying GPU servers like HGX or DGX platforms
  • Comfortable working on-site in data center environments

Work Rights

Not specified
