Senior Systems Engineer - Network Infrastructure

Nscale

US
On-site

Job Summary

  • The company is building next-generation AI infrastructure from the ground up to deliver highly performant and scalable network clusters for large-scale AI training and inference.
  • This role requires leading hands-on bring-up of network clusters across data center environments, owning execution from node installation to production readiness.
  • Success involves validating BIOS configurations, tuning fabrics, debugging performance issues, and transforming ad hoc deployments into repeatable, reliable systems.

Skills & Requirements

Must-have

  • 5-8+ years infrastructure engineering experience
  • Hands-on HGX/DGX server deployment
  • High-speed networking: InfiniBand, RoCE, Ethernet
  • Strong Linux systems knowledge
  • Distributed systems performance troubleshooting
  • Ability to work on-site in data centers

Nice-to-have

  • AI/ML infrastructure or HPC environment experience
  • Familiarity with NCCL, CUDA, and RDMA
  • Automation scripting: Python, Ansible, Terraform, Bash
  • High-density power and cooling environment experience

Key Requirements

  • 5-8+ years in infrastructure engineering or data center operations
  • Experience deploying servers such as NVIDIA HGX or DGX platforms
  • Comfortable working on-site in data center environments as needed

Work Rights

Not specified
