Data Center Engineer, HPC and AI

NVIDIA

Job Summary

  • The role involves planning and building complex supercomputers and HPC clusters for groundbreaking AI and GPU computing technologies.
  • Candidates will be responsible for rack stacking, cable management, and ensuring power and cooling efficiency within data centers and labs.
  • The position requires deep hands-on Linux troubleshooting skills alongside support for cloud, VM, storage, and network infrastructure.

Skills & Requirements

Must-have

  • Lab manager experience with complex clusters
  • Linux troubleshooting and OS support skills
  • Rack stacking and cable management expertise
  • Knowledge of DHCP, DNS, NIS, and AD services
  • Experience supporting large data center operations

Nice-to-have

  • Scripting experience in Bash or Python
  • Experience with configuration management tools such as Ansible
  • Knowledge of CI/CD pipelines and job schedulers such as SLURM
  • Virtualization experience with KVM or VMware
  • Proficiency with L2 and L3 network protocols

Key Requirements

  • MCSE or MCITP/CCNA certification required
  • 3+ years of experience as a lab manager
  • Proven hands-on Linux troubleshooting experience

Work Rights

Not specified