InfiniBand networking configuration and management
Job Summary
The Data Center Operations Engineer is responsible for supporting, maintaining, and deploying critical data center infrastructure with a strong focus on Linux-based systems and GPU server deployments.
This role requires hands-on expertise in InfiniBand networking, cluster bring-up, and hardware installation to ensure reliable and scalable service delivery.
Candidates must be willing to work flexible hours, including nights, weekends, and on-call rotations, while adhering to strict safety and quality standards.
Skills & Requirements
Must-have
Linux system administration and troubleshooting
GPU server deployment and cluster bring-up
InfiniBand networking configuration and management
Hardware installation and rack stacking
Incident management and on-call rotation
Nice-to-have
Experience with HPC or AI environments
Familiarity with large-scale data center buildouts
Strong documentation and runbook maintenance skills
Cross-functional collaboration in global teams
Key Requirements
Bachelor's degree in Computer Science, Engineering, or IT
Hands-on experience with Linux command-line tools and Bash scripting
Proven experience setting up and validating GPU servers in clusters