Job Summary
The Data Center Operations Engineer is responsible for supporting, maintaining, and deploying critical data center infrastructure with a strong focus on Linux-based systems and GPU server deployments.
This role requires hands-on expertise in InfiniBand networking, cluster bring-up, and hardware installation to ensure reliable, secure, and scalable service delivery.
The engineer will collaborate closely with global teams to troubleshoot operational issues, perform daily health checks, and maintain accurate documentation of system configurations.
Skills & Requirements
Must-have
Linux system administration and troubleshooting
GPU server deployment and cluster bring-up
InfiniBand networking configuration and management
Hardware installation and rack stacking
Incident management and on-call support
Nice-to-have
Experience with HPC or AI environments
Large-scale data center buildout experience
Contributions to process improvement initiatives
Collaboration with cross-functional global teams
Key Requirements
Bachelor's degree in Computer Science, Engineering, or IT
Strong hands-on experience with Linux command-line tools
Proficiency in Bash scripting and shell automation
Working knowledge of InfiniBand switch configuration
Ability to lift and move equipment weighing up to 50 pounds