Data Center Operations Engineer

Cadence

Fully remote

Job Summary

  • The Data Center Operations Engineer is responsible for supporting, maintaining, and deploying critical data center infrastructure, with a strong focus on Linux-based systems and GPU servers.
  • This role requires hands-on expertise in InfiniBand networking, cluster bring-up, and hardware installation to ensure reliable and scalable service delivery.
  • Candidates must be willing to work flexible hours, including nights and weekends, and to participate in on-call rotations, while adhering to strict safety and quality standards.

Skills & Requirements

Must-have

  • Linux system administration and troubleshooting
  • GPU server deployment and cluster bring-up
  • InfiniBand networking configuration and management
  • Hardware installation and rack stacking
  • Incident management and on-call rotation

Nice-to-have

  • Experience with HPC or AI environments
  • Familiarity with large-scale data center buildouts
  • Strong documentation and runbook maintenance skills
  • Cross-functional collaboration in global teams

Key Requirements

  • Bachelor's degree in Computer Science, Engineering, or IT
  • Hands-on experience with Linux command-line tools and Bash scripting
  • Proven experience setting up and validating GPU servers in clusters

Work Rights

Not specified