Senior Hardware Support Engineer

Nebius

Birmingham, United States
On-site

Job Summary

  • Lead root cause analysis for complex hardware and firmware failures across production fleets.
  • Coordinate with vendors to drive timely diagnostics, RMAs, firmware fixes, and corrective actions.
  • Improve hardware observability, failure tracking, and reporting processes.

Skills & Requirements

Must-have

  • Production hardware reliability
  • Large-scale data center environments
  • Root cause analysis
  • Hardware and firmware failures
  • Vendor coordination
  • Structured problem-solving

Nice-to-have

  • GPU-dense environments
  • AI/high-performance computing
  • Linux-based systems
  • Firmware lifecycle management

Key Requirements

  • Strong hands-on server hardware expertise
  • Proven root cause analysis experience
  • Deep understanding of server components
  • Experience with hardware vendors
  • Structured problem-solving skills
  • Strong analytical capabilities
  • Experience coordinating with on-site teams
  • Ability to manage multiple investigations
  • Clear written and verbal communication

Work Rights

Not specified
