Engineering Manager, Kernel Reliability

Cerebras Systems

Sunnyvale, CA, US
On-site
Cerebras Systems is seeking an Engineering Manager for its Kernel Reliability team to enhance the reliability of its advanced AI compute systems. The ideal candidate should have extensive experience in software and hardware reliability, leadership capabilities, and a technical background in debugging and diagnostic tool development.

Job Summary

  • Provide hands-on technical leadership, owning the technical vision and roadmap for the kernel-centric reliability of our internal and customer-facing systems.
  • Help the System and Cluster Operations teams reduce system and service downtime after failures by providing tooling and hands-on support for failure analysis and diagnostics.
  • Lead, mentor, and grow a high-caliber team of engineers, fostering a culture of technical excellence and rapid execution.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • Kernel reliability
  • Failure analysis and debugging
  • Diagnostic tool building
  • Parallel and distributed programming
  • Computer architectures

Nice-to-have

  • Software or hardware reliability expertise
  • Customer-facing systems reliability
  • Cross-functional collaboration

Key Requirements

  • 6+ years in software engineering
  • 3+ years leading teams
  • Expertise in parallel and distributed programming
  • Experience debugging distributed and parallel applications
  • Deep understanding of computer architectures
  • Strong background in monitoring and reliability engineering

Work Rights

Not specified
