Senior Manager, Engineering - Data Center Telemetry and RAS

NVIDIA

NVIDIA is seeking a Senior Manager of Engineering to lead its Data Center Telemetry team, focusing on the architecture and deployment of telemetry solutions for AI supercomputing platforms. The ideal candidate will have extensive experience in systems/platform software, strong leadership skills, and a deep understanding of server architecture and telemetry technologies.

Job Summary

  • Lead the Data Center Telemetry team responsible for driving the architecture, development, and deployment of telemetry solutions at scale for next-generation AI supercomputing platforms.
  • Build and mentor a world-class engineering team focused on platform telemetry, RAS and observability, while continuously improving software development processes.
  • Collaborate across teams to ensure seamless integration of telemetry solutions with platform firmware, server architecture, and data center management, and drive product life cycles with QA teams.

Matching Summary

Match Score: 75


Salary

Base: 272,000 USD - 431,250 USD, plus equity and benefits

Skills & Requirements

Must-have

  • Data Center Telemetry & RAS
  • OOB telemetry solution and data validation
  • Platform telemetry, RAS and observability
  • Cross-functional collaboration
  • Server and firmware architecture optimization
  • Scalable server products and telemetry solutions

Nice-to-have

  • Creative solutions to complicated problems
  • Active contributor to Open Compute (OCP)
  • Innovative platform telemetry solutions for AI

Key Requirements

  • 12+ years overall relevant experience
  • 5 years managing systems/platform software teams
  • BS, MS, or PhD in EE/CS or related field
  • Strong knowledge of DMTF/PLDM for OOB telemetry
  • Experience with time series databases and REST APIs
  • Hands-on experience with x86/ARM system architecture and coding (C/C++, Python)
  • Experience with SCM (Git, Perforce) and Jira

Work Rights

Not specified
