Senior Resiliency and Safety Architect, GPU Workloads and Failure Analysis

Invidia

California, United States
Fully remote

Job Summary

  • You will be a key member of a team of innovators, challenging the status quo and pushing beyond boundaries.
  • You will have the opportunity to impact the industry's leading GPUs and SoCs powering product lines ranging from AI to self-driving cars and robots.
  • Come join our Resiliency and Safety Architecture team and help build the real-time, cost-effective computing platforms driving our success in these exciting and rapidly growing fields.

Salary

  • Base: 184,000 USD - 287,500 USD for Level 4; 224,000 USD - 356,500 USD for Level 5
  • Bonus/Equity: Eligible for equity
  • Benefits: Eligible for benefits

Skills & Requirements

Must-have

  • GPU diagnostics development
  • Hardware failure analysis
  • CUDA software diagnostics
  • Python scripting and automation
  • C/C++ programming
  • Debugging and analytical skills

Nice-to-have

  • Understanding of GPU hardware architecture
  • Knowledge of AI workload execution on GPUs
  • Datacenter resiliency experience
  • Functional safety familiarity
  • Collaboration with remote teams
  • Machine Learning/Deep Learning concepts

Key Requirements

  • Master’s or PhD in Computer or Electrical Engineering
  • 6+ years of relevant experience
  • Experience characterizing real-world GPU applications
  • Experience with concurrency and kernel launches
  • Strong interpersonal and collaboration skills

Work Rights

Not specified