Senior Resiliency and Safety Architect, GPU Workloads and Failure Analysis
NVIDIA
US, CA, United States
Fully remote
Job Summary
You will be a key member of a team of innovators, challenging the status quo and pushing beyond boundaries.
You will have the opportunity to impact the industry's leading GPUs and SoCs powering product lines ranging from AI to self-driving cars and robots.
Come, join our Resiliency and Safety Architecture team and help build the real-time, cost-effective computing platforms driving our success in these exciting and rapidly growing fields.
Salary
Base: 184,000 USD - 287,500 USD for Level 4; 224,000 USD - 356,500 USD for Level 5
Bonus/Equity: Eligible for equity
Benefits: Eligible for benefits
Skills & Requirements
Must-have
GPU diagnostics development
Hardware failure analysis
CUDA software diagnostics
Python scripting and automation
C/C++ programming
Debugging and analytical skills
Nice-to-have
Understanding of GPU hardware architecture
Knowledge of AI workload execution on GPUs
Datacenter resiliency experience
Functional safety familiarity
Collaboration with remote teams
Machine Learning/Deep Learning concepts
Key Requirements
Master’s or PhD in Computer or Electrical Engineering
6+ years of relevant experience
Experience characterizing real-world GPU applications