Senior Solutions Architect, Cloud Infrastructure and DevOps - NVIS
NVIDIA Corporation
Job Summary
This role involves maintaining large-scale HPC/AI clusters with comprehensive monitoring, logging, and alerting capabilities.
The successful candidate will be the primary point of contact for the customer, analyzing and defining large-scale networking projects in collaboration with partners and internal teams.
NVIDIA is seeking an autonomous and creative professional to join a dynamic team building some of the world's largest and fastest AI systems.
Skills & Requirements
Must-have
8+ years of experience with networking fundamentals (TCP/IP)
Kubernetes container orchestration for AI/ML
HPC cluster deployment and troubleshooting
Slurm, Kubernetes, and Singularity job scheduling
Linux internals (Red Hat, CentOS, Ubuntu)
Python programming and bash scripting
Automation with Jenkins, Ansible, Puppet, and Chef
Nice-to-have
CPU and GPU architecture knowledge
Experience with GPU-focused hardware and software (DGX, CUDA)
Familiarity with RDMA fabrics (InfiniBand, RoCE)
Emerging storage technologies awareness
Japanese-speaking customer collaboration
Key Requirements
BS/MS/PhD in Computer Science or related field
Minimum of 8 years of professional experience in networking