Solutions Architect - DevOps

NVIDIA Corporation

Sydney, Australia
Remote
Kubernetes container platforms
AI/ML workloads
HPC cluster management
Maintain large-scale computational and AI infrastructure, focusing on monitoring, logging, and workload orchestration

Job Summary

  • Maintain large-scale computational and AI infrastructure, focusing on monitoring, logging, and workload orchestration.
  • Develop tooling to automate deployment and management of large-scale infrastructure environments.
  • Become the technical leader for assigned customer accounts, providing strategic guidance on DevOps and platform architecture.

Matching Summary

Maintain large-scale computational and AI infrastructure, focusing on monitoring, logging, and workload orchestration.

Skills & Requirements

Must-have

  • Kubernetes container platforms
  • AI/ML workloads
  • HPC cluster management
  • Linux OS and security
  • Python and Bash scripting
  • Infrastructure-as-Code tools
  • Observability stacks (Grafana, Prometheus)

Nice-to-have

  • CI/CD pipelines
  • GPU hardware and software
  • RDMA-based fabrics

Key Requirements

  • 5+ years of professional experience
  • BS/MS/PhD in relevant fields
  • Experience managing scalable cloud environments
  • Experience in automation engineering roles

Work Rights

Not specified
