HPC (High Performance Computing) System Engineer (NVIDIA GPU, CUDA)

D L RESOURCES PTE LTD

Singapore

Job Summary

  • This role focuses on supporting the daily operations, deployment, and maintenance of High Performance Computing clusters and Linux systems.
  • Candidates will be responsible for managing job scheduling systems like Slurm, PBS, and LSF while ensuring optimal resource utilization.
  • The position requires strong troubleshooting skills to handle incident management and assist with user onboarding in a data center environment.

Skills & Requirements

Must-have

  • Linux system administration experience
  • HPC cluster deployment and monitoring
  • Job scheduling (Slurm, PBS, LSF)
  • Server management (bare metal and virtualization)
  • Networking fundamentals (TCP/IP, DNS)
  • Storage technologies (NAS, SAN, parallel file systems)
  • Scripting (Bash, shell, Python)

Nice-to-have

  • GPU computing experience (NVIDIA CUDA)
  • Exposure to cloud platforms (AWS, Azure, GCP)
  • Monitoring tools (Prometheus, Grafana, Nagios)
  • Basic knowledge of DevOps tools (Ansible, Terraform)
  • Performance tuning and HPC optimization skills

Key Requirements

  • 1-3 years of Linux System Administration experience
  • Hands-on experience with RHEL, CentOS, Ubuntu
  • Exposure to HPC environments or cluster computing

Work Rights

Not specified
