Provide technical leadership to HPC engineering staff and guide architectural and operational decisions for the HPC environment
Job Summary
Provide technical leadership to HPC engineering staff and guide architectural and operational decisions for the HPC environment.
Design, deploy, and maintain the university’s high-performance computing cluster, including configuring the workload scheduler and architecting quality-of-service policies.
Deploy and support AI workloads, conduct vendor evaluations, and provide advanced technical support to faculty and researchers.
Skills & Requirements
Must-have
Linux systems administration
HPC environment management
RHEL/CentOS Linux
AI workload deployment
GPU deployment and support
Workload scheduler configuration
Nice-to-have
Vendor evaluation and adoption
Advanced technical support
Research and teaching support
Key Requirements
Bachelor's degree in Computer Science
3 years of Linux systems administration experience
Experience with research computing environments
Experience with RHEL/CentOS Linux operating system management
Experience with systems architectures, security, networking, storage systems, parallel computing, and batch/scheduling systems
Experience with programming languages (C, C++, Bash, Perl)
Experience with source control systems (Git)
Experience with log correlation software (Sumo Logic)
Experience with large-scale research computing platforms (Globus, HPC environments, SLURM, GPFS)
Experience with machine learning frameworks (TensorFlow)