This role is responsible for designing, integrating, and managing high performance computing systems that encompass both hardware and software components within the organization's network infrastructure
Job Summary
This role is responsible for designing, integrating, and managing high performance computing systems that encompass both hardware and software components within the organization's network infrastructure.
The successful candidate will collaborate with data scientists and ML engineers to deploy scalable machine learning models into production while ensuring security, scalability, and reliability.
Candidates must have a Master's or Bachelor's degree with 8-12 years of hands-on HPC administration experience and expertise in cloud-based infrastructure.
Matching Summary
This role is responsible for designing, integrating, and managing high performance computing systems that encompass both hardware and software components within the organization's network infrastructure.
Skills & Requirements
Must-have
Cloud computing architecture AWS
Containerization Singularity Docker
Infrastructure as code Terraform Ansible
Linux system administration Red Hat Ubuntu
Job scheduling SLURM PBS LSF
Distributed file systems Lustre GPFS Ceph
Python or Bash scripting
Nice-to-have
Kubernetes EKS and service mesh
Machine learning frameworks TensorFlow PyTorch
Multi-cloud environments Azure GCP
AWS Lambda event-driven architectures
Agile development environment experience
Big data technologies Hadoop Spark
Research support in healthcare life sciences
Key Requirements
Master's or Bachelor's degree required
8-12 years of HPC administration experience
Red Hat Certified Engineer or LPIC certification preferred