Site Reliability Engineer - ARK Large Model Platform (Singapore)

BYTEPLUS PTE. LTD.

Singapore, Singapore
BytePlus Pte. Ltd. is seeking a Site Reliability Engineer for its ARK Large Model Platform in Singapore, focusing on the development and maintenance of large-scale machine learning systems. The ideal candidate should have a strong background in cloud computing, proficiency in programming languages like Golang, Python, or Java, and experience in DevOps practices.

Job Summary

  • The role is responsible for the stability of both the control and data sides of large-scale model systems, maintained through effective DevOps practices.
  • Candidates will develop and enhance observability systems to ensure high reliability and performance for large model systems.
  • ByteDance offers a positive team atmosphere and career growth opportunities within a flat organizational structure.

Skills & Requirements

Must-have

  • 5 years R&D experience in cloud computing
  • Proficiency in Golang, Python, or Java
  • Cloud-native technologies for log collection
  • Experience with large-scale cluster management

Nice-to-have

  • Infrastructure as code with Terraform
  • Experience building stability systems
  • Flat organization culture
  • Always Day 1 mindset

Key Requirements

  • B.Sc. or higher degree in Computer Science
  • Minimum 5 years of R&D experience
  • Proficiency in the cloud-native technology stack

Work Rights

Not specified
