Site Reliability Engineer, ARK Large Model Platform (Singapore)

BYTEPLUS PTE. LTD.

Singapore, Singapore
BytePlus PTE. LTD. is seeking a Site Reliability Engineer for its ARK Large Model Platform in Singapore. The role focuses on developing and maintaining large-scale model systems, with an emphasis on stability and observability, while contributing to machine learning solutions.

Job Summary

  • The role involves maintaining the stability of the control and data planes of large-scale model systems through DevOps practices.
  • Candidates will develop and enhance observability systems to ensure high reliability and performance for large model systems.
  • ByteDance fosters an inclusive space where employees are valued for their skills and unique perspectives while striving to inspire creativity.

Salary

Not specified

Skills & Requirements

Must-have

  • Proficiency in cloud-native technologies
  • Programming in Golang, Python, or Java
  • Log collection, monitoring, and alerting
  • Large-scale cluster management
  • Implementation of DevOps practices

Nice-to-have

  • Infrastructure as code (Terraform)
  • Experience building stability systems
  • Large-scale system maintenance
  • Cost reduction strategies
  • Inclusive, diverse team environment

Key Requirements

  • B.Sc. or higher in Computer Science
  • R&D experience in cloud computing
  • Proficiency in Golang, Python, or Java

Work Rights

Not specified
