Site Reliability Engineer, Applied Machine Learning Engineering

MANPOWER STAFFING SERVICES (SINGAPORE) PTE LTD

Singapore, Singapore
Not specified (assumed hybrid or onsite based on the nature of the role)
Bachelor's degree in computer science
3+ years distributed systems experience
Python, Go, or shell scripting skills
This position seeks a Site Reliability Engineer specializing in Applied Machine Learning Engineering at Manpower Staffing Services in Singapore. The role emphasizes ownership of system reliability, incident management, and the automation of operational tasks in machine learning systems.

Job Summary

  • Serve as the primary responder for the Applied Machine Learning Engine, ensuring high reliability through immediate incident response.
  • Manage the end-to-end feedback loop for incidents including rapid triage, resolution, and post-incident reviews to prevent recurrence.
  • Collaborate with software and hardware engineers to integrate systems, deploy solutions, and optimize Standard Operating Procedures using automation.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • Bachelor's degree in Computer Science
  • 3+ years distributed systems experience
  • Python, Go, or Shell scripting skills
  • Large-scale system design experience
  • Incident lifecycle management

Nice-to-have

  • SLI/SLO and error budget management
  • Chaos Engineering experience
  • MLOps platforms like Kubeflow or MLflow
  • Linux internals and container technologies
  • Machine learning frameworks familiarity

Key Requirements

  • Bachelor's degree in Computer Science
  • 3+ years relevant experience
  • Scripting proficiency in Python, Go, or Bash

Work Rights

Not specified
