Job Summary
This role involves operating large-scale big data platforms across hybrid on-premises and cloud environments to enable reliable analytics.
You will build, run, and optimize production-grade data pipelines using PySpark and Airflow while meeting data freshness SLAs.
The position requires strong ownership of infrastructure, automation, and reliability while partnering with security teams to ensure compliance with PDPA and GDPR.
Matching Summary
Match Score: 85
Skills & Requirements
Must-have
AWS service management (S3, EMR, Redshift, RDS)
PySpark and Airflow pipeline operations
On-premises cluster maintenance
Docker and Kubernetes fundamentals
CI/CD automation with GitLab or Jenkins
VPC design and networking security
Nice-to-have
MLOps platform support experience
Root cause analysis skills
Cross-functional team collaboration
Data quality validation expertise
Disaster recovery implementation
Key Requirements
2-5+ years of experience in DevOps or Data Platform Engineering
Degree in Computer Science or equivalent experience
Hands-on operational experience with Spark and Airflow