Job Summary
The role involves administering and engineering Big Data platforms across multiple Hadoop clusters in the cloud to ensure scalability and reliability.
Candidates must apply Site Reliability Engineering principles to design robust tooling and automated response mechanisms for proactive risk mitigation.
The position requires providing technical leadership and mentorship while driving root cause analysis for complex platform issues.
Skills & Requirements
Must-have
Hadoop cluster administration on AWS EMR
Infrastructure as Code with Terraform or CloudFormation
Implementation of Site Reliability Engineering (SRE) principles
Python, SQL, Spark, and Linux proficiency
Ownership of incident triage and service restoration
Nice-to-have
Experience with Splunk, Dynatrace, or CloudWatch
Knowledge of machine learning algorithms
Mentorship of junior and mid-level engineers
Fostering self-service data capabilities
Strong communication and training delivery skills
Key Requirements
Bachelor's degree in Computer Science or related field
7+ years of experience in platform administration and big data
3+ years of hands-on AWS Infrastructure as Code engineering
3+ years of architectural guidance experience
Experience with Change Management and Incident Management processes