Responsible for building scalable, secure, and high-performance data pipelines that span Hadoop clusters and AWS cloud services
Job Summary
Responsible for building scalable, secure, and high-performance data pipelines that span Hadoop clusters and AWS cloud services.
Leverages deep knowledge of distributed systems, Spark optimization, cloud automation, and big-data management to support analytics, BI, ML, and AI use cases across the enterprise.
We offer programs and plans for a healthy mind, body, wallet and life because it’s important our benefits care for the whole person.
Matching Summary
Responsible for building scalable, secure, and high-performance data pipelines that span Hadoop clusters and AWS cloud services.
Skills & Requirements
Must-have
Hadoop ecosystem (HDFS, Hive, Spark, Kafka)
AWS data stack (S3, Glue, EMR, Lambda)
PySpark, Spark SQL, Spark optimization
ETL and Data Warehousing concepts
CI/CD, GitHub, Git commands
Data modeling, partitioning strategies
Airflow, Control-M, Step Functions
Nice-to-have
Champion People, Grow with Purpose, Be Alight values
Healthcare domain knowledge
Serverless patterns and containerization
Miro for data flow diagrams
AWS certifications
Key Requirements
5-8 years of experience
Proficiency in Python or Shell scripting
Authorization to work in the Employing Country without sponsorship
Work Rights
Authorization to work in the Employing Country without future sponsorship