Job Summary
Develop and maintain advanced data pipelines and solutions on the Cloudera/Hadoop stack, ensuring seamless integration and automation across environments through CI/CD pipelines.
Troubleshoot and resolve performance issues within the Cloudera ecosystem and contribute to continuous improvement of coding standards, development practices, and deployment automation.
Our work model prioritizes in-person collaboration while offering flexibility to support wellbeing, productivity, individual work styles, and life circumstances.
Skills & Requirements
Must-have
Cloudera/Hadoop stack development
Spark (Scala or PySpark) proficiency
Cloudera Data Platform (CDP) experience
GitHub and CI/CD tools
Linux environments and scripting
Scalable, distributed data processing systems
Nice-to-have
Cloud integration experience
Containerization technologies knowledge
Data governance and security frameworks
Data quality, lineage, and metadata management
Cloudera or Spark certifications
Key Requirements
Strong experience with Spark (Scala or PySpark) and the Hadoop ecosystem
Solid hands-on experience with Cloudera Data Platform (CDP)
Proficiency with GitHub and CI/CD tools
Familiarity with Linux-based environments and scripting
Good communication skills and the ability to work collaboratively