Job Summary
The role involves designing and maintaining efficient ETL pipelines to process large-scale data from multiple sources in batch and near real-time environments.
Candidates will leverage modern platforms like Databricks and Spark to build scalable data architectures and implement robust data models for analytics.
The position also involves exploring AI/ML use cases and enabling production-grade, data-driven intelligence across platforms.
Matching Summary
Match Score: 75
Skills & Requirements
Must-have
Distributed data processing with Spark
ETL pipeline design and implementation
SQL and Python proficiency
Data modeling and warehousing concepts
Cloud platform experience (AWS or Azure)
Delta Lake and Databricks workflows
Nice-to-have
Kubernetes and containerized workloads
Delta Sharing for secure data access
MLflow and LLM framework integration
Vector database experience
Databricks certification preferred
Python Institute certification preferred
Key Requirements
Strong background in analyzing complex datasets
Experience with the Databricks ecosystem highly desirable