Job Summary
The role involves developing, testing, and maintaining data pipelines using Databricks, PySpark, and Python to support analytics and machine learning initiatives.
Candidates will work with cross-functional teams in an Agile environment to deliver reliable datasets and optimize Spark jobs for performance and cost efficiency.
The position also supports basic AI/ML data preparation and is responsible for data quality through cleansing, validation, and rigorous testing.
Skills & Requirements
Must-have
Hands-on experience with Python for data processing
Strong working knowledge of PySpark and distributed processing
Proven hands-on experience using Databricks for data engineering
Ability to build and troubleshoot scalable ETL/ELT pipelines
Experience working with Delta Lake and lakehouse architecture
Working knowledge of SQL for querying and transforming data
Nice-to-have
Exposure to Unity Catalog or advanced lakehouse architecture
Familiarity with CI/CD practices for data engineering projects
Experience with workflow orchestration tools or Databricks Jobs
Exposure to machine learning workflows using MLflow or scikit-learn
Experience with Tableau or Power BI for data visualization
Understanding of data governance, security, and access control concepts
Key Requirements
2-6 years of professional experience
Bachelor's degree in Computer Science or a related field
Equivalent practical experience accepted in lieu of a degree