Lead Data Engineer (Databricks/PySpark)

Capgemini

Dallas, TX, US
On-site
Databricks, PySpark, SQL, Delta Lake
Credit card domain data
Spark job optimization

Job Summary

  • Design and implement scalable ETL pipelines on Databricks (PySpark, SQL, Delta Lake) to process credit card transactions, balances, and payments.
  • Optimize Spark jobs for large-scale financial datasets (billions of records) using partitioning, caching, and Adaptive Query Execution (AQE).
  • Capgemini offers a comprehensive, non-negotiable benefits package to all regular, full-time employees.

Salary

Base: 99,712 - 168,716; Bonus/Equity: Not specified; Benefits: Comprehensive package

Skills & Requirements

Must-have

  • Databricks (PySpark, SQL, Delta Lake)
  • Credit card domain data
  • Spark job optimization
  • Data quality and reconciliation
  • CI/CD pipelines and monitoring

Nice-to-have

  • Collaboration with business analysts
  • Financial data governance

Key Requirements

  • 6–10 years of experience in data engineering
  • Experience with Databricks clusters, notebooks, Delta Live Tables, and Unity Catalog
  • Experience with workflow orchestration (Airflow, Databricks Workflows, Dagster)
  • Knowledge of CI/CD tooling (Bitbucket/GitHub, Jenkins, Terraform)

Work Rights

Not specified