Data Scientist (IFE DCC)

THALES SOLUTIONS ASIA PTE. LTD.

Singapore

Job Summary

  • The role involves designing and maintaining efficient ETL pipelines to process large-scale data from multiple sources in batch and near real-time environments.
  • Candidates will leverage modern platforms like Databricks and Spark to build scalable data architectures and implement robust data models for analytics.
  • The position requires driving the exploration of AI/ML use cases and enabling production-grade data-driven intelligence across various platforms.

Skills & Requirements

Must-have

  • Distributed data processing with Spark
  • ETL pipeline design and implementation
  • SQL and Python proficiency
  • Data modeling and warehousing concepts
  • Cloud platform experience (AWS or Azure)
  • Delta Lake and Databricks workflows

Nice-to-have

  • Kubernetes and containerized workloads
  • Delta Sharing for secure data access
  • MLflow and LLM framework integration
  • Vector database experience
  • Databricks certification preferred
  • Python Institute certification preferred

Key Requirements

  • Strong background in analyzing complex datasets
  • Experience with Databricks ecosystem highly desirable
  • Proficiency in SQL and Python required
  • Understanding of lakehouse architecture needed
  • Relevant certifications are a plus

Work Rights

Not specified
