Senior Data Engineer / Data Curator

TSMC Arizona

Phoenix, Arizona, US
On-site

Job Summary

  • You will play a key role in designing and maintaining scalable data pipelines, ensuring that data is clean, relevant, and aligned with ethical and compliance standards.
  • Assess and mitigate bias in datasets, ensuring that models are trained on diverse and representative data.
  • We offer a comprehensive and competitive benefits program that provides the resources you need to help you manage your health and achieve your goals across many areas of your life.

Skills & Requirements

Must-have

  • Python (Pandas, NumPy)
  • SQL for data manipulation
  • Cloud data storage (AWS S3, GCS)
  • Data annotation tools
  • NLP data formats (JSONL, text, embeddings)
  • Data pipeline management (Kafka, Airflow)
  • AI ethics and data privacy

Nice-to-have

  • Vector databases and indexing for LLMs
  • Communication skills
  • Presentation skills
  • Teamwork

Key Requirements

  • Bachelor's degree in Computer Science, Data Science, or related field
  • 5+ years of experience in data engineering, data wrangling, or data curation
  • Experience with distributed systems
  • Understanding of tokenization
  • Experience with GDPR and CCPA compliance

Work Rights

Not specified
