Job Summary
You will play a key role in designing and maintaining scalable data pipelines, ensuring that data is clean, relevant, and aligned with ethical and compliance standards.
You will also assess and mitigate bias in datasets so that models are trained on diverse and representative data.
We offer a comprehensive, competitive benefits program that gives you the resources to manage your health and achieve your goals across many areas of your life.
Skills & Requirements
Must-have
Python (Pandas, NumPy)
SQL for data manipulation
Cloud data storage (AWS S3, GCS)
Data annotation tools
NLP data formats (JSONL, text, embeddings)
Data pipeline management (Kafka, Airflow)
AI ethics and data privacy
Nice-to-have
Vector databases and indexing for LLMs
Communication skills
Presentation skills
Teamwork
Key Requirements
Bachelor's degree in Computer Science, Data Science, or related field
5+ years of experience in data engineering, data wrangling, or data curation