Synthetic Data Engineer (AI Data/Training)

Hyphen Connect

Singapore, Singapore
On-site
Design domain-specific synthetic data pipelines
Implement automated quality scoring systems
Manage data pipelines for SFT and DPO training
Hyphen Connect is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines, focusing on high-quality data management for model training. The ideal candidate will have experience in building data pipelines and expertise in prompt engineering, with a strong emphasis on automated quality systems.

Job Summary

  • The role focuses on designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Candidates will implement automated systems for quality scoring and de-duplication to ensure high-quality data for training.
  • This position is critical for managing data pipelines that feed directly into Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training loops.

Skills & Requirements

Must-have

  • Design domain-specific synthetic data pipelines
  • Implement automated quality scoring systems
  • Manage data pipelines for SFT and DPO training

Nice-to-have

  • Experience with dataset distillation techniques
  • Knowledge of bias mitigation strategies
  • Innovative approach to data management

Key Requirements

  • Proven experience building large-scale data pipelines
  • Deep knowledge of prompt engineering
  • Familiarity with Airflow, Spark, or Ray

Work Rights

Not specified
