Synthetic Data Engineer (AI Data/Training)

Hyphen Connect

China
On-site
Hyphen Connect is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines and ensure high-quality data management for model training. The ideal candidate will have extensive experience building large-scale data pipelines, along with knowledge of prompt engineering and bias mitigation.

Job Summary

  • The role involves designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting techniques.
  • Candidates will implement automated quality scoring and de-duplication systems to ensure high-quality data management.
  • The position involves managing data pipelines that feed directly into the organization's supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) training loops.

Skills & Requirements

Must-have

  • Synthetic data generation pipelines
  • Self-instruct and constitutional prompting
  • Automated quality scoring systems
  • SFT and DPO training loop integration
  • Large-scale data pipeline tools

Nice-to-have

  • Dataset distillation expertise
  • Bias mitigation strategies
  • Innovative problem-solving skills

Key Requirements

  • Experience with Airflow, Spark, or Ray
  • Deep knowledge of prompt engineering
  • Familiarity with dataset distillation

Work Rights

Not specified