Synthetic Data Engineer (AI Data/Training)

Hyphen Connect

Oregon, United States

  • Design domain-specific synthetic data pipelines
  • Implement automated quality scoring systems
  • Manage SFT and DPO training loop data

Hyphen Connect is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines that support data processing and model training. The ideal candidate has experience with large-scale data pipelines and knowledge of prompt engineering and bias mitigation.

Job Summary

  • The role focuses on designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Candidates will implement automated systems for quality scoring and de-duplication to ensure high-quality data management.
  • This position is critical for managing data pipelines that directly feed into Supervised Fine-Tuning and Direct Preference Optimization training loops.

Skills & Requirements

Must-have

  • Design domain-specific synthetic data pipelines
  • Implement automated quality scoring systems
  • Manage SFT and DPO training loop data

Nice-to-have

  • Experience with dataset distillation techniques
  • Knowledge of bias mitigation strategies
  • Proficiency in self-instruct prompting methods

Key Requirements

  • Proven experience building large-scale data pipelines
  • Deep knowledge of prompt engineering
  • Familiarity with Airflow, Spark, and Ray

Work Rights

Not specified
