Synthetic Data Engineer (AI Data/Training)

Hyphen Partners

San Francisco Bay Area, United States
Hyphen Partners is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines aimed at improving data management for model training. The ideal candidate has a strong background in building large-scale data pipelines and knowledge of prompt engineering.

Job Summary

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting techniques.
  • Implement automated quality scoring and de-duplication systems so that training loops receive high-quality data.
  • Manage critical data pipelines that feed directly into the organization's supervised fine-tuning (SFT) and direct preference optimization (DPO) training processes.

Matching Summary

Match Score: 75

Skills & Requirements

Must-have

  • Synthetic data generation pipelines
  • Self-instruct and constitutional prompting
  • Automated quality scoring systems
  • Implementation of de-duplication systems
  • Management of SFT and DPO training loops

Nice-to-have

  • Knowledge of dataset distillation
  • Familiarity with bias mitigation
  • Innovative problem-solving skills
  • Ability to apply domain-specific expertise

Key Requirements

  • Experience with Airflow, Spark, or Ray
  • Deep knowledge of prompt engineering
  • Experience building large-scale data pipelines

Work Rights

Not specified
