The role focuses on developing systems for Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO).
Not specified; Not specified; Not specified
Must-have
Nice-to-have
Not specified