Staff AI Engineer, Model Post-Training and Alignment
OKX
APAC
On-site
Large model post-training pipelines
Preference learning and alignment techniques
Reinforcement learning from AI feedback
Job Summary
This role focuses on designing, executing, and optimizing post-training pipelines to improve model performance, controllability, domain adaptation, and reasoning capabilities.
You will work across the full lifecycle of post-training—from data strategy and reward modeling to reinforcement learning–based optimization and production-grade inference deployment.
OKX is a leading crypto exchange, and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps).
Skills & Requirements
Must-have
Large model post-training pipelines
Preference learning and alignment techniques
Reinforcement learning from AI feedback
Low-latency serving frameworks
Domain-specific data strategies
Nice-to-have
friendly, rewarding, and diverse environment
contribute to every individual's freedom
Key Requirements
8 years of industry experience
Bachelor's degree in Computer Science, AI, or Machine Learning
DPO, GRPO, and RL-based post-training methodologies