Staff AI Engineer, Model Post-Training and Alignment

OKX

APAC
On-site

Job Summary

  • This role focuses on designing, executing, and optimizing post-training pipelines to improve model performance, controllability, domain adaptation, and reasoning capabilities.
  • You will work across the full lifecycle of post-training—from data strategy and reward modeling to reinforcement learning–based optimization and production-grade inference deployment.
  • OKX is a leading crypto exchange, and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps).

Skills & Requirements

Must-have

  • Large model post-training pipelines
  • Preference learning and alignment techniques
  • Reinforcement Learning from AI Feedback (RLAIF)
  • Low-latency serving frameworks
  • Domain-specific data strategies

Nice-to-have

  • friendly, rewarding, and diverse environment
  • contribute to every individual's freedom

Key Requirements

  • 8 years of industry experience
  • Bachelor's degree in Computer Science, AI, or Machine Learning
  • Experience with DPO, GRPO, and RL-based post-training methodologies
  • Experience training specialized small models from scratch
  • Low-latency serving frameworks such as vLLM, SGLang, or similar

Work Rights

Not specified
