Posts Tagged "Reinforcement Learning"
GRPO and Its Variants: What Actually Changes, and Why It Matters
A long-form engineering guide to GRPO, Dr.GRPO, DAPO, BNPO, REINFORCE++, RLOO, and newer trainer variants, with an emphasis on normalization, length bias, and practical training choices for LLM RL.
PPO Is Not Just a Clip Trick
Why the practical success of PPO comes from the whole implementation stack rather than the clipping term alone.