Posts Tagged "RLHF"
verl, vLLM, and FlashAttention: How the Stack Actually Fits Together
A practical guide to what verl, vLLM, and FlashAttention each do, why they appear in the same post-training setup, and where their responsibilities actually differ.
GRPO and Its Variants: What Actually Changes, and Why It Matters
A long-form engineering guide to GRPO, Dr. GRPO, DAPO, BNPO, REINFORCE++, RLOO, and newer trainer variants, with an emphasis on normalization, length bias, and practical training choices for LLM RL.