Posts Tagged "RLHF"

verl, vLLM, and FlashAttention: How the Stack Actually Fits Together

A practical guide to what verl, vLLM, and FlashAttention each do, why they appear in the same post-training setup, and where their responsibilities actually differ.

GRPO and Its Variants: What Actually Changes, and Why It Matters

A long-form engineering guide to GRPO, Dr.GRPO, DAPO, BNPO, REINFORCE++, RLOO, and newer trainer variants, with an emphasis on normalization, length bias, and practical training choices for LLM RL.