Posts Tagged "LLM"

March 18 Group Meeting Notes: Agentic Vending, Pre-Training Data, and Constitutional Classifiers

Cleaned-up listener notes from a March 18 group meeting hosted by my professor, covering Project Vend 2, data poisoning, token-level filtering, duplication, replayed pre-training data, and Anthropic's next-generation constitutional classifiers.

GRPO and Its Variants: What Actually Changes, and Why It Matters

A long-form engineering guide to GRPO, Dr. GRPO, DAPO, BNPO, REINFORCE++, RLOO, and newer trainer variants, with an emphasis on normalization, length bias, and practical training choices for LLM RL.