7 parts
How LLMs learn to reason
The RL lineage behind reasoning models, traced one algorithm at a time, from REINFORCE and PPO through GRPO and DPO to what R1 actually shipped.
What modern LLMs inherited from earlier NLP research, and why those ideas reorganized into today’s stack instead of disappearing.
A first-principles definition of an AI agent: control loops, tools, state, and decisions, and what agent frameworks are really doing underneath their abstractions.
Why agent demos break in production, and why the harness around the model (state, gates, traces, verification, and the surrounding engineering) is often the actual product.
Why your eval scores look good while the system stays unreliable, and what to measure instead.