Series

Longer arguments built across several posts.

7 parts

How LLMs learn to reason

The RL lineage behind reasoning models, traced one algorithm at a time, from REINFORCE and PPO through GRPO and DPO to what R1 actually shipped.

5 parts

Research foundations of modern LLMs

What modern LLMs inherited from earlier NLP research, and why those ideas reorganized into today’s stack instead of disappearing.

2 parts

What an agent actually is

A first-principles definition of an AI agent: control loops, tools, state, and decisions, and what agent frameworks are really doing underneath their abstractions.

5 parts

The agent harness

Why agent demos break in production, and why the harness (the state, gates, traces, verification, and engineering around the model) is often the actual product.

7 parts

LLM evaluation, honestly

Why your eval scores look good while the system stays unreliable, and what to measure instead.

