Generative Models, From First Principles
Part A: DDPM and Flow Matching on continuous data. Part B: discrete tokens, VQ-VAEs, masked diffusion, and the integrated reasoning + image-generation systems (Nano Banana Pro, GPT-Image-2 family) built on top.
This series is the companion to generative_continuous/: four models laid out on one two-axis grid, {DDPM, Flow Matching} × {MLP, DiT}. Each lesson takes one knob in that grid, derives it from scratch, shows you the trade-off it forces, and gives you a widget where you can drive it until it breaks.
.py files in this folder and say why it is there.
The one picture
Both DDPM and Flow Matching pose the same problem: build a path of distributions from a tractable prior to data, then learn a local operator that pushes probability mass along the path.
DDPM’s answer: the path is a Gaussian noising chain, and the net predicts the noise that was added. Flow Matching’s answer: the path is whatever you like, and the net predicts the velocity that moves mass along it. Despite the cosmetics, these are the same object viewed from two angles: DDPM is flow matching with a specific curved path. Lesson 7 makes that explicit.
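To make the two answers concrete, here is a minimal sketch of how one training pair (network input, regression target) could be built under each recipe. It is not taken from diffusion.py or flow_matching.py. It uses scalar data and the standard library only, the function names are hypothetical, `alpha_bar` is assumed to be the cumulative signal-retention coefficient of the noising chain, and the straight-line path is just one of the many paths Flow Matching allows.

```python
import math
import random


def ddpm_training_pair(data, alpha_bar, rng):
    """DDPM: place `data` on the Gaussian noising chain; the target is the noise."""
    noise = rng.gauss(0.0, 1.0)
    # Closed-form sample from the chain at signal level alpha_bar in [0, 1].
    x_t = math.sqrt(alpha_bar) * data + math.sqrt(1.0 - alpha_bar) * noise
    return x_t, noise  # (network input, regression target)


def fm_training_pair(data, t, rng):
    """Flow Matching on a straight path (an assumed choice) from noise to data."""
    noise = rng.gauss(0.0, 1.0)
    # Point on the path at time t in [0, 1]: t=0 is the prior, t=1 is the data.
    x_t = (1.0 - t) * noise + t * data
    # Velocity of a straight path is constant: d x_t / dt = data - noise.
    velocity = data - noise
    return x_t, velocity  # (network input, regression target)


rng = random.Random(0)
x_t_ddpm, noise_target = ddpm_training_pair(2.0, alpha_bar=0.5, rng=rng)
x_t_fm, velocity_target = fm_training_pair(2.0, t=0.5, rng=rng)
print("DDPM input/target:", x_t_ddpm, noise_target)
print("FM   input/target:", x_t_fm, velocity_target)
```

In both functions the recipe is the same: sample a point on the path, then regress onto a quantity that says where that point came from or where it is heading. Only the path and the target differ.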
Part A · Continuous data — the math that underlies everything (lessons 01–09)
Part B · Discrete tokens & integrated systems (lessons 10–15)
Once you can generate continuous things, you can compress images into a sequence of discrete tokens and let an LLM model them directly. That move — tokenize the image — is what makes “native multimodal” reasoning models possible: a single transformer that consumes and produces both text and image tokens, with chain-of-thought reasoning before image generation. Nano Banana Pro, the GPT-Image-2 family, Chameleon, and JanusFlow are all instances. This part covers the tokenizer, the discrete-generation algorithm, the unified architecture, and the reasoning layer on top.
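As a rough illustration of "tokenize the image", the sketch below shows the nearest-codebook lookup at the core of a VQ-VAE-style tokenizer. Everything concrete in it is an assumption made for illustration: the function names, the hand-written 2-D codebook, and the three "patch latents". In a real tokenizer the codebook is learned and the latents come from an encoder network.

```python
def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in latents:
        # Squared Euclidean distance from this latent to every code vector.
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def dequantize(tokens, codebook):
    """Map token ids back to code vectors (the input a decoder would see)."""
    return [codebook[i] for i in tokens]


# Hypothetical 4-entry codebook and three encoder outputs, one per image patch.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patch_latents = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

tokens = quantize(patch_latents, codebook)
print(tokens)                        # [0, 3, 2]
print(dequantize(tokens, codebook))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

The integer list is what a sequence model would consume and produce: once an image is a sequence of ids drawn from a finite vocabulary, it can sit in the same stream as text tokens.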
How to use this
- Linearly. Each lesson assumes the previous one’s vocabulary. Lessons 02–04 build DDPM; 05–06 build FM; 07 connects them; 08–09 talk architecture and practice.
- Touch every knob. Each widget has a configuration that produces visibly wrong samples or visibly fails to integrate. Find it — the failure is the lesson.
- Open the code. Every claim corresponds to lines you can read in diffusion.py, flow_matching.py, or diffusion_transformer.py. The lessons explain why; the code is what.