Flow matching — Euler sampling
A trained vθ gives you an ODE. Solve it. The step count is the whole story.
The ODE
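A trained velocity field vθ defines an ordinary differential equation: the trajectory of a particle that starts at x0 ∼ N(0, I) and rides the field. The initial value problem (IVP) is

$$\frac{dx}{dt} = v_\theta\big(x(t), t\big), \qquad x(0) = x_0 \sim \mathcal{N}(0, I), \qquad t \in [0, 1].$$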
By construction (lesson 5), the resulting x(1) is distributed as p_data, to the extent that vθ matches the true marginal velocity field. To produce a batch of samples, integrate one ODE per sample. In batched form that is a single tensor of shape (N, D) updated in lockstep.
Forward Euler — the simplest integrator
Forward Euler turns the ODE into a one-line update:

$$x_{t+dt} = x_t + dt \cdot v_\theta(x_t, t), \qquad dt = 1/K.$$

For K steps in total, that is K evaluations of vθ. Here is FlowMatching.sample:
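```python
import torch

class FlowMatching:
    # model setup and training omitted; self.model is the velocity net v_theta(x, t)
    @torch.no_grad()
    def sample(self, n, shape, steps):  # signature assumed; steps = K
        x = torch.randn(n, *shape)  # x0 ~ N(0, I), one row per sample
        dt = 1.0 / steps
        for k in range(steps):
            t = torch.full((n,), k * dt)  # time of the current state, one entry per sample
            x = x + dt * self.model(x, t)  # Euler update
        return x
```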
Three lines of loop. This is the payoff for the design effort: plain Euler is all the sampler needs, because the path is straight.
Why few steps suffice (for FM specifically)
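Forward Euler drifts off the true trajectory wherever that trajectory curves. Each step incurs a local error of about ½ dt² ‖ẍ‖, where ẍ is the second time derivative along the trajectory, i.e. the curvature of the integral curves:

$$\ddot{x} = \frac{d}{dt}\, v\big(x(t), t\big) = \partial_t v + (v \cdot \nabla_x)\, v.$$

Summed over K = 1/dt steps, this gives Euler its O(dt) global error.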
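A quick way to see the curvature dependence is to run the same Euler loop on two toy fields with known solutions: a constant field (straight trajectories, ẍ = 0) and a rotation (curved trajectories, ẍ ≠ 0). Both fields are illustrative stand-ins chosen for this sketch, not a trained vθ.

```python
import numpy as np

def euler(v, x0, steps):
    """Forward Euler from t = 0 to t = 1 in `steps` uniform steps."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * v(x, k * dt)  # evaluate at the current time, then step
    return x

# Toy field 1: constant velocity. Trajectories are straight lines, x_ddot = 0.
c = np.array([3.0, -1.0])
def straight(x, t):
    return c

# Toy field 2: rotation at angular speed pi/2, so x(1) is x(0) turned 90 degrees.
# Here x_ddot = -(pi/2)**2 * x, which is nonzero away from the origin.
omega = np.pi / 2
def rotate(x, t):
    return omega * np.array([-x[1], x[0]])

x0 = np.array([1.0, 0.0])
exact_straight = x0 + c
exact_rotate = np.array([0.0, 1.0])

errors = {}
for K in (1, 10, 100):
    err_s = np.linalg.norm(euler(straight, x0, K) - exact_straight)
    err_r = np.linalg.norm(euler(rotate, x0, K) - exact_rotate)
    errors[K] = (err_s, err_r)
    print(f"K={K:3d}  straight: {err_s:.1e}  curved: {err_r:.1e}")
```

The straight field is solved exactly (up to floating-point rounding) even at K = 1. On the rotation, the error shrinks roughly in proportion to 1/K, the first-order behaviour expected from Euler, whose local error per step scales with dt² times the curvature.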
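For the linear path, each conditional trajectory is exactly a straight line, so the conditional curvature is zero. The marginal trajectories (the curves you actually integrate at test time) are not exactly straight, because the marginal field averages over many conditional lines passing through the same point, but they tend to inherit low curvature from them. Empirically: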
| Solver | Steps (typical) | FID on CIFAR-10 (Lipman 2023) |
|---|---|---|
| FM Euler | 20–50 | < 5 |
| FM RK4 | 10 | < 5 |
| DDPM ancestral | 1000 | < 5 |
| DDIM | 50–100 | < 5 with caveats |
Roughly: the same quality as DDPM ancestral sampling with about 20× fewer network calls (1000 vs. 50).
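One way to see that the marginal paths cannot be perfectly straight, assuming the usual independent coupling in which x0 ∼ N(0, I) is drawn independently of x1 ∼ p_data, and the linear path x_t = (1 − t) x0 + t x1 with conditional velocity x1 − x0: at t = 0 the state is x0 itself, so conditioning on it says nothing about x1.

$$
v(x, 0) = \mathbb{E}[\,x_1 - x_0 \mid x_0 = x\,] = \mathbb{E}[x_1] - x,
\qquad
x_0 + 1 \cdot v(x_0, 0) = \mathbb{E}[x_1].
$$

A single Euler step (K = 1) with the exact marginal field therefore sends every sample to the data mean, whereas a single step along any one conditional line lands exactly on its own x1. The gap between those two outcomes is the marginal curvature that additional steps have to resolve.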
Interactive · Euler on a learnable toy velocity field
The widget below trains a tiny MLP vθ on the two-moons dataset (about 50 seconds in JavaScript), then integrates it with Euler at K steps. Watch what happens at K = 2, 5 and 50. Because the path is straight, the two moons are already usable at K = 5.
3D · trajectories of your trained model
The widget above renders only the final samples. Here is what the integrator is actually doing: 30 particles followed step by step through (x, y, t) space, with t as the third axis. Train the model in the widget above, then render here: you should see nearly straight lines running from a Gaussian cloud at t = 0 to the two moons at t = 1. The straighter the lines, the fewer integration steps needed.
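The experiment from both widgets can be approximated in NumPy. To skip training, this sketch swaps the MLP for a hypothetical stand-in, `exact_velocity`: the exact marginal velocity of the linear path when p_data is a finite point set, which a trained vθ would only approximate. For data points μ_m, the posterior weights are w_m(x, t) ∝ exp(−‖x − t μ_m‖² / (2(1 − t)²)) and v(x, t) = (Σ_m w_m μ_m − x) / (1 − t). The sketch also records trajectories and measures straightness as endpoint distance divided by path length (1 for a perfectly straight line).

```python
import numpy as np

def two_moons(m_per_moon=100):
    """Noise-free two-moons point set, shape (2 * m_per_moon, 2)."""
    th = np.linspace(0.0, np.pi, m_per_moon)
    upper = np.stack([np.cos(th), np.sin(th)], axis=1)
    lower = np.stack([1.0 - np.cos(th), 0.5 - np.sin(th)], axis=1)
    return np.concatenate([upper, lower])

def exact_velocity(x, t, data):
    """Hypothetical stand-in for v_theta: exact marginal velocity of the linear
    path x_t = (1 - t) x0 + t x1 with x1 uniform over `data`.
    x: (N, D), t: float in [0, 1), data: (M, D)."""
    sq = np.sum((x[:, None, :] - t * data[None, :, :]) ** 2, axis=-1)  # (N, M)
    logits = -sq / (2.0 * (1.0 - t) ** 2)        # log p(x_t = x | x1 = data_m) + const
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # posterior over data points
    return (w @ data - x) / (1.0 - t)            # (E[x1 | x_t = x] - x) / (1 - t)

def euler_sample(x0, data, steps):
    """Euler from t = 0 to 1; returns the whole trajectory, shape (steps + 1, N, D)."""
    dt = 1.0 / steps
    xs = [x0]
    for k in range(steps):
        xs.append(xs[-1] + dt * exact_velocity(xs[-1], k * dt, data))
    return np.stack(xs)

def dist_to_data(x, data):
    """Distance from each sample to its nearest data point, shape (N,)."""
    return np.sqrt(((x[:, None, :] - data[None, :, :]) ** 2).sum(-1)).min(axis=1)

rng = np.random.default_rng(0)
data = two_moons()
x0 = rng.standard_normal((500, 2))

mean_dist = {}
for K in (1, 2, 5, 50):
    final = euler_sample(x0, data, K)[-1]
    mean_dist[K] = dist_to_data(final, data).mean()
    print(f"K={K:2d}  mean distance to nearest data point: {mean_dist[K]:.3f}")

# Straightness of the K = 50 trajectories: |x(1) - x(0)| / path length.
traj = euler_sample(x0, data, 50)
path_len = np.linalg.norm(np.diff(traj, axis=0), axis=-1).sum(axis=0)
straightness = np.linalg.norm(traj[-1] - traj[0], axis=-1) / path_len
print(f"median straightness at K = 50: {np.median(straightness):.3f}")
```

At K = 1 every sample collapses onto the data mean, since the field at t = 0 points every particle there. Larger K lets the trajectories bend toward individual moons.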
RK4 in five lines
If you want lower error per step at the cost of more network calls, use a higher-order Runge-Kutta method. RK4 evaluates vθ four times per step:
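```python
def rk4_step(v, x, t, dt):
    # One RK4 step: four evaluations of the velocity field v(x, t)
    k1 = v(x, t)
    k2 = v(x + dt / 2 * k1, t + dt / 2)
    k3 = v(x + dt / 2 * k2, t + dt / 2)
    k4 = v(x + dt * k3, t + dt)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```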
Global error is O(dt⁴), versus O(dt) for Euler. The gain isn't free (four network calls per step instead of one), but for the same total budget of network calls you often want fewer-but-better steps. The widget above lets you switch between the two solvers.
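To check the O(dt⁴) claim and the fixed-budget trade-off, here is a sketch that compares Euler and RK4 on a toy rotation field with a known exact solution (a stand-in chosen for illustration, not a trained vθ). Both solvers get the same budget of 40 velocity evaluations.

```python
import numpy as np

omega = np.pi / 2  # toy rotation field: x(1) is x(0) turned by 90 degrees

def v(x, t):
    return omega * np.array([-x[1], x[0]])

def euler_step(v, x, t, dt):
    return x + dt * v(x, t)                          # 1 evaluation of v

def rk4_step(v, x, t, dt):
    k1 = v(x, t)                                     # 4 evaluations of v
    k2 = v(x + dt / 2 * k1, t + dt / 2)
    k3 = v(x + dt / 2 * k2, t + dt / 2)
    k4 = v(x + dt * k3, t + dt)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, steps):
    x, dt = np.array([1.0, 0.0]), 1.0 / steps
    for k in range(steps):
        x = step(v, x, k * dt, dt)
    return x

exact = np.array([0.0, 1.0])

def err(step, steps):
    return np.linalg.norm(integrate(step, steps) - exact)

# Same budget of 40 network calls: Euler with 40 steps vs RK4 with 10 steps.
print(f"Euler, 40 steps: {err(euler_step, 40):.1e}")
print(f"RK4,   10 steps: {err(rk4_step, 10):.1e}")
# Halving dt should cut the RK4 error by roughly 2**4 = 16.
print(f"RK4 error ratio, 10 vs 20 steps: {err(rk4_step, 10) / err(rk4_step, 20):.1f}")
```

On this smooth field RK4 wins by orders of magnitude at equal cost. On a learned vθ the margin depends on how smooth the network's field actually is.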
The bigger picture: when to pick what
Common gotchas
- Integrating the wrong direction. Training runs from noise x0 at t = 0 to data x1 at t = 1, so sampling starts from x0 ∼ N(0, I) and integrates forward in t. If you accidentally treat your starting point as x1 and integrate backward, the field pushes you towards N(0, I), and the output looks as if the model learned to noise rather than denoise. This is the single most common bug.
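- Off-by-one in the time index. Step k (counting from 0) must evaluate vθ at t = k · dt, the time of the current state, before the update. Using t = (k+1) · dt queries the field at a time the state has not reached yet; the state and time are out of sync on every step, so the error accumulates.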
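- Forgetting to scale t for the embedding. Sinusoidal time embeddings are designed for timestep indices in a range like [0, 1000]. With t ∈ [0, 1], even the fastest-varying entry of the standard embedding turns through at most one radian and the slower entries stay almost constant, so nearby times become hard to tell apart. The fix is the `t * 1000` trick in `MLPVelocity.forward`. Lesson 8 talks about this for DiT.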
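The embedding used in `MLPVelocity.forward` isn't shown here, so this sketch assumes the common transformer-style sinusoidal embedding with `max_period = 10000` and frequencies from 1 down to about 1/10000. It measures how much the embedding changes between two nearby times, with and without the ×1000 scaling.

```python
import numpy as np

def sinusoidal_embedding(t, dim=64, max_period=10000.0):
    """Transformer-style sinusoidal embedding of scalar times t, shape (len(t), dim).

    Assumed form: frequencies run geometrically from 1 down to about 1 / max_period.
    """
    t = np.asarray(t, dtype=float)
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=1)

t_a, t_b = np.array([0.50]), np.array([0.51])  # two nearby sampling times

raw_gap = np.abs(sinusoidal_embedding(t_a) - sinusoidal_embedding(t_b)).max()
scaled_gap = np.abs(sinusoidal_embedding(t_a * 1000) - sinusoidal_embedding(t_b * 1000)).max()
print(f"max entry change, raw t in [0, 1]: {raw_gap:.4f}")
print(f"max entry change, t * 1000:        {scaled_gap:.4f}")
```

With raw t no entry moves by more than 0.01 between the two times, because every frequency is at most 1. After scaling, the high-frequency entries swing by order 1, which gives the network a usable signal.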