Flow matching — Euler sampling

A trained v_θ gives you an ODE. Solve it. The step count is the whole story.

The ODE

A trained velocity field defines an ordinary differential equation: the trajectory of a particle that starts at x₀ ∼ N(0, I) and rides the field. The IVP is

dx/dt = v_θ(x, t), x(0) = x₀ ∼ N(0, I), t ∈ [0, 1].

By construction (lesson 5), the resulting x(1) is a sample from p_data, up to how well v_θ fits the true field and how accurately the ODE is integrated. To produce a batch of samples, integrate one ODE per sample; in batched form, that is one tensor of shape (N, D) updated in lockstep.

Forward Euler — the simplest integrator

Forward Euler turns the ODE into one update line:

x_{t + dt} = x_t + dt · v_θ(x_t, t), dt = 1 / K.

For K steps total, that’s K evaluations of v_θ. FlowMatching.sample:

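x = randn(n, *shape)                 # x₀ ∼ N(0, I), one row per sample
dt = 1.0 / steps                     # steps = K
for k in range(steps):
    t = full((n,), k * dt)           # time at the start of step k, one entry per sample
    x = x + dt * self.model(x, t)    # Euler update: one call to v_θ
return x                             # approximate x(1)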

The loop itself is three lines. This is the payoff for the design effort: the sampler is trivial because the path is straight.
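To try the loop without a trained network, here is a minimal pure-Python sketch. The analytic `velocity` below is an assumption standing in for v_θ: it is the exact marginal field for a 1-D linear path from N(0, 1) to N(M, S²) with independent coupling. The names `velocity`, `euler_sample`, `M`, and `S` are illustrative and not part of FlowMatching.

```python
import random

M, S = 2.0, 0.5  # hypothetical 1-D target N(M, S^2); the source is N(0, 1)

def velocity(x, t):
    """Exact marginal velocity for the linear path x_t = (1-t)*x0 + t*x1
    with x0 ~ N(0, 1) and x1 ~ N(M, S^2) drawn independently.
    Stands in for a trained v_theta (assumption for illustration)."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2      # Var[x_t]
    slope = (t * S * S - (1.0 - t)) / var_t    # Cov[x1 - x0, x_t] / Var[x_t]
    return M + slope * (x - t * M)

def euler_sample(x0_batch, steps):
    """Forward Euler from t=0 to t=1, all samples updated in lockstep."""
    xs = list(x0_batch)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt                             # time at the start of step k
        xs = [x + dt * velocity(x, t) for x in xs]
    return xs

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
x1 = euler_sample(x0, steps=50)

mean = sum(x1) / len(x1)
std = (sum((x - mean) ** 2 for x in x1) / len(x1)) ** 0.5
print(f"sample mean {mean:.3f} (target {M}), sample std {std:.3f} (target {S})")
```

For this particular field the exact flow maps x₀ to M + S·x₀, so the printed statistics should move toward the target as `steps` grows.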

Forward Euler error, in one line

Per-step error is O(dt²) (Taylor expansion: x(t+dt) = x(t) + dt · ẋ + dt²/2 · ẍ + …). Global error is O(dt) = O(1/K). So halving K doubles the error.

Intuition · linear unpacking

Claim: each Euler step is wrong by O(dt²), yet the error you actually see at the end is only O(dt) = O(1/K).

One step assumes the velocity holds still. Euler reads the field once, at the start of the step, then walks in a straight line for the whole interval dt. But the true velocity keeps changing as you move — that’s the dt²/2 · ẍ term Taylor warns you about. The first term you drop is quadratic in dt, so one step is off by O(dt²).
But you take many steps. To cross t ∈ [0, 1] you stitch together K = 1/dt of these little straight hops.
Errors add up, so multiply. Roughly 1/dt steps, each costing O(dt²), gives a total of (1/dt) · dt² = O(dt) = O(1/K). One power of dt cancels.

Central point. The end-to-end error shrinks only as fast as 1/K — linearly in the number of network calls — so doubling the steps roughly halves the error, and halving the steps roughly doubles it.
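The 1/K scaling is easy to check numerically. The sketch below uses dx/dt = x with x(0) = 1 as a stand-in ODE (an assumption, chosen because its exact endpoint x(1) = e is known; it is not a flow-matching field) and compares the Euler endpoint error at K and 2K. `euler_endpoint`, `v_toy`, and `error_at` are illustrative names.

```python
import math

def euler_endpoint(v, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with forward Euler."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x = x + dt * v(x, k * dt)
    return x

def v_toy(x, t):
    # Toy field with the curved solution x(t) = exp(t); stands in for v_theta.
    return x

def error_at(steps):
    return abs(euler_endpoint(v_toy, 1.0, steps) - math.e)

for K in (25, 50, 100, 200):
    ratio = error_at(K) / error_at(2 * K)
    print(f"K={K:4d}  error={error_at(K):.5f}  error(K)/error(2K)={ratio:.3f}")
```

The printed ratio should sit close to 2, which is what O(1/K) predicts for a doubling of K.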

Why few steps suffice (for FM specifically)

Here is the intuition before the symbols: Euler walks in straight lines, so it only goes wrong where the true path bends. A perfectly straight trajectory can be traced exactly in a single step; a sharply curving one needs many tiny steps to follow the bend. So the question “how many steps do I need?” is really the question “how curved are the paths I’m integrating?” Formally, the off-path drift Euler accumulates is governed by the second derivative ẍ = ∂_t v + (v · ∇_x) v, the curvature of the integral curves. Big ẍ means lots of bending, which means lots of steps; small ẍ means you can get away with few.
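A quick check of the straight-path case. The field below is an assumption chosen for illustration: v(x, t) = (μ − x)/(1 − t), whose integral curves are the straight lines x(t) = (1 − t)·x₀ + t·μ toward a single target point μ (here `MU`, a hypothetical value). Because ẍ = 0 along these curves, Euler should land on μ for any K, up to floating-point rounding.

```python
MU = 3.0  # hypothetical single target point

def v_straight(x, t):
    # Integral curves are straight lines x(t) = (1 - t)*x0 + t*MU, so x'' = 0.
    # The field is singular at t = 1, but forward Euler only evaluates
    # it at t = k*dt < 1.
    return (MU - x) / (1.0 - t)

def euler_endpoint(v, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with forward Euler."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x = x + dt * v(x, k * dt)
    return x

for K in (1, 2, 5, 50):
    x1 = euler_endpoint(v_straight, -1.7, K)
    print(f"K={K:2d}  x(1)={x1:.12f}  |error|={abs(x1 - MU):.2e}")
```

Even K = 1 reaches the target here, which is the sense in which a straight trajectory can be traced in a single step.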

And flow matching is built to make ẍ small. For the linear path, each conditional trajectory is exactly a line, so the conditional curvature is zero. The marginal trajectories (the curves you actually integrate at test time) have small curvature inherited from this. Empirically:

Intuition · linear unpacking

Claim: FM needs ~20× fewer steps than DDPM because the paths it asks you to follow are nearly straight.

Euler’s only enemy is curvature. A straight-line step is a perfect approximation of a straight path and a bad one of a bending path. The more a trajectory curves, the smaller the steps you need to stay on it.
The training recipe forces the building blocks to be straight. Flow matching defines each conditional path as x_t = (1−t)·x₀ + t·x₁ — a literal straight line from noise to a data point. A straight line has zero curvature, so for those conditional paths there is nothing for Euler to miss.
The paths you actually integrate inherit that straightness. At test time you ride the marginal field (the averaged-together version), which can bend a little where many straight conditional paths cross. But it starts from a stack of perfectly straight ingredients, so the bending stays mild.
Mild bending ⇒ few steps. Since steps-needed tracks curvature, low curvature cashes out directly as a low step count — the table below: comparable quality at roughly 1/20th the network calls of DDPM.

Central point. DDPM’s sampling trajectories are curved (variance-preserving paths bow toward the origin), so it needs hundreds of steps; FM deliberately builds straight paths, so a handful of Euler steps already lands on the data.

Solver	Steps (typical)	FID on CIFAR-10 (Lipman 2023)
FM Euler	20–50	< 5
FM RK4	10	< 5
DDPM ancestral	1000	< 5
DDIM	50–100	< 5 with caveats

Roughly: same quality, ~20× fewer network calls.
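The table lists steps, not network calls, and the two differ for multi-stage solvers. A small bookkeeping sketch, assuming one network evaluation per step for Euler, DDPM ancestral, and DDIM, and four per step for RK4, with step counts copied from the table (`solvers` and `network_calls` are illustrative names):

```python
# (solver, (min_steps, max_steps), network evaluations per step)
solvers = [
    ("FM Euler",       (20, 50),     1),
    ("FM RK4",         (10, 10),     4),
    ("DDPM ancestral", (1000, 1000), 1),
    ("DDIM",           (50, 100),    1),
]

def network_calls(step_range, evals_per_step):
    lo, hi = step_range
    return lo * evals_per_step, hi * evals_per_step

calls = {name: network_calls(steps, per) for name, steps, per in solvers}
ddpm_calls = calls["DDPM ancestral"][0]

for name, (lo, hi) in calls.items():
    print(f"{name:15s} {lo:5d}-{hi:<5d} calls, "
          f"{ddpm_calls / hi:.0f}x-{ddpm_calls / lo:.0f}x fewer than DDPM")
```

Under these assumptions the ~20× figure corresponds to the upper end of the Euler range (K = 50); RK4 at 10 steps costs 40 calls.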

Interactive · Euler on a learnable toy velocity field

The widget below trains a tiny MLP v_θ on the two moons (50 seconds in JS), then integrates with Euler at K steps. Watch what happens at K = 2, 5, 50. With a straight path you see usable two moons even at K = 5.

Train v_θ in your browser, then integrate

Tiny MLP, trained on the two moons via the CFM loss. Hit train once; then explore K. Compare with the DDPM ancestral widget in lesson 4 — both reach the same destination, FM with far fewer net calls.

3D · trajectories of your trained model

The widget above renders only the final samples. Here’s what the integrator is actually doing: 30 particles followed step by step through (x, y, t) space, with t as the third axis. Train the model in the widget above, then come here and hit render — straight-ish lines from a Gaussian cloud at t = 0 to the two moons at t = 1. The straighter the lines, the fewer integration steps needed.

RK4 in one line

If you want lower error per step at the cost of more network calls, use a higher-order Runge-Kutta. RK4 evaluates v_θ four times per step:

k1 = v(x,            t)
k2 = v(x + dt/2 * k1, t + dt/2)
k3 = v(x + dt/2 * k2, t + dt/2)
k4 = v(x + dt   * k3, t + dt)
x  = x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

Global error: O(dt⁴), versus O(dt) for Euler. The quality gain isn’t free: each step costs four network calls instead of one. For the same total budget, though, you often want fewer-but-better steps. The widget above lets you switch.
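To compare the two integrators at a fixed budget of network calls, here is a minimal sketch. The field v(x, t) = x·cos(t) is a toy stand-in for v_θ (an assumption, chosen because the exact endpoint exp(sin 1) is known for x(0) = 1); `euler`, `rk4`, and `budget` are illustrative names.

```python
import math

def v_toy(x, t):
    # Toy field with known solution x(t) = x0 * exp(sin t); stands in for v_theta.
    return x * math.cos(t)

def euler(v, x, steps):
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * v(x, k * dt)
    return x                                   # `steps` evaluations of v

def rk4(v, x, steps):
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        k1 = v(x, t)
        k2 = v(x + dt / 2 * k1, t + dt / 2)
        k3 = v(x + dt / 2 * k2, t + dt / 2)
        k4 = v(x + dt * k3, t + dt)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x                                   # 4 * `steps` evaluations of v

exact = math.exp(math.sin(1.0))
budget = 40                                    # total calls to v
err_euler = abs(euler(v_toy, 1.0, budget) - exact)
err_rk4 = abs(rk4(v_toy, 1.0, budget // 4) - exact)
print(f"Euler, {budget} steps: error {err_euler:.2e}")
print(f"RK4,   {budget // 4} steps: error {err_rk4:.2e}")
```

On a smooth toy field like this one, RK4 wins clearly at equal cost. With a learned v_θ the gap depends on how smooth the field actually is, so treat this as an illustration rather than a benchmark.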

The bigger picture: when to pick what

Solver heuristics

Situation	Recommended solver	Why
Straight linear-path FM, tight compute budget	Euler with K=20-50	cheap, paths are mostly straight
Curved path (VP-like), few-step budget	RK4 or DPM-Solver++	amortize the per-step error
Reproducibility / deterministic latents	RK4 (any FM solver is deterministic)	same x₀ ⇒ same x₁; useful for latent editing
Want stochasticity in samples	DDPM ancestral or SDE-flavored solver	randomness comes from the Gaussian noise injected at each step, not from the integrator

Common gotchas

Integrating the wrong direction. Training is x₀ → x₁. Sampling starts at x₀ and integrates forward in t. If you accidentally start at x₁ and integrate backward, the field will push you towards N(0, I) — looks like the model learned to noise, not denoise. This is the single most common bug.
Off-by-one in the time index. The k-th step uses t = k · dt, evaluated before the update. Using t = (k+1) · dt instead pairs the current x with a time one full step ahead, and that mismatch repeats at every step.
Forgetting to scale t for the embedding. Sinusoidal time embeddings are tuned for an integer range like [0, 1000]. With t ∈ [0, 1], most frequencies barely move across the whole interval, so nearby times get nearly identical embeddings. The fix is the t * 1000 trick in MLPVelocity.forward. Lesson 8 talks about this for DiT.
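The time-scaling gotcha can be seen directly. The sketch below assumes the common transformer-style sinusoidal embedding with frequencies 10000^(−i/half); the embedding actually used in MLPVelocity is not shown in this lesson, so treat `sinusoidal_embedding` and `DIM` as hypothetical. It measures how far apart two nearby times land in embedding space with and without the ×1000 scaling.

```python
import math

DIM = 64  # hypothetical embedding width

def sinusoidal_embedding(t, dim=DIM, max_period=10000.0):
    """[sin(t*f_i) ..., cos(t*f_i) ...] with f_i = max_period ** (-i / half)."""
    half = dim // 2
    freqs = [max_period ** (-i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

t_a, t_b = 0.50, 0.52  # two nearby sampling times in [0, 1]

raw = distance(sinusoidal_embedding(t_a), sinusoidal_embedding(t_b))
scaled = distance(sinusoidal_embedding(1000 * t_a),
                  sinusoidal_embedding(1000 * t_b))
print(f"raw t in [0, 1]:  distance {raw:.4f}")
print(f"t * 1000:         distance {scaled:.4f}")
```

With raw t the two embeddings are almost the same vector, so the network has little signal to tell the times apart; after scaling they are clearly separated.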

Punchline

Sampling is one for-loop. The cost is K forward passes, and with Euler the error falls only as 1/K. Linear-path FM gives you the best constant on that scaling. Higher-order solvers buy more accuracy per step at the cost of more network calls per step; the right trade depends on the network cost vs. desired quality.