ML Systems Design, from first principles
The other tracks teach mechanisms — FlashAttention, FSDP, RadixAttention, GRPO. This one teaches the skill that sits on top of them: given a workload and a cluster, design the system. Every lesson runs the same loop — requirements → arithmetic → topology → bottleneck → iterate — and every choice is justified in bytes, FLOPs, and dollars.
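A minimal sketch of what "justified in bytes and FLOPs" can look like for one decode step of a dense model. It assumes ~2 FLOPs per parameter per generated token, weights streamed from memory once per step, and it ignores KV-cache reads and attention FLOPs. The model size (70B parameters in 2-byte weights) and the accelerator numbers (1e15 FLOP/s peak, 3e12 B/s memory bandwidth) are hypothetical, not the specs of any real GPU.

```python
# Napkin-math sketch: which wall binds one decode step, compute or memory bandwidth?
# Assumptions (hypothetical, illustration only): dense model, ~2 FLOPs per parameter
# per generated token, every weight read once per step, KV-cache traffic and
# attention FLOPs ignored.

def decode_step_times(params: float, batch: int, bytes_per_param: float,
                      peak_flops: float, mem_bw: float) -> dict:
    flops = 2.0 * params * batch              # one new token per sequence in the batch
    weight_bytes = params * bytes_per_param   # weights streamed once per step
    t_compute = flops / peak_flops            # seconds if purely compute-limited
    t_memory = weight_bytes / mem_bw          # seconds if purely bandwidth-limited
    binding = "compute" if t_compute > t_memory else "memory bandwidth"
    return {"t_compute_s": t_compute, "t_memory_s": t_memory, "binding": binding}


if __name__ == "__main__":
    # Sweep the batch size on the hypothetical accelerator.
    for batch in (1, 64, 1024):
        r = decode_step_times(70e9, batch, 2.0, 1e15, 3e12)
        print(batch, r["binding"], f"{r['t_compute_s']:.4f}s vs {r['t_memory_s']:.4f}s")
```

Under these assumptions the step is bandwidth-bound until the batch passes roughly 333 sequences, where compute time overtakes the fixed weight-read time. The crossover moves with every assumed number, which is the point: it is something you compute, not something you guess.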
The one method, applied again and again
"System design" sounds like a grab-bag of opinions. It isn't. Every design in this series is the same five-step loop, run until the numbers stop moving. The only art is in spotting which constraint binds first, and that is always a number you can estimate before you write any code.
This is what "linearized" means here: you never reach for a mechanism (paging, sharding, speculative decoding) until the arithmetic has named the bottleneck it removes. Optimizations are answers to measured questions, never a checklist.
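One way to make "name the bottleneck first, then pick the mechanism" concrete is a lookup that stays closed until an estimate selects a key. This is a hypothetical sketch: the wall names, the order of the checks, and the wall-to-mechanism mapping are illustrative assumptions, not a rule from any library or from the lessons themselves.

```python
from typing import Optional

# Hypothetical mapping from a named wall to the mechanism that removes it.
MECHANISM_FOR_WALL = {
    "weights_exceed_gpu_memory": "sharding",        # model does not fit on one GPU
    "kv_cache_memory_pressure": "paging",           # weights fit, weights + KV cache do not
    "decode_bandwidth_bound": "speculative decoding",
    "compute_bound": None,  # none of the three mechanisms above targets this wall
}


def name_wall(weight_bytes: float, kv_bytes: float, gpu_mem_bytes: float,
              t_compute: float, t_memory: float) -> str:
    """Return the first wall the napkin numbers hit, checked in a fixed order."""
    if weight_bytes > gpu_mem_bytes:
        return "weights_exceed_gpu_memory"
    if weight_bytes + kv_bytes > gpu_mem_bytes:
        return "kv_cache_memory_pressure"
    if t_memory > t_compute:
        return "decode_bandwidth_bound"
    return "compute_bound"


def choose_mechanism(weight_bytes: float, kv_bytes: float, gpu_mem_bytes: float,
                     t_compute: float, t_memory: float) -> Optional[str]:
    wall = name_wall(weight_bytes, kv_bytes, gpu_mem_bytes, t_compute, t_memory)
    return MECHANISM_FOR_WALL[wall]


if __name__ == "__main__":
    # Hypothetical numbers: 40 GB of weights, 10 GB of KV cache, an 80 GB GPU,
    # and a decode step whose weight reads take far longer than its math.
    print(choose_mechanism(40e9, 10e9, 80e9, t_compute=1e-4, t_memory=1e-2))
```

The ordering encodes a design choice: capacity walls are checked before bandwidth walls, because a model that does not fit has no decode speed to optimize yet.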
Part I · The method and its numbers
Part II · Inference system design
Part III · Training system design
Part IV · RL post-training system design
Part V · The MLE lifecycle
Part VI · Synthesis
Part VII · Design case studies
Six self-contained worked designs. Each takes the same loop to a workload whose binding wall is different — so the resulting system looks different even though the model is ordinary. Read in any order; together they're the proof that the method generalizes.
How to use this
- Read 01–03b in order, no skipping. They are the method. Every later lesson assumes the design loop and the napkin-math numbers; the arithmetic is load-bearing, not decoration.
- Do the back-of-the-envelope before reading the answer. Each design lesson states the requirements, then pauses. Estimate the GPU count yourself, then check. The gap between your guess and the number is the lesson.
- Touch the widgets. Each has one knob whose extreme setting flips the binding constraint: memory capacity gives way to bandwidth, latency-bound becomes throughput-bound. Finding that flip is the design intuition.
- This track points down at the mechanism tracks. When a lesson decides "shard with tensor parallel here," the how lives in System ML 06. This series is the why-and-when; follow the links for the how.
- Then drill with Part VII (13–18). After the capstone, the six case studies apply the loop cold to real products. Predict the binding wall before you read each one — that prediction is the skill the whole track is building.
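For the "estimate the GPU count yourself" habit, a minimal sketch of the usual first pass: the answer is the larger of the GPUs needed just to hold the bytes and the GPUs needed to hit the throughput target. It assumes throughput scales linearly with GPU count, that `per_gpu_tokens_per_s` is a number you have already estimated or measured, and that `usable_frac` reserves headroom for activations and runtime overhead. Every input here is hypothetical.

```python
import math


def gpus_needed(weight_bytes: float, kv_bytes: float, gpu_mem_bytes: float,
                usable_frac: float, target_tokens_per_s: float,
                per_gpu_tokens_per_s: float) -> tuple[int, str]:
    # GPUs required just to hold weights plus KV cache in usable memory.
    mem_gpus = math.ceil((weight_bytes + kv_bytes) / (gpu_mem_bytes * usable_frac))
    # GPUs required to reach the throughput target, assuming linear scaling.
    tput_gpus = math.ceil(target_tokens_per_s / per_gpu_tokens_per_s)
    binding = "memory" if mem_gpus >= tput_gpus else "throughput"
    return max(mem_gpus, tput_gpus), binding


if __name__ == "__main__":
    # Hypothetical: 140 GB of weights, 60 GB of KV cache, 80 GB GPUs at 90% usable,
    # 20k tokens/s target, 2.5k tokens/s per GPU.
    print(gpus_needed(140e9, 60e9, 80e9, 0.9, 20_000, 2_500))
```

With these inputs memory alone asks for 3 GPUs but the throughput target asks for 8, so throughput binds. Drop the target to 5,000 tokens/s and the answer becomes 3 GPUs with memory binding: the same flip the widgets are built to expose.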