Part II - Core execution patterns
Prompt chaining - linear decomposition
Lesson 02 made the prompt a typed contract: a single model call with a scoped input shape and a validated output shape. But many real tasks are too big for one call to satisfy reliably. This lesson takes the first step into control flow — it strings several of those typed calls into an ordered pipeline, where each artifact narrows the job of the next. This is the simplest possible agent topology: a straight line.
New capability: Composing several of those calls into a fixed, linear sequence, so that a task too large for one prompt becomes a series of small, individually inspectable transformations.
1 · Why one prompt is not enough — the reliability argument
Start with the failure the book leads with. Suppose you ask a model, in a single prompt: "Analyze this market-research report, summarize the findings, extract the three most important data points, and draft an email to the marketing team about them." That is four distinct jobs welded together. A capable model will often do most of it — and quietly drop one part. It summarizes well, extracts two of three data points, and writes an email that omits the numbers. Nothing errored. The output looks fluent. It is just incomplete.
The book names the mechanisms behind this: instruction neglect (the model skips part of a multi-part request), context drift (it loses track of an earlier constraint by the time it reaches a later one), error accumulation (a small mistake early gets built upon), context-window pressure (the combined input/output is too long to hold cleanly), and hallucination under cognitive load. The unifying intuition: a single call has to satisfy all constraints simultaneously, and the more constraints you pile onto one inference, the lower the chance every one of them is met.
Here is the argument made numeric, because it is the load-bearing claim of the entire pattern. Model a complex task as n independent sub-requirements, each of which a single overloaded call gets right with probability p. The chance the one call nails all of them is the product:
$$P_{\text{single}} = p^{n}$$
Worked number. Take n = 4 sub-tasks (summarize, extract point A, extract point B, draft email) and a generous per-requirement success p = 0.85 when they all share one crowded prompt. Then $P_{\text{single}} = 0.85^{4} \approx 0.52$. Roughly a coin flip that the whole thing is correct, and notice that the model still sounds confident on every failed run. Now decompose into a chain of focused steps. Each step does one thing, so per-step reliability rises (call it p' = 0.95), and, critically, you can validate the artifact between steps and re-run a failed step, raising its effective success higher still. Even without retries:
$$P_{\text{chain}} = (p')^{n} = 0.95^{4} \approx 0.81$$
Going from 0.52 to 0.81 is the whole game, and the chain's relative advantage keeps growing with the size of the task, because the ratio (p'/p)^n increases with n. The deeper point: a chain does not just raise p; it gives you checkpoints. Between two model calls you can run plain code (validate a schema, retry, branch, or call a deterministic tool), which a single monolithic call can never expose.
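The arithmetic above is easy to reproduce. The following is a minimal sketch, not the book's code: the function names are illustrative, and the retry model rests on a stated assumption (a validator catches every failed attempt, and each retry is an independent draw), which real systems only approximate.

```python
def p_single(p: float, n: int) -> float:
    """Chance that one overloaded call satisfies all n sub-requirements."""
    return p ** n


def p_chain(p_step: float, n: int, retries: int = 0) -> float:
    """Chance that all n focused steps succeed.

    Assumption: a failed attempt is always detected and each retry is an
    independent draw, so a step fails only if every attempt fails.
    """
    effective = 1.0 - (1.0 - p_step) ** (retries + 1)
    return effective ** n


# Compare the single call with the chain (no retry, one retry) as n grows.
for n in (2, 4, 6, 8, 10):
    print(
        n,
        round(p_single(0.85, n), 2),
        round(p_chain(0.95, n), 2),
        round(p_chain(0.95, n, retries=1), 2),
    )
```

Under these assumptions a single retry per step lifts the four-step chain well above 0.81, which is the quantitative reason the checkpoints matter as much as the smaller prompts.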
2 · The pattern: an ordered sequence of typed transformations
Formally, a prompt chain is a pipeline. The book frames it as a divide-and-conquer strategy: instead of solving a hard problem in one shot, you decompose it into a sequence of smaller sub-problems, each handled by a purpose-built prompt, with the output of step k becoming the input of step k+1. That dependency chain is the defining feature — earlier context and results steer later processing, so the model refines its understanding step by step rather than guessing everything at once.
Think of it as a Unix pipeline or a sequence of pure functions: each stage takes a well-defined input, performs one transformation, and emits a well-defined output for the next stage. The mental model the book offers is the computational pipeline — a function completes its specific operation and hands the result downstream. Crucially, between two model calls you can interleave ordinary code: a validator, a database lookup, an arithmetic tool, a conditional. The chain is not "a list of prompts"; it is a list of transformations, some of which happen to be model calls.
The book also points out that a step is not obliged to be a model call at all. A chain step can instruct the model to call an external API, query a database, or run a tool — which is how chaining becomes the on-ramp to genuine agency. We will build proper tool use in lesson 07; for now the relevant fact is that the linear structure already lets deterministic code sit between inferences, doing the parts code does better than a model (parsing, validation, arithmetic).
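To make "a list of transformations, some of which happen to be model calls" concrete, here is a small sketch. Everything in it is hypothetical: `fake_model_extract` stands in for a real model call, and the validator and arithmetic step are ordinary code placed between inferences.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]


def run_pipeline(steps: list[Step], value: Any) -> Any:
    """Feed the output of step k into step k+1, in order."""
    for step in steps:
        value = step(value)
    return value


def fake_model_extract(text: str) -> dict:
    # Stand-in for a model call (hypothetical): pulls integers out of prose.
    return {"prices": [int(tok) for tok in text.split() if tok.isdigit()]}


def require_prices(artifact: dict) -> dict:
    # Deterministic validator sitting between two transformations.
    if not artifact.get("prices"):
        raise ValueError("no prices extracted")
    return artifact


def add_total(artifact: dict) -> dict:
    # Arithmetic is done in code rather than asked of the model.
    return {**artifact, "total": sum(artifact["prices"])}


result = run_pipeline(
    [fake_model_extract, require_prices, add_total],
    "widgets cost 12 and 30 dollars",
)
```

The pipeline runner does not know or care which stages are model calls; that indifference is what lets a deterministic tool replace a model step later without touching the rest of the chain.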
3 · The handoff is the hard part — use structured output
Here is where chains actually break in practice, and the book is emphatic about it: the reliability of a chain depends entirely on the integrity of the data passed between steps. If step k emits something vague or loosely formatted, step k+1 receives a malformed input and fails — and because the failure is downstream, it is annoying to localize. The fix is to specify a structured output format (JSON or XML) at each handoff, so the artifact can be parsed and validated by machine rather than re-interpreted as free text.
Take the book's market-research chain. Step 2 (trend identification) should not emit a paragraph; it should emit a parseable object:
{
  "trends": [
    {
      "trend_name": "AI-driven personalization",
      "supporting_data": "73% of consumers prefer brands that use personal info to improve the shopping experience."
    },
    {
      "trend_name": "Sustainability & word-of-mouth",
      "supporting_data": "ESG-labeled products grew 28% over five years vs. 20% for unlabeled."
    }
  ]
}
Now step 3 (draft the email) receives a typed list it can iterate over, not prose it has to re-read. Structured handoffs do three things at once: they make the artifact machine-parseable, they let you run a schema check between steps (the validation checkpoint from §1), and they shrink the surface area of natural-language ambiguity that causes drift. This is the through-line back to lesson 02 — the chain is just typed contracts composed end to end, where each step's output schema is the next step's input schema.
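One way to run that schema check at the boundary is to parse the raw reply into a typed object and, on failure, re-prompt the same step with the error attached. The sketch below uses only the standard library; `Trend`, `parse_trends`, `run_step_with_reprompt`, and the scripted model are illustrative names, not part of any framework, and the feedback wording is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trend:
    trend_name: str
    supporting_data: str


def parse_trends(raw: str) -> list[Trend]:
    """Parse a step-2 reply into typed Trend objects, or raise ValueError."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict) or not isinstance(payload.get("trends"), list):
        raise ValueError('expected an object with a "trends" list')
    trends = []
    for i, item in enumerate(payload["trends"]):
        if not isinstance(item, dict):
            raise ValueError(f"trends[{i}] must be an object")
        for key in ("trend_name", "supporting_data"):
            if not isinstance(item.get(key), str) or not item[key].strip():
                raise ValueError(f"trends[{i}].{key} must be a non-empty string")
        trends.append(Trend(item["trend_name"], item["supporting_data"]))
    if not trends:
        raise ValueError("trends list is empty")
    return trends


def run_step_with_reprompt(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> list[Trend]:
    """Re-prompt this one step with the parse error as added context."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            return parse_trends(raw)
        except ValueError as err:
            feedback = f"\n\nYour previous reply was rejected: {err}. Reply with JSON only."
    raise RuntimeError(f"step failed after {max_attempts} attempts")


# Scripted stand-in for a model: the first reply contains "{" but does not parse.
replies = iter([
    'Sure! Here are the trends: {"trends": [',
    '{"trends": [{"trend_name": "AI-driven personalization", '
    '"supporting_data": "73% of consumers prefer brands that use personal info '
    'to improve the shopping experience."}]}',
])
prompts_seen: list[str] = []


def scripted_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)


trends = run_step_with_reprompt(scripted_model, "Extract trends as JSON.")
```

Only the failing step is repeated, and the second prompt carries the exact parser complaint, so the retry is informed rather than a blind re-roll.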
A bare response.contains("{") check can pass and still leave your real parser to fail three steps later. Validate by actually parsing into the target schema at the boundary, and on failure, re-prompt the same step with the parse error as added context (the book's iterative "extract → check → re-prompt with failure context" loop) rather than re-running the whole chain.
4 · The running examples, stage by stage
The book gives several canonical chains. Two carry the rest of this track, so we make them concrete.
4a · Market-research chain (the book's headline example)
| Step | Role / prompt focus | Input | Output artifact |
|---|---|---|---|
| 1 · Summarize | "Summarize the key findings of this report" | raw report text | summary (prose, bounded length) |
| 2 · Identify trends | role: market analyst; "extract 3 trends + supporting data" | summary | {trends:[...]} (JSON) |
| 3 · Draft email | role: documentation specialist; "write a concise email for the marketing team" | {trends} | email (final text) |
Note the book's touch of assigning each step a distinct role ("market analyst", "trade analyst", "documentation specialist"). The role is part of the per-step contract: it narrows the model's behavior to the one job that step owns, which is part of how p rises to p'.
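The table and the role assignment can be captured as data, so that each step's role and instruction are written down rather than implied. This is a sketch under assumptions: `StepSpec`, `run_chain`, and `echo_model` are hypothetical names, the prompt layout is one possible choice, and step 1 carries no role because the table does not give it one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StepSpec:
    name: str
    instruction: str
    role: Optional[str] = None  # step 1 in the table has no role

    def render(self, artifact: str) -> str:
        """Build this step's prompt from the previous step's artifact."""
        header = f"You are a {self.role}.\n" if self.role else ""
        return f"{header}{self.instruction}\n\nInput:\n{artifact}"


MARKET_CHAIN = [
    StepSpec("summarize", "Summarize the key findings of this report."),
    StepSpec("identify_trends", "Extract 3 trends with supporting data as JSON.",
             role="market analyst"),
    StepSpec("draft_email", "Write a concise email for the marketing team.",
             role="documentation specialist"),
]


def run_chain(specs: list[StepSpec], call_model: Callable[[str], str], report: str) -> dict:
    """Run the steps in order, keeping every intermediate artifact by step name."""
    artifacts: dict[str, str] = {}
    current = report
    for spec in specs:
        current = call_model(spec.render(current))
        artifacts[spec.name] = current
    return artifacts


# Stand-in model that records each prompt it receives.
prompts: list[str] = []


def echo_model(prompt: str) -> str:
    prompts.append(prompt)
    return f"<output {len(prompts)}>"


artifacts = run_chain(MARKET_CHAIN, echo_model, "raw report text")
```

Because the role lives in the step definition, changing one step's persona is a one-line edit that cannot leak into the prompts of its neighbors.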
4b · Research/coding assistant chain (our track's running example)
This is the example threaded through the whole track. A research-and-code assistant answering "How did the 1929 crash happen and what policy response followed?" — or "implement and test this function" — decomposes naturally into a line:
| Step | Transformation | Artifact (typed) | Could be code/tool? |
|---|---|---|---|
| 1 · Decompose | turn the question into core sub-questions / requirements | {subquestions:[...]} | model |
| 2 · Retrieve | fetch sources / read the relevant files | {sources:[{id,text}]} | tool (retrieval) |
| 3 · Summarize each | condense each source to claims + citation | {claims:[{text,src_id}]} | model |
| 4 · Synthesize | merge claims into a coherent draft | draft | model |
| 5 · Verify citations | check every claim maps to a real source id | {ok:bool, bad:[...]} | code (deterministic) |
| 6 · Format final | render the answer with citations | answer | model |
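Step 5 is the clearest case of a step that should be code rather than a model call. A minimal sketch, assuming the artifact shapes shown in the table (`claims` with `src_id`, `sources` with `id`); note that it only checks that each cited id exists, not that the source actually supports the claim.

```python
def verify_citations(claims: list[dict], sources: list[dict]) -> dict:
    """Return {ok, bad}: bad lists every claim whose src_id matches no source."""
    known_ids = {source["id"] for source in sources}
    bad = [claim for claim in claims if claim.get("src_id") not in known_ids]
    return {"ok": not bad, "bad": bad}


sources = [{"id": "s1", "text": "Margin debt peaked in 1929."}]
claims = [
    {"text": "Margin debt peaked in 1929.", "src_id": "s1"},
    {"text": "An uncited assertion.", "src_id": "s9"},
]
report = verify_citations(claims, sources)
```

A check like this is cheap, deterministic, and repeatable, which is exactly what you want at a gate that decides whether a draft may be formatted and shipped.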
The book makes a sharp observation about a research agent specifically: data gathering is often done in parallel (fetch many articles at once), but the later stages — merge, synthesize, review — are inherently sequential because each depends on the previous result. So a real research agent is a parallel fan-out feeding a chained tail. That is a clean foreshadow: parallelization is lesson 05, and it composes with chaining rather than replacing it.
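The "parallel fan-out feeding a chained tail" shape can be sketched with `asyncio.gather` from the standard library. The `fetch`, `merge`, `synthesize`, and `review` functions here are hypothetical stand-ins for network and model calls; only the control flow is the point.

```python
import asyncio


async def fetch(source_id: str) -> dict:
    # Hypothetical stand-in for a network fetch.
    await asyncio.sleep(0)
    return {"id": source_id, "text": f"text of {source_id}"}


def merge(sources: list[dict]) -> str:
    return "\n".join(source["text"] for source in sources)


def synthesize(merged: str) -> str:
    # Stand-in for a model call that drafts from the merged material.
    return f"DRAFT from {len(merged.splitlines())} sources"


def review(draft: str) -> str:
    return draft + " [reviewed]"


async def research(source_ids: list[str]) -> str:
    # Fan-out: the fetches are independent, so they run concurrently.
    sources = list(await asyncio.gather(*(fetch(sid) for sid in source_ids)))
    # Chained tail: each stage needs the previous stage's artifact.
    merged = merge(sources)
    draft = synthesize(merged)
    return review(draft)


answer = asyncio.run(research(["a", "b", "c"]))
```

`asyncio.gather` returns results in the order the awaitables were passed, so the tail sees a stable ordering regardless of which fetch finishes first.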
The coding variant follows the same spine the book lists: (1) understand the requirement and emit pseudocode/outline, (2) write the first draft, (3) identify bugs or improvements (a static analyzer or a second model call), (4) rewrite/optimize, (5) add docs and tests. The win is identical — each model call faces a smaller, locally checkable job, and deterministic logic (a linter, a test runner) can sit between calls.
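As an illustration of deterministic logic sitting between calls in the coding variant, the sketch below uses Python's `ast.parse` as a syntax gate between the draft and the rewrite. It covers only part of the five-step spine; the prompts, `code_chain`, and the scripted model are assumptions made for the example, and a real chain would likely add a linter or test runner as further gates.

```python
import ast
from typing import Callable


def syntax_gate(code: str) -> tuple[bool, str]:
    """Deterministic check: does the draft parse as Python?"""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, ""


def code_chain(requirement: str, call_model: Callable[[str], str], max_rewrites: int = 2) -> str:
    outline = call_model(f"Outline a solution as pseudocode:\n{requirement}")
    code = call_model(f"Write Python for this outline:\n{outline}")
    ok, err = syntax_gate(code)
    rewrites = 0
    while not ok and rewrites < max_rewrites:
        # Re-prompt only the rewrite step, with the gate's error as context.
        code = call_model(f"Fix this error ({err}) in the code:\n{code}")
        rewrites += 1
        ok, err = syntax_gate(code)
    if not ok:
        raise RuntimeError(f"still invalid after {rewrites} rewrites: {err}")
    return code


# Scripted stand-in for a model: outline, a broken draft, then a fixed draft.
scripted = iter([
    "1. take a and b  2. return their sum",
    "def add(a, b) return a + b",
    "def add(a, b):\n    return a + b\n",
])
final_code = code_chain("Add two numbers.", lambda prompt: next(scripted))
```

The gate costs microseconds and never hallucinates, so a malformed draft is caught before any later model call builds on it.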
4c · The framework reality — LangChain LCEL
The book's runnable example is a two-step LangChain chain built with LCEL (the LangChain Expression Language), which uses the pipe operator to compose a prompt, a model, and an output parser. Step one extracts technical specs from free text; step two converts them to JSON. In spirit:
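from langchain_core.output_parsers import StrOutputParser

# Assumed to be defined elsewhere: llm (any LangChain chat model) and two prompt
# templates, prompt_extract (input variable: text_input) and prompt_transform
# (input variable: specifications).

# step 1: prompt | model | parser -> spec text
extraction_chain = prompt_extract | llm | StrOutputParser()

# step 2: feed step-1 output as the 'specifications' variable into prompt 2
full_chain = (
    {"specifications": extraction_chain}
    | prompt_transform  # "convert to JSON with keys cpu, memory, storage"
    | llm
    | StrOutputParser()
)

full_chain.invoke({"text_input": "New laptop: 3.5GHz 8-core CPU, 16GB RAM, 1TB NVMe SSD."})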
The book is careful to separate the principle from the syntax: LangChain gives a linear-sequence abstraction; LangGraph adds stateful, cyclic computation for more complex agent behavior; CrewAI and Google's Agent Development Kit (ADK) provide structured environments for multi-step flows and roles. But chaining itself is framework-independent — at its barest it is sequential function calls in a script. Reach for a framework when you need managed state, retries, observability, and composition; reach for a plain loop when you do not.
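For comparison, here is the same two-step chain as sequential function calls in a plain script, with no framework. It is a sketch, not the book's code: `fake_model` is a hypothetical stand-in for a real model client, and the prompt wording is assumed.

```python
import json
from typing import Callable

Model = Callable[[str], str]


def extract_specs(text: str, call_model: Model) -> str:
    """Step 1: free text -> spec text."""
    return call_model(f"Extract the technical specifications from this text:\n{text}")


def specs_to_json(specs: str, call_model: Model) -> dict:
    """Step 2: spec text -> validated dict with keys cpu, memory, storage."""
    raw = call_model(
        "Convert these specifications to JSON with keys cpu, memory, storage:\n" + specs
    )
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"cpu", "memory", "storage"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def full_chain(text: str, call_model: Model) -> dict:
    return specs_to_json(extract_specs(text, call_model), call_model)


def fake_model(prompt: str) -> str:
    # Hypothetical model: canned replies keyed on the prompt's first word.
    if prompt.startswith("Extract"):
        return "CPU: 3.5GHz 8-core; Memory: 16GB; Storage: 1TB NVMe SSD"
    return '{"cpu": "3.5GHz 8-core", "memory": "16GB", "storage": "1TB NVMe SSD"}'


laptop = full_chain("New laptop: 3.5GHz 8-core CPU, 16GB RAM, 1TB NVMe SSD.", fake_model)
```

Unlike the string parser in the LCEL version, step 2 here parses and checks the keys, so the artifact that leaves the chain is already typed.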
5 · Engineering scaffolding: artifacts, validation, retries
The teaching idea the book keeps returning to: intermediate outputs are first-class artifacts. They can be inspected, validated, cached, retried, or swapped for a deterministic tool. That single shift — from "the answer" to "a sequence of stored, typed artifacts" — is what makes a chain debuggable and is the reason chaining, not single prompting, is where real agent engineering begins.
chain = [
    decompose,          # -> {subquestions}
    retrieve,           # -> {sources}   (tool step)
    summarize_sources,  # -> {claims}
    synthesize,         # -> draft
    verify_citations,   # -> {ok, bad}   (deterministic step)
    format_final,       # -> answer
]
state = State(task=user_task)
for step in chain:
    artifact = step.run(state)                        # inputs read from state
    ok, err = validate(artifact, step.schema)         # parse into the target schema
    if not ok:
        artifact = retry(step, state, err, budget=2)  # re-prompt with failure context
    state.put(step.name, artifact)                    # persist; never overwrite, append
    # every state.put is an inspectable, re-runnable checkpoint
Three properties fall out of this structure, and your checklist should verify all three: (1) every step is named by its transformation (classify, extract, synthesize) — if you cannot name it, it is doing more than one job; (2) every step reduces or reorganizes information toward the goal; (3) a failed step can be re-run from its stored inputs, because the previous artifacts were persisted. If a step instead requires information an earlier step already discarded, the chain is mis-decomposed — that is the canonical "when to avoid" signal.
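A runnable version of that loop might look like the following. It is one possible reading of the scaffolding, with assumptions stated: `Step`, `State`, and `run_chain` are illustrative definitions, validation is a function that returns an error string (empty when valid), and the retry budget counts extra attempts after the first.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict, str], Any]  # (stored inputs, error context) -> artifact
    check: Callable[[Any], str]      # "" if the artifact is valid, else an error message


@dataclass
class State:
    task: str
    history: list = field(default_factory=list)  # append-only (step name, artifact) log

    def put(self, name: str, artifact: Any) -> None:
        self.history.append((name, artifact))

    def latest(self) -> dict:
        """Inputs view: the task plus the most recent artifact of each step."""
        return {"task": self.task, **dict(self.history)}


def run_chain(chain: list[Step], state: State, budget: int = 2) -> State:
    for step in chain:
        err = ""
        for _ in range(budget + 1):
            artifact = step.run(state.latest(), err)  # re-run from stored inputs
            err = step.check(artifact)
            if not err:
                break
        else:
            raise RuntimeError(f"{step.name} failed after {budget + 1} attempts: {err}")
        state.put(step.name, artifact)  # checkpoint
    return state


# Demo with stand-in steps: decompose fails validation once, then succeeds.
errors_seen: list[str] = []


def decompose(inputs: dict, err: str) -> dict:
    errors_seen.append(err)
    return {"subquestions": ["What caused the crash?"] if err else []}


def count_subquestions(inputs: dict, err: str) -> dict:
    return {"n": len(inputs["decompose"]["subquestions"])}


chain = [
    Step("decompose", decompose, lambda a: "" if a["subquestions"] else "subquestions must be non-empty"),
    Step("count", count_subquestions, lambda a: ""),
]
final_state = run_chain(chain, State(task="How did the 1929 crash happen?"))
```

Because each step reads only from `state.latest()`, any step can be replayed later from the stored history without re-running the steps before it.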
Failure modes
- Prose handoffs. Passing free text between steps instead of a typed artifact; ambiguity compounds and step k+1 silently misreads step k.
- Hidden state. A chain of vague prompts where each step secretly depends on context that was never written down, so it cannot be re-run or tested in isolation.
- Over-chaining. Forcing every request through every step when most need only two — the rigid line wastes latency and tokens (routing, lesson 04, fixes this).
- End-only testing. Evaluating only the final answer, so you cannot tell which stage regressed when quality drops.
- Lost-information cuts. A late step needing detail an early step threw away — a sign the decomposition is wrong, not that you need a longer prompt.
Implementation checklist
- Can each step be named by a single transformation verb?
- Does each step have an explicit output schema, validated by parsing?
- Are handoffs structured (JSON/XML), not prose?
- Can a failed step re-run from stored inputs with the error as context?
- Is each intermediate artifact persisted and inspectable for debugging?
- Could any model step be replaced by deterministic code or a tool?
- Do you test stages individually, not just the final output?
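The last checklist item can be made mechanical: replay each stage on its stored input artifact and check a per-stage invariant, so a regression is pinned to a stage instead of showing up only in the final answer. The sketch below is hypothetical throughout; the two toy stages, their stored inputs, and the invariants are invented for illustration.

```python
from typing import Any, Callable


def failing_stages(
    stages: dict[str, Callable[[Any], Any]],
    stored_inputs: dict[str, Any],
    invariants: dict[str, Callable[[Any], bool]],
) -> list[str]:
    """Run every stage on its own stored input; return the stages whose output breaks its invariant."""
    failing = []
    for name, stage in stages.items():
        output = stage(stored_inputs[name])
        if not invariants[name](output):
            failing.append(name)
    return failing


def summarize(report: str) -> str:
    return report  # simulated regression: the stage no longer shortens its input


def extract_numbers(summary: str) -> list[int]:
    return [int(tok) for tok in summary.split() if tok.isdigit()]


stages = {"summarize": summarize, "extract_numbers": extract_numbers}
stored_inputs = {
    "summarize": "sales rose 28 percent " * 20,
    "extract_numbers": "sales rose 28 percent",
}
invariants = {
    "summarize": lambda out: len(out) <= 80,
    "extract_numbers": lambda out: out == [28],
}
regressed = failing_stages(stages, stored_inputs, invariants)
```

Each stage is tested against its own persisted input, so a broken upstream stage cannot mask or cause a failure in the stage under test.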
Where this points next
A chain is a straight line: step 1 always leads to step 2 leads to step 3, for every input. That rigidity is its strength — it is the most predictable, most inspectable topology you can build — and also its ceiling. Real tasks are not all the same shape: a billing question and a code-review request should not traverse the same six stages. The moment the next step depends on what the input is, a fixed line cannot express it, and you need a decision in the control flow. That decision is the next pattern. Lesson 04, Routing — conditional control flow, adds exactly one primitive on top of the chain: a step whose output is not data but a choice of which step runs next. Chaining gives you the line; routing gives you the branch.
Interview prompts
- Why does splitting one prompt into a chain improve reliability? (§1: a single call must satisfy all n sub-requirements at once, succeeding with probability $p^{n}$; a chain makes each step smaller so per-step success rises, and adds validation checkpoints that allow retries and deterministic tools between calls.)
- What is the most common way a prompt chain breaks in production? (§3 — the handoff: a step emits vague or malformed output, so the next step receives bad input; fix with structured output (JSON/XML) validated by actually parsing into the target schema at each boundary.)
- When should you NOT use a chain? (§2, §5 — when a later step needs information an earlier step discarded (mis-decomposition), or when most inputs need different paths (use routing), or when steps are independent and could run in parallel.)
- How is a chain related to the typed prompt contracts from lesson 02? (§3 — a chain is typed contracts composed end to end; each step's validated output schema is the next step's input schema.)
- Where can non-model code live in a chain, and why does that matter? (§2, §4 — between any two model calls: validators, retrieval, arithmetic, linters; deterministic logic handles what code does better than a model and is the on-ramp to tool use, lesson 07.)
- A research agent fetches many sources but its draft is inconsistent — how do chaining and parallelism combine here? (§4b — gather sources in parallel (lesson 05), then chain the inherently sequential tail: merge → synthesize → review, each depending on the prior artifact.)
- How would you debug a chain whose final answer regressed? (§5 — persist and inspect intermediate artifacts; test stages individually to localize which step's output changed, rather than only scoring the final answer.)