Part II - Core execution patterns

Prompt chaining - linear decomposition

Lesson 02 made the prompt a typed contract: a single model call with a scoped input shape and a validated output shape. But many real tasks are too big for one call to satisfy reliably. This lesson takes the first step into control flow — it strings several of those typed calls into an ordered pipeline, where each artifact narrows the job of the next. This is the simplest possible agent topology: a straight line.

Book source

Chapter 1 - Prompt Chaining (提示链), also called the Pipeline pattern; PDF outline pages 24-32. Includes the market-research running example, structured-output (JSON) handoffs, the LangChain/LCEL two-step demo, and the context-engineering discussion.

The plan

Five moves. (1) Show why a single overloaded prompt fails on a multi-part task — and make the reliability argument numeric, because the whole pattern rests on it. (2) Define the chain as an ordered sequence of typed transformations where each step does one job. (3) Make the handoff concrete: structured output (JSON) between steps, not loose prose, because that is where chains actually break. (4) Walk the book's running examples — the market-research chain and a research/coding assistant — stage by stage, with real artifacts. (5) Add the engineering scaffolding the book implies: validation, retries from stored inputs, and the rule that intermediate artifacts are first-class. We close on the limit of a straight line, which is exactly what lesson 04 (Routing) removes.

Linear position

Prerequisite: Typed prompt and context contracts (lesson 02) — you must already think of one model call as a function with an input schema and a validated output schema.
New capability: Composing several of those calls into a fixed, linear sequence, so that a task too large for one prompt becomes a series of small, individually inspectable transformations.

1 · Why one prompt is not enough — the reliability argument

Start with the failure the book leads with. Suppose you ask a model, in a single prompt: "Analyze this market-research report, summarize the findings, extract the three most important data points, and draft an email to the marketing team about them." That is four distinct jobs welded together. A capable model will often do most of it — and quietly drop one part. It summarizes well, extracts two of three data points, and writes an email that omits the numbers. Nothing errored. The output looks fluent. It is just incomplete.

The book names the mechanisms behind this: instruction neglect (the model skips part of a multi-part request), context drift (it loses track of an earlier constraint by the time it reaches a later one), error accumulation (a small mistake early gets built upon), context-window pressure (the combined input/output is too long to hold cleanly), and hallucination under cognitive load. The unifying intuition: a single call has to satisfy all constraints simultaneously, and the more constraints you pile onto one inference, the lower the chance every one of them is met.

Here is the argument made numeric, because it is the load-bearing claim of the entire pattern. Model a complex task as n independent sub-requirements, each of which a single overloaded call gets right with probability p. The chance the one call nails all of them is the product:

P_single = pⁿ .

Worked number. Take n = 4 sub-tasks (summarize, extract point A, extract point B, draft email) and a generous per-requirement success p = 0.85 when they all share one crowded prompt. Then P_single = 0.85⁴ ≈ 0.52. Roughly a coin flip that the whole thing is correct — and notice the model still sounds confident on every failed run. Now decompose into a chain of focused steps. Each step does one thing, so per-step reliability rises (call it p' = 0.95), and — critically — you can validate the artifact between steps and re-run a failed step, raising its effective success higher still. Even without retries:

P_chain = (p')ⁿ = 0.95⁴ ≈ 0.81 .

Going from 0.52 to 0.81 is the whole game, and the comparison below shows how the two probabilities diverge as the task grows. The deeper point: a chain does not just raise p; it gives you checkpoints. Between two model calls you can run plain code (validate a schema, retry, branch, or call a deterministic tool), which a single monolithic call can never expose.

Reliability: one overloaded prompt vs. a validated chain

A task has n sub-requirements. A single crowded prompt must satisfy all of them at once, succeeding with probability pⁿ. A chain handles them one at a time at a higher per-step rate p', and (optionally) re-runs any step that fails its validation, up to r retries per step. As n grows, the single-prompt probability collapses while the chain's holds. Readouts for n = 4, p = 0.85, p' = 0.95:

retries/step r	P(single ok)	Eff. step rate	P(chain ok)	Chain advantage
0	0.52	0.95	0.81	1.6×
1	0.52	0.9975	0.99	1.9×


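```javascript
// n = sub-tasks, p = per-requirement success in one crowded prompt,
// pPrime = per-step success of a focused chain step, r = retries per step.
function reliability(n, p, pPrime, r) {
  // single overloaded prompt: must get all n sub-tasks right at once
  const pSingle = Math.pow(p, n);

  // chain: each step retried up to r times against its validator, so
  // effective per-step success = 1 - (chance it fails every attempt),
  // assuming independent attempts and a validator that catches every failure
  const pStepEff = 1 - Math.pow(1 - pPrime, r + 1);
  const pChain = Math.pow(pStepEff, n);
  return { pSingle, pStepEff, pChain, advantage: pChain / pSingle };
}

// the chain wins because (a) focused steps raise p -> p', and
// (b) validation between steps lets you retry from stored inputs.
console.log(reliability(4, 0.85, 0.95, 0)); // pSingle ≈ 0.522, pChain ≈ 0.815
```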

Read the numbers honestly

The chain is not free reliability. It costs latency (more sequential calls) and more total tokens. What it buys is controllability: each step is simpler, locally testable, individually optimizable, and re-runnable. Lower p' toward p and the advantage shrinks. With no retries the advantage is exactly (p'/p)ⁿ, so it vanishes at p' = p. The gain comes precisely from steps being easier because they are smaller, plus the retry that validation enables.
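The cost side can be quantified under the same assumptions as the reliability model: attempts are independent, the validator catches every failure, and a step stops retrying as soon as an attempt passes. Attempt k + 1 runs only if the first k attempts all failed, so the expected number of model calls per step is a short geometric sum. This is a minimal sketch; the function name is ours.

```python
def expected_calls(n: int, p_step: float, retries: int) -> float:
    """Expected model calls for an n-step chain with up to `retries` re-runs per step.

    Attempt k (0-indexed) runs only if the previous k attempts all failed,
    which happens with probability (1 - p_step) ** k.
    """
    per_step = sum((1 - p_step) ** k for k in range(retries + 1))
    return n * per_step


# One crowded prompt is a single call; compare the 4-step chain from section 1.
print(expected_calls(4, 0.95, 0))  # 4.0 calls, no retries
print(expected_calls(4, 0.95, 1))  # about 4.2 calls with one retry per step
```

At p' = 0.95, one retry per step adds only about 5% more calls in expectation. The larger cost is making n sequential calls instead of one.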

2 · The pattern: an ordered sequence of typed transformations

Formally, a prompt chain is a pipeline. The book frames it as a divide-and-conquer strategy: instead of solving a hard problem in one shot, you decompose it into a sequence of smaller sub-problems, each handled by a purpose-built prompt, with the output of step k becoming the input of step k+1. That dependency chain is the defining feature — earlier context and results steer later processing, so the model refines its understanding step by step rather than guessing everything at once.

Think of it as a Unix pipeline or a sequence of pure functions: each stage takes a well-defined input, performs one transformation, and emits a well-defined output for the next stage. The mental model the book offers is the computational pipeline — a function completes its specific operation and hands the result downstream. Crucially, between two model calls you can interleave ordinary code: a validator, a database lookup, an arithmetic tool, a conditional. The chain is not "a list of prompts"; it is a list of transformations, some of which happen to be model calls.

```
 user task
     │
     ▼
┌─────────┐  artifact₁   ┌─────────┐  artifact₂   ┌─────────┐  artifact₃
│ step 1  │──validate───▶│ step 2  │──validate───▶│ step 3  │──▶ final
│ classify│   (JSON)     │ extract │   (JSON)     │  draft  │
└─────────┘              └─────────┘              └─────────┘
    │  ▲                     │  ▲                     │  ▲
    │  └── retry from        │  └── deterministic     │  └── inspect /
    │      stored input      │      tool here         │      cache artifact

each box = one focused model call OR a plain function; each arrow = a typed handoff
```

The book also points out that a step is not obliged to be a model call at all. A chain step can instruct the model to call an external API, query a database, or run a tool — which is how chaining becomes the on-ramp to genuine agency. We will build proper tool use in lesson 07; for now the relevant fact is that the linear structure already lets deterministic code sit between inferences, doing the parts code does better than a model (parsing, validation, arithmetic).
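Here is a minimal sketch of the "list of transformations" view, using only the standard library. `extract_prices` and `draft_summary` are hypothetical stand-ins for model calls, so the sketch runs offline. `total` is the deterministic step that does the arithmetic a model is worse at. `pipeline` is plain left-to-right function composition.

```python
import re
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right: the output of step k is the input of step k+1."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


def extract_prices(text: str) -> list[float]:
    # Hypothetical stand-in for a model call that extracts line-item prices.
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]


def total(prices: list[float]) -> float:
    # Deterministic step between "inferences": arithmetic is done by code.
    return round(sum(prices), 2)


def draft_summary(amount: float) -> str:
    # Hypothetical stand-in for a model call that writes the final message.
    return f"The order total is ${amount:.2f}."


chain = pipeline(extract_prices, total, draft_summary)
print(chain("Keyboard $49.99, mouse $19.50, cable $5"))  # The order total is $74.49.
```

Swapping any step for a real model call, or a model step for code, does not change the shape of the pipeline. Only the function behind the name changes.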

3 · The handoff is the hard part — use structured output

Here is where chains actually break in practice, and the book is emphatic about it: the reliability of a chain depends entirely on the integrity of the data passed between steps. If step k emits something vague or loosely formatted, step k+1 receives a malformed input and fails — and because the failure is downstream, it is annoying to localize. The fix is to specify a structured output format (JSON or XML) at each handoff, so the artifact can be parsed and validated by machine rather than re-interpreted as free text.

Take the book's market-research chain. Step 2 (trend identification) should not emit a paragraph; it should emit a parseable object:

```json
{
  "trends": [
    {
      "trend_name": "AI-driven personalization",
      "supporting_data": "73% of consumers prefer brands that use personal info to improve the shopping experience."
    },
    {
      "trend_name": "Sustainability & word-of-mouth",
      "supporting_data": "ESG-labeled products grew 28% over five years vs. 20% for unlabeled."
    }
  ]
}
```

Now step 3 (draft the email) receives a typed list it can iterate over, not prose it has to re-read. Structured handoffs do three things at once: they make the artifact machine-parseable, they let you run a schema check between steps (the validation checkpoint from §1), and they shrink the surface area of natural-language ambiguity that causes drift. This is the through-line back to lesson 02 — the chain is just typed contracts composed end to end, where each step's output schema is the next step's input schema.
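The "output schema of step k is the input schema of step k+1" idea can be sketched with a dataclass, assuming the JSON shape above. `Trend`, `parse_trends`, and `email_bullets` are illustrative names, not from the book. Parsing rejects anything that is not exactly the expected shape, and step 3 then works on typed fields rather than prose.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Trend:
    trend_name: str
    supporting_data: str


def parse_trends(raw: str) -> list[Trend]:
    """Step 2's output schema, which is also step 3's input schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on any malformed artifact.
    """
    data = json.loads(raw)
    items = data["trends"] if isinstance(data, dict) and "trends" in data else None
    if not isinstance(items, list):
        raise ValueError("expected an object with a 'trends' list")
    trends = []
    for i, item in enumerate(items):
        if not isinstance(item, dict) or not all(
            isinstance(item.get(key), str) for key in ("trend_name", "supporting_data")
        ):
            raise ValueError(f"trends[{i}] needs string trend_name and supporting_data")
        trends.append(Trend(item["trend_name"], item["supporting_data"]))
    return trends


def email_bullets(trends: list[Trend]) -> str:
    # Step 3 iterates over typed fields instead of re-reading prose.
    return "\n".join(f"- {t.trend_name}: {t.supporting_data}" for t in trends)


raw = ('{"trends": [{"trend_name": "Sustainability & word-of-mouth", '
       '"supporting_data": "ESG-labeled products grew 28% over five years vs. 20% for unlabeled."}]}')
print(email_bullets(parse_trends(raw)))
```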

The silent-truncation foot-gun

A step that returns prose "mostly in JSON" (fenced in markdown, prefixed with "Sure, here's the JSON:", or truncated by a token limit) will pass a naive `"{" in response` check and then fail your real parser three steps later. Validate by actually parsing into the target schema at the boundary. On failure, re-prompt the same step with the parse error as added context (the book's iterative "extract → check → re-prompt with failure context" loop), not the whole chain.
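That loop can be sketched as one helper. `call_model` is a hypothetical function (prompt in, raw text out), and `parse` is any parser that raises `ValueError` on a bad artifact, such as `json.loads`. The error message is appended to the same step's original prompt; nothing upstream is re-run.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")


def run_step(
    call_model: Callable[[str], str],  # hypothetical: prompt in, raw model text out
    prompt: str,
    parse: Callable[[str], T],         # must raise ValueError on a bad artifact
    budget: int = 2,
) -> T:
    """Run one chain step; on a parse failure, re-prompt the SAME step with the error."""
    attempt_prompt = prompt
    for _ in range(budget + 1):
        raw = call_model(attempt_prompt)
        try:
            return parse(raw)          # validate by actually parsing
        except ValueError as err:      # json.JSONDecodeError is a ValueError
            attempt_prompt = (
                f"{prompt}\n\nYour previous output could not be parsed: {err}\n"
                "Return only the JSON object, with no prose or markdown fences."
            )
    raise RuntimeError(f"step failed after {budget} retries")


# A scripted fake model: first a chatty, truncated reply, then valid JSON.
replies = iter(['Sure, here\'s the JSON: {"trends": [', '{"trends": []}'])
result = run_step(lambda _prompt: next(replies), "List 3 trends as JSON.", json.loads)
print(result)  # {'trends': []}
```

Both fake replies contain `{`, so a substring check would accept the first one. Only the parse catches it.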

4 · The running examples, stage by stage

The book gives several canonical chains. Two carry the rest of this track, so we make them concrete.

4a · Market-research chain (the book's headline example)

Step	Role / prompt focus	Input	Output artifact
1 · Summarize	"Summarize the key findings of this report"	raw report text	`summary` (prose, bounded length)
2 · Identify trends	role: market analyst; "extract 3 trends + supporting data"	`summary`	`{trends:[...]}` (JSON)
3 · Draft email	role: documentation specialist; "write a concise email for the marketing team"	`{trends}`	`email` (final text)

Note the book's touch of assigning each step a distinct role ("market analyst", "trade analyst", "documentation specialist"). The role is part of the per-step contract: it narrows the model's behavior to the one job that step owns, which is part of how p rises to p'.
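Here is a minimal sketch of the three-step market-research chain with the role carried in each step's spec. Roles for steps 2 and 3 follow the table; step 1 is left without a role because the table assigns none. `call_model` is a hypothetical function (prompt in, text out). Per-step validation is omitted here to keep the focus on how roles and artifacts flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StepSpec:
    name: str
    role: Optional[str]  # the per-step role is part of the step's contract
    instruction: str


MARKET_CHAIN = [
    StepSpec("summarize", None, "Summarize the key findings of this report."),
    StepSpec("identify_trends", "market analyst",
             'Extract 3 trends with supporting data as JSON: {"trends": [...]}.'),
    StepSpec("draft_email", "documentation specialist",
             "Write a concise email for the marketing team."),
]


def build_prompt(spec: StepSpec, payload: str) -> str:
    role_line = f"You are a {spec.role}.\n" if spec.role else ""
    return f"{role_line}{spec.instruction}\n\nInput:\n{payload}"


def run_market_chain(report: str, call_model: Callable[[str], str]) -> dict:
    """Thread each step's output into the next and keep every artifact."""
    artifacts = {"report": report}
    payload = report
    for spec in MARKET_CHAIN:
        payload = call_model(build_prompt(spec, payload))  # output k -> input k+1
        artifacts[spec.name] = payload
    return artifacts


seen = []

def fake_model(prompt: str) -> str:  # hypothetical stand-in that records prompts
    seen.append(prompt)
    return f"artifact {len(seen)}"

out = run_market_chain("raw report text", fake_model)
print(out["draft_email"])          # artifact 3
print(seen[1].splitlines()[0])     # You are a market analyst.
```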

4b · Research/coding assistant chain (our track's running example)

This is the example threaded through the whole track. A research-and-code assistant answering "How did the 1929 crash happen and what policy response followed?" — or "implement and test this function" — decomposes naturally into a line:

Step	Transformation	Artifact (typed)	Could be code/tool?
1 · Decompose	turn the question into core sub-questions / requirements	`{subquestions:[...]}`	model
2 · Retrieve	fetch sources / read the relevant files	`{sources:[{id,text}]}`	tool (retrieval)
3 · Summarize each	condense each source to claims + citation	`{claims:[{text,src_id}]}`	model
4 · Synthesize	merge claims into a coherent draft	`draft`	model
5 · Verify citations	check every claim maps to a real source id	`{ok:bool, bad:[...]}`	code (deterministic)
6 · Format final	render the answer with citations	`answer`	model
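Step 5 is the clearest case of a step that needs no model. Here is a minimal sketch, assuming the artifact shapes in the table (`sources` as a list of `{id, text}`, `claims` as a list of `{text, src_id}`). Note what it does not do: it checks that every citation resolves to a retrieved source, not that the source actually supports the claim.

```python
def verify_citations(claims: list[dict], sources: list[dict]) -> dict:
    """Deterministic step: every claim must cite a source id that was actually retrieved."""
    known_ids = {source["id"] for source in sources}
    bad = [claim for claim in claims if claim.get("src_id") not in known_ids]
    return {"ok": not bad, "bad": bad}


sources = [{"id": "s1", "text": "source one"}, {"id": "s2", "text": "source two"}]
claims = [{"text": "claim A", "src_id": "s1"}, {"text": "claim B", "src_id": "s9"}]
print(verify_citations(claims, sources))
# {'ok': False, 'bad': [{'text': 'claim B', 'src_id': 's9'}]}
```

The `bad` list is exactly the failure context a retry of step 3 or step 4 would receive.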

The book makes a sharp observation about a research agent specifically: data gathering is often done in parallel (fetch many articles at once), but the later stages — merge, synthesize, review — are inherently sequential because each depends on the previous result. So a real research agent is a parallel fan-out feeding a chained tail. That is a clean foreshadow: parallelization is lesson 05, and it composes with chaining rather than replacing it.
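The "parallel fan-out feeding a chained tail" shape can be sketched with the standard library's thread pool, assuming the fetches are independent, I/O-bound calls. `fetch`, `synthesize`, and `review` are hypothetical stand-ins for a retrieval tool and two model calls. Lesson 05 treats the parallel half properly.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(source_id: str) -> dict:
    # Hypothetical stand-in for a retrieval tool call.
    return {"id": source_id, "text": f"contents of {source_id}"}


def merge(docs: list[dict]) -> str:
    return "\n".join(doc["text"] for doc in docs)


def synthesize(merged: str) -> str:
    # Hypothetical stand-in for a model call that drafts from the merged notes.
    return f"draft based on {len(merged.splitlines())} sources"


def review(draft: str) -> str:
    # Hypothetical stand-in for a reviewing model call.
    return draft + " (reviewed)"


def research(source_ids: list[str]) -> str:
    # Fan-out: independent fetches run concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        docs = list(pool.map(fetch, source_ids))
    # Chained tail: each step needs the previous artifact, so it stays sequential.
    return review(synthesize(merge(docs)))


print(research(["s1", "s2", "s3"]))  # draft based on 3 sources (reviewed)
```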

The coding variant follows the same spine the book lists: (1) understand the requirement and emit pseudocode/outline, (2) write the first draft, (3) identify bugs or improvements (a static analyzer or a second model call), (4) rewrite/optimize, (5) add docs and tests. The win is identical — each model call faces a smaller, locally checkable job, and deterministic logic (a linter, a test runner) can sit between calls.
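In the coding variant, the deterministic checker between model calls can be sketched as follows. `draft_code` and `rewrite_code` are hypothetical stand-ins for the "first draft" and "rewrite" model calls; the draft is deliberately buggy so the check has something to catch. `run_tests` plays the role of step 3. Executing generated code with `exec` like this is only acceptable inside a sandbox.

```python
def draft_code(requirement: str) -> str:
    # Hypothetical stand-in for step 2 ("write the first draft"); deliberately buggy.
    return "def add(a, b):\n    return a - b\n"


def rewrite_code(source: str, failures: list[str]) -> str:
    # Hypothetical stand-in for step 4 ("rewrite"); a real call would get `failures` as context.
    return source.replace("a - b", "a + b")


def run_tests(source: str, fn_name: str, cases: list[tuple]) -> list[str]:
    """Deterministic step 3: compile the draft and check it against (args, expected) cases."""
    namespace: dict = {}
    try:
        exec(compile(source, "<draft>", "exec"), namespace)
    except SyntaxError as err:
        return [f"syntax error: {err}"]
    fn = namespace.get(fn_name)
    if not callable(fn):
        return [f"no function named {fn_name!r}"]
    failures = []
    for args, expected in cases:
        try:
            got = fn(*args)
        except Exception as err:
            failures.append(f"{fn_name}{args} raised {err!r}")
            continue
        if got != expected:
            failures.append(f"{fn_name}{args} = {got!r}, expected {expected!r}")
    return failures


cases = [((2, 3), 5), ((0, 0), 0)]
code = draft_code("add two numbers")
failures = run_tests(code, "add", cases)
print(failures)                          # ['add(2, 3) = -1, expected 5']
if failures:
    code = rewrite_code(code, failures)  # redo only the step that failed its check
print(run_tests(code, "add", cases))     # []
```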

4c · The framework reality — LangChain LCEL

The book's runnable example is a two-step LangChain chain built with LCEL (the LangChain Expression Language), which uses the pipe operator to compose a prompt, a model, and an output parser. Step one extracts technical specs from free text; step two converts them to JSON. In spirit:

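```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI  # assumption: any LangChain chat model can stand in here
llm = ChatOpenAI(temperature=0)          # needs the langchain-openai package and OPENAI_API_KEY
# prompt wording is illustrative
prompt_extract = ChatPromptTemplate.from_template(
    "Extract the technical specifications from the following text:\n\n{text_input}")
prompt_transform = ChatPromptTemplate.from_template(
    "Transform the following specifications into a JSON object with "
    "'cpu', 'memory', and 'storage' as keys:\n\n{specifications}")

# step 1: prompt | model | parser  ->  spec text
extraction_chain = prompt_extract | llm | StrOutputParser()

# step 2: feed step-1 output as the 'specifications' variable into prompt 2
full_chain = (
    {"specifications": extraction_chain}   # a dict is coerced into a RunnableParallel
    | prompt_transform                     # "convert to JSON with keys cpu, memory, storage"
    | llm
    | StrOutputParser()
)
print(full_chain.invoke({"text_input": "New laptop: 3.5GHz 8-core CPU, 16GB RAM, 1TB NVMe SSD."}))
```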

The book is careful to separate the principle from the syntax: LangChain gives a linear-sequence abstraction; LangGraph adds stateful, cyclic computation for more complex agent behavior; CrewAI and Google's Agent Development Kit (ADK) provide structured environments for multi-step flows and roles. But chaining itself is framework-independent — at its barest it is sequential function calls in a script. Reach for a framework when you need managed state, retries, observability, and composition; reach for a plain loop when you do not.
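For comparison, the same two-step shape without a framework is just nested function calls. In this minimal sketch the two model calls are replaced by hypothetical deterministic stand-ins (`extract_specs`, `transform_to_json`) so it runs offline. The final parse-and-check is our addition: the LCEL chain above ends with a plain string parser and does not validate the JSON.

```python
import json


def extract_specs(text_input: str) -> str:
    # Step 1: hypothetical deterministic stand-in for the extraction model call.
    _, _, specs = text_input.partition(":")
    return "\n".join(part.strip().rstrip(".") for part in specs.split(","))


def transform_to_json(specifications: str) -> str:
    # Step 2: hypothetical stand-in for the "convert to JSON" model call.
    markers = {"CPU": "cpu", "RAM": "memory", "SSD": "storage"}
    out = {key: line for line in specifications.splitlines()
           for marker, key in markers.items() if marker in line}
    return json.dumps(out)


def parse_specs(raw: str) -> dict:
    # Boundary check: parse into the expected shape instead of trusting the string.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"cpu", "memory", "storage"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def full_chain(text_input: str) -> dict:
    # The whole "framework": sequential function calls, output k -> input k+1.
    return parse_specs(transform_to_json(extract_specs(text_input)))


print(full_chain("New laptop: 3.5GHz 8-core CPU, 16GB RAM, 1TB NVMe SSD."))
# {'cpu': '3.5GHz 8-core CPU', 'memory': '16GB RAM', 'storage': '1TB NVMe SSD'}
```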

5 · Engineering scaffolding: artifacts, validation, retries

The teaching idea the book keeps returning to: intermediate outputs are first-class artifacts. They can be inspected, validated, cached, retried, or swapped for a deterministic tool. That single shift — from "the answer" to "a sequence of stored, typed artifacts" — is what makes a chain debuggable and is the reason chaining, not single prompting, is where real agent engineering begins.

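```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class Step:
    name: str                                     # the transformation, e.g. "synthesize"
    run: Callable[["State", Optional[str]], Any]  # (state, last error or None) -> artifact
    validate: Callable[[Any], Optional[str]]      # parse into the target schema; error text or None

@dataclass
class State:
    task: str
    log: list = field(default_factory=list)       # append-only history of (step name, artifact)
    def put(self, name: str, artifact: Any) -> None:
        self.log.append((name, artifact))         # persist; never overwrite, append

def run_chain(chain: list, state: State, budget: int = 2) -> State:
    for step in chain:
        err = None
        for _ in range(budget + 1):               # first attempt + up to `budget` retries
            artifact = step.run(state, err)       # inputs read from state; err is the failure context
            err = step.validate(artifact)
            if err is None:
                break
        else:                                     # no attempt produced a valid artifact
            raise RuntimeError(f"{step.name} still invalid after {budget} retries: {err}")
        state.put(step.name, artifact)            # every put is an inspectable, re-runnable checkpoint
    return state

# chain = [decompose, retrieve, summarize_sources,        # each assumed to be a Step defined elsewhere;
#          synthesize, verify_citations, format_final]    # retrieve is a tool, verify_citations plain code
# state = run_chain(chain, State(task=user_task))
```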

Three properties fall out of this structure, and your checklist should verify all three: (1) every step is named by its transformation (classify, extract, synthesize) — if you cannot name it, it is doing more than one job; (2) every step reduces or reorganizes information toward the goal; (3) a failed step can be re-run from its stored inputs, because the previous artifacts were persisted. If a step instead requires information an earlier step already discarded, the chain is mis-decomposed — that is the canonical "when to avoid" signal.
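Property (3) can be sketched in isolation. This assumes artifacts are persisted in a dict keyed by step name; `run`, `NamedStep`, and `traced` are illustrative names. Re-running from step k reads steps 0..k-1 from the store instead of calling them again.

```python
from typing import Callable, Optional

NamedStep = tuple[str, Callable[[dict], object]]  # (name, function of stored artifacts)


def run(chain: list[NamedStep], task: str,
        store: Optional[dict] = None, start: int = 0) -> dict:
    """Run chain[start:], reading earlier steps' artifacts from `store`.

    Returns a new dict of artifacts; the `store` passed in is not mutated.
    """
    artifacts = dict(store or {}, task=task)
    for name, fn in chain[start:]:
        artifacts[name] = fn(artifacts)  # a step may only read persisted artifacts
    return artifacts


calls = []

def traced(name: str, fn: Callable[[dict], object]) -> NamedStep:
    def wrapper(artifacts: dict) -> object:
        calls.append(name)  # record which steps actually ran
        return fn(artifacts)
    return (name, wrapper)

chain = [
    traced("decompose", lambda a: a["task"].split(" and ")),
    traced("synthesize", lambda a: "; ".join(q.upper() for q in a["decompose"])),
]
first = run(chain, "cause and response")
calls.clear()
second = run(chain, "cause and response", store=first, start=1)  # re-run step 2 only
print(calls, second["synthesize"])  # ['synthesize'] CAUSE; RESPONSE
```

In this setup, a `KeyError` raised inside a step means it asked for an artifact no earlier step stored. That is the mis-decomposition signal described in the paragraph above.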

Failure modes

Prose handoffs. Passing free text between steps instead of a typed artifact; ambiguity compounds and step k+1 silently misreads step k.
Hidden state. A chain of vague prompts where each step secretly depends on context that was never written down, so it cannot be re-run or tested in isolation.
Over-chaining. Forcing every request through every step when most need only two — the rigid line wastes latency and tokens (routing, lesson 04, fixes this).
End-only testing. Evaluating only the final answer, so you cannot tell which stage regressed when quality drops.
Lost-information cuts. A late step needing detail an early step threw away — a sign the decomposition is wrong, not that you need a longer prompt.

Implementation checklist

Can each step be named by a single transformation verb?
Does each step have an explicit output schema, validated by parsing?
Are handoffs structured (JSON/XML), not prose?
Can a failed step re-run from stored inputs with the error as context?
Is each intermediate artifact persisted and inspectable for debugging?
Could any model step be replaced by deterministic code or a tool?
Do you test stages individually, not just the final output?

Where this points next

A chain is a straight line: step 1 always leads to step 2 leads to step 3, for every input. That rigidity is its strength (it is the most predictable, most inspectable topology you can build) and also its ceiling. Real tasks are not all the same shape: a billing question and a code-review request should not traverse the same six stages. The moment the next step depends on what the input is, a fixed line cannot express it, and you need a decision in the control flow. That decision is the next pattern. Lesson 04 (Routing: conditional control flow) adds exactly one primitive on top of the chain: a step whose output is not data but a choice of which step runs next. Chaining gives you the line; routing gives you the branch.

Takeaway

Prompt chaining (the pipeline pattern) turns one overloaded request into an ordered sequence of focused transformations, where each step's typed output feeds the next. It wins because a single call must satisfy all constraints at once — succeeding with probability pⁿ, which collapses as the task grows — while a chain makes each step smaller (raising p to p') and inserts validation checkpoints between calls that enable retries and deterministic tools. The reliability of a chain lives in its handoffs, so pass structured output (JSON), validate by parsing into the target schema, and treat every intermediate artifact as a first-class, persisted, re-runnable object. Frameworks (LangChain/LCEL, LangGraph, CrewAI, Google ADK) manage the plumbing, but the principle is framework-independent. A chain is a straight line; when the next step must depend on the input, you need routing (lesson 04).

Interview prompts

Why does splitting one prompt into a chain improve reliability? (§1 — a single call must satisfy all n sub-requirements at once, succeeding with probability pⁿ; a chain makes each step smaller so per-step success rises, and adds validation checkpoints that allow retries and deterministic tools between calls.)
What is the most common way a prompt chain breaks in production? (§3 — the handoff: a step emits vague or malformed output, so the next step receives bad input; fix with structured output (JSON/XML) validated by actually parsing into the target schema at each boundary.)
When should you NOT use a chain? (§2, §5 — when a later step needs information an earlier step discarded (mis-decomposition), or when most inputs need different paths (use routing), or when steps are independent and could run in parallel.)
How is a chain related to the typed prompt contracts from lesson 02? (§3 — a chain is typed contracts composed end to end; each step's validated output schema is the next step's input schema.)
Where can non-model code live in a chain, and why does that matter? (§2, §4 — between any two model calls: validators, retrieval, arithmetic, linters; deterministic logic handles what code does better than a model and is the on-ramp to tool use, lesson 07.)
A research agent fetches many sources but its draft is inconsistent — how do chaining and parallelism combine here? (§4b — gather sources in parallel (lesson 05), then chain the inherently sequential tail: merge → synthesize → review, each depending on the prior artifact.)
How would you debug a chain whose final answer regressed? (§5 — persist and inspect intermediate artifacts; test stages individually to localize which step's output changed, rather than only scoring the final answer.)