Part III - Action and grounding
Planning - from goal to executable path
Tool use (lesson 07) gave the agent a way to act on the world one call at a time. But the hardest tasks are not single actions — they are goals whose path is unknown until you start walking it. Planning is the pattern that lets the agent synthesize a sequence of interdependent actions from an initial state toward a goal state, and — crucially — revise that sequence as observations arrive. This lesson builds the plan as a living control object, not a one-shot outline.
New capability: A mutable plan state that decomposes a goal into checked steps and adapts its remaining path to what tool observations reveal.
1 · What planning is — and what it is not
The book opens with a sharp framing: treat a planning agent as an expert you delegate a complex goal to. When you say "organize a team-building event," you are specifying the what — the goal and its constraints (a budget, a headcount, a target date) — not the how. The agent's job is to autonomously find a path from the initial state (what is true now) to the goal state (the event is booked). That path is not handed to it; the plan is generated dynamically from the request.
Define the terms once, precisely:
- Initial state — the facts the agent starts with (budget = $2,000, 12 people, prefer a Friday in March).
- Goal state — the testable condition that means success (a venue and caterer are booked, confirmations in hand).
- Plan — an ordered set of steps, each with dependencies, that is hypothesized to move the state from initial toward goal.
- Replanning — revising the remaining steps when an observation contradicts an assumption the plan was built on.
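The word "testable" in the goal-state definition can be made concrete with a small sketch. The class and field names below (`EventState`, `venue_booked`, and so on) are assumptions chosen for illustration, not names taken from the book.

```python
from dataclasses import dataclass, field


@dataclass
class EventState:
    """Facts that are true right now (hypothetical field names)."""
    budget_usd: int
    headcount: int
    venue_booked: bool = False
    caterer_booked: bool = False
    confirmations: list[str] = field(default_factory=list)


def goal_reached(state: EventState) -> bool:
    """Goal state as a testable condition: both bookings made and confirmed."""
    return (
        state.venue_booked
        and state.caterer_booked
        and len(state.confirmations) >= 2
    )


# Initial state from the lesson: $2,000 budget, 12 people, nothing booked yet.
initial = EventState(budget_usd=2000, headcount=12)
print(goal_reached(initial))  # False

booked = EventState(
    budget_usd=2000,
    headcount=12,
    venue_booked=True,
    caterer_booked=True,
    confirmations=["venue-conf", "caterer-conf"],
)
print(goal_reached(booked))  # True
```

The point of the predicate is that a loop can call it after every observation; a goal written only as prose gives the loop nothing to check.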
The contrast that matters for engineering: planning is not the fixed workflows you already built. Prompt chaining (lesson 03) is a known linear path. Routing (lesson 04) is a known branch. Parallelization (lesson 05) is a known fan-out. In all three, you wired the topology in advance because you knew it. Planning is for the case where the topology itself is unknown until the agent starts exploring — where step 3 depends on what step 2 turned up.
Concretely: employee onboarding (create accounts, assign training modules, coordinate departments) has dependencies but a known structure, so a fixed workflow with conditional branches beats a dynamic planner. A literature review, where you do not know which papers are foundational until you have read the first batch, does need planning: the second search depends on the first's results.
2 · The plan is a data structure, not a paragraph
The single most common beginner mistake is treating "the plan" as motivational prose the model emits and then ignores. A useful plan is a structured object the control loop reads, executes, and rewrites. The book's field list, made concrete:
| Field | What it holds | Why the loop needs it |
|---|---|---|
| goal | The goal-state condition, testable. | The loop's exit test: done(goal, state). |
| assumptions | Facts the plan depends on ("venue X is available"). | An observation that violates one is the replan trigger. |
| steps | Ordered actions, each naming its dependencies. | Tells the executor what is runnable now vs blocked. |
| tools | The function/tool each step calls (lesson 07). | Binds the abstract step to a real action. |
| acceptance checks | The observation that proves a step succeeded. | Without it the plan is "motivational," not executable. |
| replan triggers | Per-step conditions that demand revision. | Turns the plan from a script into an adaptive object. |
Notice the discipline: every step carries an acceptance check (how do we know it worked?) and a replan trigger (what observation would invalidate the rest?). A step without an acceptance check is a wish. A plan without replan triggers is a script that cannot adapt — and adaptivity, as the next section argues, is the entire reason you chose planning over a fixed flow.
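One possible shape for the field list above is a pair of dataclasses. This is a minimal sketch, assuming observations are plain dicts and that tools are referenced by name; `Step`, `Plan`, `runnable_steps`, and the tool names are hypothetical, not an API from the book or from any framework.

```python
from dataclasses import dataclass, field
from typing import Callable

Observation = dict  # assumption: a tool call returns a plain dict


@dataclass
class Step:
    name: str
    tool: str                                        # tool this step calls
    acceptance_check: Callable[[Observation], bool]  # proof the step succeeded
    replan_trigger: Callable[[Observation], bool]    # observation that invalidates the rest
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Plan:
    goal: str                # the goal-state condition, stated so it can be tested
    assumptions: list[str]   # facts the plan depends on
    steps: list[Step]


def runnable_steps(plan: Plan, completed: set[str]) -> list[Step]:
    """Steps not yet done whose dependencies are all satisfied."""
    return [
        s for s in plan.steps
        if s.name not in completed and all(d in completed for d in s.depends_on)
    ]


plan = Plan(
    goal="venue and caterer booked, confirmations in hand",
    assumptions=["venue X is available"],
    steps=[
        Step("check_venue", tool="venue_search",
             acceptance_check=lambda obs: "available" in obs,
             replan_trigger=lambda obs: obs.get("available") is False),
        Step("book_venue", tool="venue_booking",
             acceptance_check=lambda obs: "confirmation" in obs,
             replan_trigger=lambda obs: obs.get("error") is not None,
             depends_on=["check_venue"]),
    ],
)

print([s.name for s in runnable_steps(plan, completed=set())])
# ['check_venue']
print([s.name for s in runnable_steps(plan, completed={"check_venue"})])
# ['book_venue']
```

Making `acceptance_check` and `replan_trigger` required constructor arguments is a deliberate choice in this sketch: a step cannot be created without them, which enforces the discipline described above at construction time.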
3 · The loop: plan → execute → observe → replan
Planning sits after tool use in this track for a structural reason: a plan without executable actions is just an outline. The plan names steps; tools (lesson 07) make each step actually touch the world; observations feed the decision to continue or revise.
plan = planner(goal, initial_state)            # dynamically generated, not predefined
state = initial_state                          # running state starts from the initial facts
while not done(goal, state):
    step = choose_next_step(plan, state)       # next runnable step (deps satisfied)
    observation = execute(step.action)         # a tool call from lesson 07
    state.record(step, observation)
    if not step.acceptance_check(observation):
        # step failed its own success test
        plan = replan(goal, state, previous_plan=plan)
    elif observation.invalidates(plan.assumptions):
        # step succeeded but revealed the rest of the plan is wrong
        plan = replan(goal, state, previous_plan=plan)
    # else: assumptions hold, keep executing the existing plan
The book hammers one property above all: adaptivity. The initial plan is a starting point, not a rigid script. The agent's real power is adjusting direction on new information. In the team-building example: if the preferred venue is unavailable or the caterer is fully booked, a good agent does not simply fail — it re-evaluates options and forms a new plan (suggest an alternative venue, shift the date). This is the line between a planner and a brittle macro.
But the loop above is deliberately conservative about when to replan. It keeps executing the existing plan as long as assumptions hold; it only calls replan() when a step fails its acceptance check or an observation contradicts an assumption. That restraint is the antidote to over-reaction — and it sets up the budget tension in section 5.
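To see the loop run end to end, here is a toy version of the team-building case in which venue X turns out to be unavailable. Everything in it is an assumption made for illustration: the three stand-in "tools", the hard-coded `planner` and `replan` functions (a real agent would call a model), and the rule that an assumption is violated when an observation reports a different value for the same key.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], dict]                # stands in for a tool call
    acceptance_check: Callable[[dict], bool]


@dataclass
class Plan:
    version: int
    assumptions: dict[str, bool]              # e.g. {"venue_x_available": True}
    steps: list[Step]


@dataclass
class State:
    done_steps: list[str] = field(default_factory=list)
    facts: dict = field(default_factory=dict)


# Toy tools (hypothetical): venue X is fully booked, venue Y is free.
def check_venue_x() -> dict:
    return {"venue_x_available": False}


def book_venue_x() -> dict:
    return {"booked": "X"}


def book_venue_y() -> dict:
    return {"booked": "Y"}


def planner() -> Plan:
    return Plan(1, {"venue_x_available": True}, [
        Step("check_x", check_venue_x, lambda o: "venue_x_available" in o),
        Step("book_x", book_venue_x, lambda o: o.get("booked") == "X"),
    ])


def replan(previous: Plan) -> Plan:
    # Revise only the remaining path; completed steps stay recorded in State.
    return Plan(previous.version + 1, {}, [
        Step("book_y", book_venue_y, lambda o: o.get("booked") == "Y"),
    ])


def run(max_steps: int = 10) -> tuple[State, Plan]:
    state, plan = State(), planner()
    for _ in range(max_steps):                # hard cap so the loop always ends
        if "booked" in state.facts:           # done(goal, state)
            break
        remaining = [s for s in plan.steps if s.name not in state.done_steps]
        if not remaining:
            break
        step = remaining[0]
        obs = step.action()
        state.done_steps.append(step.name)
        state.facts.update(obs)
        failed = not step.acceptance_check(obs)
        invalidated = any(
            key in obs and obs[key] != expected
            for key, expected in plan.assumptions.items()
        )
        if failed or invalidated:             # the only two replan conditions
            plan = replan(plan)
    return state, plan


state, plan = run()
print(state.done_steps, state.facts["booked"], plan.version)
# ['check_x', 'book_y'] Y 2
```

The check step passes its own acceptance test (it did return an availability answer), yet its result contradicts the plan's assumption, so the agent replans once and never attempts `book_x`.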
4 · Worked trace — the deep-research agent
The book's flagship example is agentic deep research: Google Gemini Deep Research and the OpenAI Deep Research API. Both decompose a high-level query into a multi-point research plan, then run an iterative search-and-analysis loop — not a one-shot Q&A, but a controlled long-running process. Gemini Deep Research goes further with collaborative planning: it shows the decomposed plan to the user for review and editing before executing — a planning-time human-in-the-loop checkpoint (lesson 15) that catches a bad decomposition before it burns the budget.
Let us thread our running coding/research assistant through one concrete trace. Goal: "Summarize the economic impact of semaglutide on global health systems," with a budget of 10 search calls and an evidence threshold of 5 corroborated claims before synthesis.
Read the numbers. The agent did not blindly run all 10 searches: it tracked evidence against a threshold and stopped exploring once it crossed 5 corroborated claims, leaving 1 search in reserve. It did not treat plan v1 as sacred: when Q3 produced 0 claims and a source revealed a missing concept ("budget-impact models"), it replanned, but it kept the 3 claims already gathered rather than restarting. And it gated synthesis on the evidence threshold, not on "the plan said step 7 is synthesis." Every property the book emphasizes (dynamic generation, iterative search, gap discovery, adaptive replanning, threshold-gated synthesis) appears in that one trace.
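The stopping rule and the "replan but keep the evidence" behavior can be sketched in a few lines. Only the budget of 10 and the threshold of 5 come from the lesson; the subquestion names and per-question claim counts in `CANNED` are invented stand-ins for a real search tool, so this does not reproduce the lesson's trace or its search counts.

```python
from collections import deque

SEARCH_BUDGET = 10        # from the lesson's setup
EVIDENCE_THRESHOLD = 5    # corroborated claims required before synthesis

# Canned results (all values invented): each subquestion maps to
# (claims found, missing concept revealed by a source or None).
CANNED = {
    "q1": (2, None),
    "q2": (1, None),
    "q3": (0, "budget-impact models"),   # no evidence, but exposes a gap
    "q4": (1, None),
    "budget-impact models": (2, None),
}


def research(initial_questions: list[str]) -> dict:
    queue = deque(initial_questions)
    claims = searches = replans = 0
    # Stop on any of: plan exhausted, budget spent, evidence threshold met.
    while queue and searches < SEARCH_BUDGET and claims < EVIDENCE_THRESHOLD:
        question = queue.popleft()
        found, revealed = CANNED[question]
        searches += 1
        claims += found                   # evidence accumulates across replans
        if found == 0 and revealed is not None:
            queue.appendleft(revealed)    # replan: pursue the gap next
            replans += 1
    return {
        "claims": claims,
        "searches": searches,
        "replans": replans,
        "synthesize": claims >= EVIDENCE_THRESHOLD,
        "unused_budget": SEARCH_BUDGET - searches,
    }


result = research(["q1", "q2", "q3", "q4"])
print(result)
# {'claims': 5, 'searches': 4, 'replans': 1, 'synthesize': True, 'unused_budget': 6}
```

Note that `q4` is never searched in this run: the threshold is met first, so synthesis is gated on evidence rather than on reaching the end of the plan.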
The book's frameworks map cleanly onto this: CrewAI shows the simplest form — a planner_writer_agent whose task literally instructs "first produce a bullet-point plan, then write ~200 words from that plan," executed by a sequential Crew. That is planning as a single sequential task. Deep Research shows the advanced form — an asynchronous pipeline that can analyze hundreds of sources, survive single-point failures, expose every intermediate step (reasoning summaries, the actual web_search_call queries, code_interpreter runs) for debugging, and integrate private documents via MCP (lesson 12).
5 · The budget tension — try it
Here is the trade-off that decides whether a planner is useful or merely expensive. Replanning costs tokens, latency, and money: each replan() is another (often large) model call over the whole accumulated state. So two failure modes bracket the sweet spot. Replan too rarely (treat the first plan as sacred) and you keep executing obsolete steps long after the world changed — wasted actions on a dead path. Replan too eagerly (rewrite the plan on every minor observation) and you never converge — the agent thrashes, burning its budget on planning instead of acting. The book's restraint — "revise the remaining plan only when observations justify it" — is precisely this tuning.
The lesson's interactive widget simulates a research task of fixed difficulty. Each step either confirms the current plan or returns a surprise. A replan threshold controls how big a surprise must be before the agent rewrites its plan. Push it to 0 (replan on everything) and watch the budget vanish to planning overhead; push it to 1 (never replan) and watch wasted steps pile up on the stale path. Find the middle.
The U-shaped total-cost curve is the lesson: there is an interior optimum. A production planner exposes this threshold (often as "confidence to revise" or a step-budget) and tunes it per task class — cheap, well-understood tasks get a high threshold (rarely replan), genuinely exploratory tasks get a lower one.
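A toy cost model shows how an interior optimum can arise. It is a sketch under loudly stated assumptions: the two cost constants and the list of surprise sizes are made up, and the rule "an ignored surprise wastes effort in proportion to its size" is a simplification chosen for illustration, not a measured property of any real planner.

```python
REPLAN_COST = 3.0    # assumed cost of one replan() call, arbitrary units
STALE_COST = 10.0    # assumed cost of pushing on through a surprise of size 1.0

# Assumed surprise size per step: 0 = plan confirmed, 1 = plan demolished.
SURPRISES = [0.05, 0.1, 0.1, 0.2, 0.3, 0.6, 0.8, 0.9, 0.1, 0.05]


def total_cost(threshold: float) -> tuple[float, float]:
    """Return (planning_cost, wasted_cost) for one replan threshold."""
    planning = wasted = 0.0
    for surprise in SURPRISES:
        if surprise >= threshold:
            planning += REPLAN_COST            # pay to rewrite the plan
        else:
            wasted += surprise * STALE_COST    # pay for work on a stale path
    return planning, wasted


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    planning, wasted = total_cost(t)
    print(f"threshold={t:.2f}  planning={planning:5.1f}  "
          f"wasted={wasted:5.1f}  total={planning + wasted:5.1f}")
```

With these particular numbers, threshold 0 spends everything on planning, threshold 1 spends everything on wasted steps, and the middle thresholds cost less than either extreme. Changing the ratio of the two constants moves the optimum, which is one way to read the advice to tune the threshold per task class.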
6 · Failure modes and the checklist
Failure modes
- First plan is sacred. Executing obsolete steps after an observation already killed the path. The team-building agent that "fails" because venue X was booked instead of proposing venue Y.
- Motivational plans. Beautiful prose with no acceptance checks — nothing in the plan is testable, so the loop cannot tell success from failure.
- Overplanning. The model spends its whole context budget composing an elegant 12-step plan that goes obsolete after the first tool observation. Plan short enough to be cheap to revise.
- Replan thrash. Rewriting the plan on every minor surprise; the agent never converges and burns budget on planning instead of acting (section 5, threshold → 0).
- No synthesis gate. Writing the final answer because "the plan said step N is synthesis," not because an evidence threshold was met (section 4).
- Planning a known path. Paying for a dynamic planner where a fixed workflow (lessons 03-05) would be cheaper and more reliable.
Implementation checklist
- Is the goal state a testable condition (the loop's exit test)?
- Does the path actually need exploration, or is it known (use a fixed flow)?
- Does every step carry an acceptance check and a replan trigger?
- What observation invalidates an assumption and forces a replan?
- What is the budget cap on planning vs acting (tokens, steps, $)?
- Does replanning keep evidence already gathered, not restart?
- Is synthesis gated on an evidence/quality threshold, not step count?
- Should a human review the plan before execution (Deep Research style)?
- Who approves high-risk steps before they run (lesson 15)?
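Several checklist items are mechanical enough to check in code before a plan runs. The sketch below is one hypothetical way to do that; `lint_plan`, the field names, and the choice of `max_replans` as the budget cap are assumptions for illustration, and the judgment-based items (does the path need exploration at all, should a human review the plan) are left to the reviewer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    acceptance_check: Optional[Callable[[dict], bool]] = None
    replan_trigger: Optional[Callable[[dict], bool]] = None
    high_risk: bool = False
    approver: Optional[str] = None


@dataclass
class Plan:
    goal_test: Optional[Callable[[dict], bool]]
    steps: list[Step] = field(default_factory=list)
    max_replans: Optional[int] = None


def lint_plan(plan: Plan) -> list[str]:
    """Return checklist violations; an empty list means none were found."""
    problems = []
    if plan.goal_test is None:
        problems.append("goal has no testable exit condition")
    if plan.max_replans is None:
        problems.append("no budget cap on replanning")
    for step in plan.steps:
        if step.acceptance_check is None:
            problems.append(f"{step.name}: missing acceptance check")
        if step.replan_trigger is None:
            problems.append(f"{step.name}: missing replan trigger")
        if step.high_risk and step.approver is None:
            problems.append(f"{step.name}: high-risk step has no approver")
    return problems


draft = Plan(goal_test=None, steps=[Step("book_venue", high_risk=True)])
for problem in lint_plan(draft):
    print("-", problem)

fixed = Plan(
    goal_test=lambda state: state.get("venue_booked", False),
    max_replans=3,
    steps=[Step("book_venue",
                acceptance_check=lambda obs: "confirmation" in obs,
                replan_trigger=lambda obs: obs.get("available") is False,
                high_risk=True, approver="event-owner")],
)
print(lint_plan(fixed))  # []
```

A linter like this only proves that the checks exist, not that they are good ones; it is a guard against "motivational" plans slipping into execution, nothing more.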
Where this points next
We now have a single agent that can decompose a goal, execute steps as tool calls, observe results, and adapt its remaining path within a budget. The natural next pressure is specialization and scale: when one planner-executor loop is doing planning, searching, critiquing, and writing all at once, its context bloats and its judgments blur. The book's answer — and the next pattern — is to split the work across multiple agents with distinct roles, handoffs, and a synthesis step. Notice that the Deep Research pipeline already hints at this: planning, iterative search, and synthesis are separable responsibilities. Lesson 09, Multi-agent collaboration — roles, handoffs, synthesis, makes the split explicit and asks when specialization, isolation, or parallel judgment beats one general loop.
Interview prompts
- When should an agent use dynamic planning versus a fixed workflow? (§1 — plan when the solution path needs exploration; use a fixed, predefined flow when the path is known and repeatable, because it is more predictable, reliable, and cheaper. The deciding question is whether the "how" is known.)
- What turns a plan from "motivational prose" into an executable control object? (§2 — making it a data structure where every step has an acceptance check (proof it succeeded) and a replan trigger (an observation that invalidates the rest); the loop reads and rewrites it.)
- Why does planning come after tool use in the dependency ladder? (§3 — a plan without executable actions is just an outline; tools make each step actually touch the world and produce the observations that drive continue-or-replan.)
- Describe the cost trade-off in deciding when to replan. (§5 — replanning costs a full model call over accumulated state; replan too rarely and you waste steps on a stale path, too eagerly and you thrash and never converge. Total cost is U-shaped in the replan threshold, with an interior optimum tuned per task class.)
- How does the deep-research example demonstrate adaptivity without restarting? (§4 — when a subquestion returns no evidence and a source reveals a missing concept, the agent adds a new subquestion and keeps the evidence already gathered, then gates synthesis on an evidence threshold rather than step count.)
- What is collaborative planning and why is it valuable? (§4 — Google Deep Research shows the decomposed multi-point plan to the user for review/editing before execution; a planning-time human checkpoint catches a bad decomposition before it burns the search budget.)
- Name three planning failure modes and their fixes. (§6 — first-plan-is-sacred → add replan triggers; motivational plans → require acceptance checks; overplanning/thrash → cap planning budget and keep plans short enough to revise.)