
Part III - Action and grounding

Planning - from goal to executable path

Tool use (lesson 07) gave the agent a way to act on the world one call at a time. But the hardest tasks are not single actions — they are goals whose path is unknown until you start walking it. Planning is the pattern that lets the agent synthesize a sequence of interdependent actions from an initial state toward a goal state, and — crucially — revise that sequence as observations arrive. This lesson builds the plan as a living control object, not a one-shot outline.

Book source
Chapter 6 - Planning (规划). The chapter's running examples: a team-building organizer, the CrewAI planner-writer agent, Google Gemini Deep Research (collaborative planning with human review of the plan), and the OpenAI Deep Research API. PDF outline pages 76-85.
The plan
Five moves. (1) Define planning precisely — synthesizing an ordered, interdependent action sequence from initial state to goal state — and separate it from the fixed workflows of lessons 03-05. (2) State the book's decision rule: plan when the path needs exploration; use a fixed flow when the path is known. (3) Make the plan a concrete data structure with fields (goal, assumptions, steps, tools, acceptance checks, replan triggers) and walk a worked deep-research trace with real numbers. (4) Build the plan / execute / observe / replan loop and earn why adaptivity is the whole point — the initial plan is a starting line, not a script. (5) Confront the budget tension with an interactive simulator: rigid plans waste budget on obsolete steps, but replanning on every wobble never converges. We close on the hand-off into multi-agent collaboration.
Linear position
Prerequisite: Lesson 07 — the agent can already call tools as controlled actions and read back observations. It also relies on routing (04), parallelization (05), and reflection (06) as the control-flow primitives a plan strings together.
New capability: A mutable plan state that decomposes a goal into checked steps and adapts its remaining path to what tool observations reveal.

1 · What planning is — and what it is not

The book opens with a sharp framing: treat a planning agent as an expert you delegate a complex goal to. When you say "organize a team-building event," you are specifying the what — the goal and its constraints (a budget, a headcount, a target date) — not the how. The agent's job is to autonomously find a path from the initial state (what is true now) to the goal state (the event is booked). That path is not handed to it; the plan is generated dynamically from the request.

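Define the terms once, precisely. The initial state is what is true now. The goal state is the condition that must hold when the task is done (here, the event is booked). A plan is the ordered, interdependent sequence of actions that leads from the first to the second, generated from the request rather than handed to the agent.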

The contrast that matters for engineering: planning is not the fixed workflows you already built. Prompt chaining (lesson 03) is a known linear path. Routing (lesson 04) is a known branch. Parallelization (lesson 05) is a known fan-out. In all three, you wired the topology in advance because you knew it. Planning is for the case where the topology itself is unknown until the agent starts exploring — where step 3 depends on what step 2 turned up.

The decision rule (memorize this)
The book is explicit about the trade-off between flexibility and predictability. Dynamic planning is a specific tool, not a universal solution. When the solution path is known and repeatable, constraining the agent to a fixed, predefined flow is more effective — it limits autonomy, reduces uncertainty and unpredictable behavior, and gives reliable, consistent results. So the question is never "should my agent be smart?" It is: does the how need exploration, or is it already known? If known, do not pay for planning.

Concretely: employee onboarding (create accounts, assign training modules, coordinate departments) has dependencies but a known structure — a fixed workflow with conditional branches beats a dynamic planner. A literature review where you do not know which papers are foundational until you read the first batch — that needs planning, because the second search depends on the first's results.

2 · The plan is a data structure, not a paragraph

The single most common beginner mistake is treating "the plan" as motivational prose the model emits and then ignores. A useful plan is a structured object the control loop reads, executes, and rewrites. The book's field list, made concrete:

| Field | What it holds | Why the loop needs it |
|---|---|---|
| goal | The goal-state condition, testable. | The loop's exit test: done(goal, state). |
| assumptions | Facts the plan depends on ("venue X is available"). | An observation that violates one is the replan trigger. |
| steps | Ordered actions, each naming its dependencies. | Tells the executor what is runnable now vs blocked. |
| tools | The function/tool each step calls (lesson 07). | Binds the abstract step to a real action. |
| acceptance checks | The observation that proves a step succeeded. | Without it the plan is "motivational," not executable. |
| replan triggers | Per-step conditions that demand revision. | Turns the plan from a script into an adaptive object. |

Notice the discipline: every step carries an acceptance check (how do we know it worked?) and a replan trigger (what observation would invalidate the rest?). A step without an acceptance check is a wish. A plan without replan triggers is a script that cannot adapt — and adaptivity, as the next section argues, is the entire reason you chose planning over a fixed flow.
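One way to make the field list concrete is a pair of dataclasses. This is a minimal sketch, not the book's code: the names `Step`, `Plan`, `goal_check`, `runnable`, and the tool names `venue_search` and `catering_api` are all assumptions chosen for illustration, and an observation is assumed to be a plain dict.

```python
from dataclasses import dataclass, field
from typing import Callable

Observation = dict  # assumption: a tool call returns a plain dict


@dataclass
class Step:
    name: str
    tool: str  # name of the tool this step calls (hypothetical names below)
    depends_on: list[str] = field(default_factory=list)
    # Acceptance check: does this observation prove the step succeeded?
    acceptance_check: Callable[[Observation], bool] = lambda obs: False
    # Replan trigger: does this observation invalidate the rest of the plan?
    replan_trigger: Callable[[Observation], bool] = lambda obs: False
    done: bool = False


@dataclass
class Plan:
    goal: str                            # human-readable goal state
    goal_check: Callable[[dict], bool]   # testable form of the goal
    assumptions: list[str]
    steps: list[Step]

    def runnable(self) -> list[Step]:
        """Steps that are not done and whose dependencies are all done."""
        finished = {s.name for s in self.steps if s.done}
        return [s for s in self.steps
                if not s.done and all(d in finished for d in s.depends_on)]


plan = Plan(
    goal="team-building event is booked",
    goal_check=lambda state: bool(state.get("venue")) and state.get("catered") is True,
    assumptions=["venue X is available"],
    steps=[
        Step("check_venue", tool="venue_search",
             acceptance_check=lambda obs: obs.get("available") is True,
             replan_trigger=lambda obs: obs.get("available") is False),
        Step("book_caterer", tool="catering_api", depends_on=["check_venue"],
             acceptance_check=lambda obs: obs.get("confirmed") is True),
    ],
)

print([s.name for s in plan.runnable()])  # only the step with no pending dependencies
```

The defaults are deliberately pessimistic: a step created without an acceptance check can never pass, which makes a "wish" visible at run time instead of letting it slip through.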

3 · The loop: plan → execute → observe → replan

Planning sits after tool use in this track for a structural reason: a plan without executable actions is just an outline. The plan names steps; tools (lesson 07) make each step actually touch the world; observations feed the decision to continue or revise.

plan = planner(goal, initial_state)        # dynamically generated, not predefined
while not done(goal, state):
    step = choose_next_step(plan, state)    # next runnable step (deps satisfied)
    observation = execute(step.action)      # a tool call from lesson 07
    state.record(step, observation)
    if not step.acceptance_check(observation):
        # step failed its own success test
        plan = replan(goal, state, previous_plan=plan)
    elif observation.invalidates(plan.assumptions):
        # step succeeded but revealed the rest of the plan is wrong
        plan = replan(goal, state, previous_plan=plan)
    # else: assumptions hold, keep executing the existing plan

The book hammers one property above all: adaptivity. The initial plan is a starting point, not a rigid script. The agent's real power is adjusting direction on new information. In the team-building example: if the preferred venue is unavailable or the caterer is fully booked, a good agent does not simply fail — it re-evaluates options and forms a new plan (suggest an alternative venue, shift the date). This is the line between a planner and a brittle macro.

But the loop above is deliberately conservative about when to replan. It keeps executing the existing plan as long as assumptions hold; it only calls replan() when a step fails its acceptance check or an observation contradicts an assumption. That restraint is the antidote to over-reaction — and it sets up the budget tension in section 5.
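To see the loop run end to end, here is a small self-contained sketch built on the team-building example. Everything in it is a toy assumption: the venue names, the `AVAILABLE` table standing in for the world, the two fake tools, and the hard `max_steps` cap. It exercises only the failed-acceptance-check branch; the assumption-invalidation branch would follow the same shape.

```python
AVAILABLE = {"Lakeside": False, "Rooftop": True}  # hypothetical world state


def make_tools(state):
    """Fake tools that act on the shared state and return an observation."""
    def book_venue(name):
        if AVAILABLE[name]:
            state["venue"] = name
        return {"ok": AVAILABLE[name], "venue": name}

    def book_caterer(_):
        state["catered"] = True
        return {"ok": True}

    return {"book_venue": book_venue, "book_caterer": book_caterer}


def step(tool, arg):
    # Every step carries its own acceptance check on the observation.
    return {"tool": tool, "arg": arg, "accept": lambda obs: obs["ok"]}


def planner(state):
    return [step("book_venue", "Lakeside"), step("book_caterer", None)]


def replan(state, remaining):
    # Revise only what failed: try an untried venue, keep the remaining steps.
    tried = {arg for tool, arg, _ in state["log"] if tool == "book_venue"}
    untried = [v for v in AVAILABLE if v not in tried]
    if not untried:
        raise RuntimeError("no venue left to try")
    return [step("book_venue", untried[0])] + remaining


def done(state):
    return state["venue"] is not None and state["catered"]


def run(state, max_steps=10):
    tools = make_tools(state)
    plan, replans = planner(state), 0
    for _ in range(max_steps):          # hard cap so the loop always terminates
        if done(state) or not plan:
            break
        current = plan.pop(0)
        obs = tools[current["tool"]](current["arg"])
        state["log"].append((current["tool"], current["arg"], obs))
        if not current["accept"](obs):  # failed its own success test: replan
            plan = replan(state, plan)
            replans += 1
    return replans


state = {"venue": None, "catered": False, "log": []}
replans = run(state)
print(state["venue"], state["catered"], replans)  # Rooftop True 1
```

The first venue fails its acceptance check, the planner swaps in the alternative, and the caterer step survives the revision untouched. That is the difference between revising the remaining path and starting over.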

4 · Worked trace — the deep-research agent

The book's flagship example is agentic deep research: Google Gemini Deep Research and the OpenAI Deep Research API. Both decompose a high-level query into a multi-point research plan, then run an iterative search-and-analysis loop — not a one-shot Q&A, but a controlled long-running process. Gemini Deep Research goes further with collaborative planning: it shows the decomposed plan to the user for review and editing before executing — a planning-time human-in-the-loop checkpoint (lesson 15) that catches a bad decomposition before it burns the budget.

Let us thread our running coding/research assistant through one concrete trace. Goal: "Summarize the economic impact of semaglutide on global health systems," with a budget of 10 search calls and an evidence threshold of 5 corroborated claims before synthesis.

GOAL: economic impact of semaglutide on health systems
budget = 10 searches, threshold = 5 claims

plan v1 (3 subquestions, dynamically generated):
  Q1 cost-per-patient               search ×2   obs: 2 claims ✓
  Q2 system-level spend             search ×2   obs: 1 claim
  Q3 offset (fewer comorbidities)   search ×2   obs: 0 claims ✗

REPLAN TRIGGER: Q3 returned nothing; a source named "budget-impact models"
as the real driver, an assumption the plan never had.
running tally: 3 claims after 6 searches (budget left = 4)

plan v2 (revised remaining steps, NOT thrown away):
  keep Q1/Q2 evidence
  + Q4 budget-impact models (new subquestion from the observation)
                                    search ×3   obs: +3 claims → tally = 6 ≥ threshold 5 ✓ (budget left = 1)

SYNTHESIZE: structured, cited report (do NOT synthesize before threshold met)

Read the numbers. The agent did not blindly run all 10 searches: it tracked evidence against a threshold and stopped exploring once it crossed 5 corroborated claims, leaving 1 search in reserve. It did not treat plan v1 as sacred: when Q3 produced 0 claims and a source revealed a missing concept ("budget-impact models"), it replanned, but it kept the 3 claims already gathered rather than restarting. And it gated synthesis on the evidence threshold, not on "the plan said step 7 is synthesis." Every property the book emphasizes (dynamic generation, iterative search, gap discovery, adaptive replanning, threshold-gated synthesis) appears in those 9 searches.
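The bookkeeping in the trace is simple enough to replay. The sketch below hard-codes the searches and claims per subquestion exactly as the trace states them; the `execute` helper and its two stop conditions are assumptions about how such a loop might be written, not the Deep Research implementation.

```python
BUDGET, THRESHOLD = 10, 5  # searches allowed, corroborated claims required

# (subquestion, searches spent, corroborated claims found), as in the trace
plan_v1 = [("Q1 cost-per-patient", 2, 2),
           ("Q2 system-level spend", 2, 1),
           ("Q3 offset", 2, 0)]
plan_v2 = [("Q4 budget-impact models", 3, 3)]


def execute(batches, searches_used, claims):
    """Run batches in order; stop at the evidence threshold or the budget."""
    for _name, searches, found in batches:
        if claims >= THRESHOLD:
            break                      # enough evidence: stop exploring
        if searches_used + searches > BUDGET:
            break                      # this batch would exceed the budget
        searches_used += searches
        claims += found
    return searches_used, claims


used, claims = execute(plan_v1, 0, 0)           # 6 searches, 3 claims
used, claims = execute(plan_v2, used, claims)   # replan keeps prior evidence
ready_to_synthesize = claims >= THRESHOLD

print(used, BUDGET - used, claims, ready_to_synthesize)  # 9 1 6 True
```

Passing `used` and `claims` into the second call is the code-level meaning of "revised, not thrown away": plan v2 starts from the tally plan v1 left behind.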

The book's frameworks map cleanly onto this: CrewAI shows the simplest form — a planner_writer_agent whose task literally instructs "first produce a bullet-point plan, then write ~200 words from that plan," executed by a sequential Crew. That is planning as a single sequential task. Deep Research shows the advanced form — an asynchronous pipeline that can analyze hundreds of sources, survive single-point failures, expose every intermediate step (reasoning summaries, the actual web_search_call queries, code_interpreter runs) for debugging, and integrate private documents via MCP (lesson 12).

5 · The budget tension — try it

Here is the trade-off that decides whether a planner is useful or merely expensive. Replanning costs tokens, latency, and money: each replan() is another (often large) model call over the whole accumulated state. So two failure modes bracket the sweet spot. Replan too rarely (treat the first plan as sacred) and you keep executing obsolete steps long after the world changed — wasted actions on a dead path. Replan too eagerly (rewrite the plan on every minor observation) and you never converge — the agent thrashes, burning its budget on planning instead of acting. The book's restraint — "revise the remaining plan only when observations justify it" — is precisely this tuning.

The widget below simulates a research task of fixed difficulty. Each step either confirms the current plan or returns a surprise. A replan threshold controls how big a surprise must be before the agent rewrites its plan. Push it to 0 (replan on everything) and watch the budget vanish to planning overhead; push it to 1 (never replan) and watch wasted steps pile up on the stale path. Find the middle.

Replan threshold — when does adapting pay off?
A task of fixed difficulty over N steps. Each step yields a surprise of random size s ∈ [0,1]. If s ≥ threshold the agent replans (spending replan cost tokens) and gets back on a good path; if it does not replan when it should have (s was large but below threshold), that step is wasted. Total cost = step tokens + replan tokens + wasted-step penalty. Drag the threshold to minimize total cost.
The core JS:
// Cost model for one task roll: fixed token cost per step, plus replans and wasted steps.
const STEP_TOK = 300, WASTE_PENALTY = 500;
const WASTE_CUTOFF = 0.6;                // surprises above this needed a replan

// surprises: one value s in [0,1] per step, drawn once per task roll
function totalCost(surprises, threshold, replanCost) {
  let replans = 0, wasted = 0;
  for (const s of surprises) {
    if (s >= threshold) replans++;       // big enough surprise -> replan, recover
    else if (s > WASTE_CUTOFF) wasted++; // should have replanned but didn't -> wasted step
  }
  const replanTok = replans * replanCost;
  const total = surprises.length * STEP_TOK + replanTok + wasted * WASTE_PENALTY;
  return { replans, wasted, replanTok, total };
}
// threshold too low -> many replans, replanTok explodes
// threshold too high -> many wasted steps, penalty explodes
// when replanCost < WASTE_PENALTY, the minimum sits in the middle

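The U-shaped total-cost curve is the lesson: there is an interior optimum, provided a replan costs less than a wasted step. In the code above, that means replanCost below WASTE_PENALTY; once a replan costs more than the 500-token penalty, never replanning becomes the cheapest setting and the curve loses its right-hand rise. A production planner exposes this threshold (often as "confidence to revise" or a step-budget) and tunes it per task class: cheap, well-understood tasks get a high threshold (rarely replan), genuinely exploratory tasks get a lower one.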
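The shape of the curve can be checked offline by porting the JS cost model and sweeping the threshold. This is a sketch under stated assumptions: surprises are drawn uniformly with a fixed seed, the task has 1000 steps, and the two replan costs (200 and 800 tokens) are illustrative values chosen to sit on either side of the 500-token waste penalty.

```python
import random

STEP_TOK, WASTE_PENALTY = 300, 500   # same constants as the JS cost model
WASTE_CUTOFF = 0.6                   # surprises above this needed a replan


def total_cost(surprises, threshold, replan_cost):
    """Token cost of one task roll for a given replan threshold."""
    replans = sum(1 for s in surprises if s >= threshold)
    wasted = sum(1 for s in surprises if WASTE_CUTOFF < s < threshold)
    return len(surprises) * STEP_TOK + replans * replan_cost + wasted * WASTE_PENALTY


def best_threshold(surprises, replan_cost, grid):
    """Grid point with the lowest total cost."""
    return min(grid, key=lambda t: total_cost(surprises, t, replan_cost))


rng = random.Random(0)                           # fixed seed: one reproducible roll
surprises = [rng.random() for _ in range(1000)]  # assumed uniform surprises
grid = [i / 10 for i in range(11)]               # thresholds 0.0, 0.1, ..., 1.0

for replan_cost in (200, 800):                   # cheaper vs dearer than a wasted step
    print(replan_cost, best_threshold(surprises, replan_cost, grid))
```

With the cheap replan the minimum lands at the 0.6 cutoff, an interior point. With the expensive one it moves to 1.0, meaning "never replan" wins. The sweet spot depends on the ratio of replan cost to waste cost, which is why the threshold is tuned per task class rather than fixed once.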

6 · Failure modes and the checklist

Failure modes

  • First plan is sacred. Executing obsolete steps after an observation already killed the path. The team-building agent that "fails" because venue X was booked instead of proposing venue Y.
  • Motivational plans. Beautiful prose with no acceptance checks — nothing in the plan is testable, so the loop cannot tell success from failure.
  • Overplanning. The model spends its whole context budget composing an elegant 12-step plan that goes obsolete after the first tool observation. Plan short enough to be cheap to revise.
  • Replan thrash. Rewriting the plan on every minor surprise; the agent never converges and burns budget on planning instead of acting (section 5, threshold → 0).
  • No synthesis gate. Writing the final answer because "the plan said step N is synthesis," not because an evidence threshold was met (section 4).
  • Planning a known path. Paying for a dynamic planner where a fixed workflow (lessons 03-05) would be cheaper and more reliable.

Implementation checklist

  • Is the goal state a testable condition (the loop's exit test)?
  • Does the path actually need exploration, or is it known (use a fixed flow)?
  • Does every step carry an acceptance check and a replan trigger?
  • What observation invalidates an assumption and forces a replan?
  • What is the budget cap on planning vs acting (tokens, steps, $)?
  • Does replanning keep evidence already gathered, not restart?
  • Is synthesis gated on an evidence/quality threshold, not step count?
  • Should a human review the plan before execution (Deep Research style)?
  • Who approves high-risk steps before they run (lesson 15)?
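Several checklist items are mechanical enough to check before the plan ever runs. The sketch below is one possible lint pass over a plan held as a plain dict; the key names (`goal_check`, `assumptions`, `max_steps`, `steps`, `acceptance_check`, `replan_trigger`) are assumptions for illustration, not a schema from the book.

```python
def lint_plan(plan: dict) -> list[str]:
    """Return a list of checklist violations; an empty list means the plan passes."""
    problems = []
    if not callable(plan.get("goal_check")):
        problems.append("goal is not a testable condition")
    if not plan.get("assumptions"):
        problems.append("no assumptions listed, so nothing can force a replan")
    if plan.get("max_steps") is None:
        problems.append("no budget cap")
    for step in plan.get("steps", []):
        for key in ("acceptance_check", "replan_trigger"):
            if not callable(step.get(key)):
                problems.append(f"step {step.get('name', '?')!r} lacks {key}")
    return problems


# A "motivational" plan: prose goal, no checks, no budget.
motivational = {
    "goal": "organize a great event",
    "steps": [{"name": "find venue"}],
}

# The same plan with the checklist items filled in.
executable = {
    "goal": "event booked",
    "goal_check": lambda state: state.get("booked") is True,
    "assumptions": ["venue X is available"],
    "max_steps": 10,
    "steps": [{
        "name": "find venue",
        "acceptance_check": lambda obs: obs.get("available") is True,
        "replan_trigger": lambda obs: obs.get("available") is False,
    }],
}

for problem in lint_plan(motivational):
    print(problem)
print(lint_plan(executable))  # []
```

A lint like this catches only the structural items. Whether the path needs exploration at all, and who approves high-risk steps, remain design decisions.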

Where this points next

We now have a single agent that can decompose a goal, execute steps as tool calls, observe results, and adapt its remaining path within a budget. The natural next pressure is specialization and scale: when one planner-executor loop is doing planning, searching, critiquing, and writing all at once, its context bloats and its judgments blur. The book's answer — and the next pattern — is to split the work across multiple agents with distinct roles, handoffs, and a synthesis step. Notice that the Deep Research pipeline already hints at this: planning, iterative search, and synthesis are separable responsibilities. Lesson 09, Multi-agent collaboration — roles, handoffs, synthesis, makes the split explicit and asks when specialization, isolation, or parallel judgment beats one general loop.

Takeaway
Planning is the ability to dynamically synthesize an ordered, interdependent action sequence from an initial state to a goal state, and to revise it as observations arrive. Use it only when the how needs exploration; when the path is known, a fixed workflow (lessons 03-05) is cheaper and more reliable. Make the plan a data structure (goal, assumptions, steps, tools, acceptance checks, replan triggers), not a paragraph. Run the plan → execute → observe → replan loop, and replan only when a step fails its acceptance check or an observation invalidates an assumption: over-reacting thrashes the budget, never reacting executes a dead path. The book's exemplars: CrewAI's sequential planner-writer (planning as one task) and Google/OpenAI Deep Research (planning as an iterative, gap-discovering, threshold-gated, human-reviewable long process).

Interview prompts