Part II - Core execution patterns

Routing - conditional control flow

Lesson 03 turned one overloaded request into a fixed line of transformations. But a real assistant does not get one kind of request — it gets many, and the right sequence of steps depends on which one arrived. Routing is the first pattern that lets the agent decide the path instead of hard-coding it: read the input and the state, classify, then send control down a specialized branch. It is the smallest amount of decision-making that turns a pipeline into an adaptive system.

Book source

Chapter 2 - Routing (路由); PDF outline pages 33-42. The chapter's worked code is a LangChain RunnableBranch "coordinator" plus a Google ADK Auto-Flow version, both delegating to booker / info / unclear handlers.

Linear position

Prerequisite: Lesson 03 (Prompt chaining) — a linear chain whose steps and intermediate artifacts are explicit and typed.
New capability: A branch point that chooses the next path based on input, accumulated state, or a prior step's observation — conditional control flow, with a visible decision you can audit.

The plan

Five moves. (1) Frame routing as the move from a fixed control flow to a conditional one, and name the contract a router must return — not just a destination. (2) Walk the four routing mechanisms the book lists — LLM, embedding, rule, and trained-classifier — and when each wins, with latency and cost numbers. (3) Build the book's coordinator example concretely (the booker/info/unclear delegation) and re-skin it as our running coding/research assistant. (4) Make the abstain threshold quantitative: a confidence cutoff is a knob that trades misroutes against unnecessary clarifications, and there is a worked number for where to set it. (5) Failure modes, a checklist, and the hand-off into parallelization, where the router stops choosing one branch and starts firing several.

1 · From a fixed path to a decided path

A prompt chain (lesson 03) has a fixed control flow: step 1 always feeds step 2, which always feeds step 3. That is exactly right when every request needs the same treatment: summarize, then translate, then format. But most agent inputs are not uniform. A customer-support agent receives order-status questions, product questions, technical-support questions, and gibberish, all through the same text box. Forcing every one of them down a single chain means one of two bad outcomes: the chain is general enough to handle all of them and therefore good at none, or it is tuned for one and silently mangles the rest.

Routing introduces conditional logic into the operating loop. The system evaluates the current situation against a set of criteria and selects which of several specialized functions, tools, or sub-flows should run next. The book's canonical illustration: a support agent first classifies the user's intent, then dispatches —

→ order status → a sub-agent/tool chain that queries the order database

→ product info → a chain that retrieves from the product catalog (RAG, lesson 16)

→ technical support → a troubleshooting chain, with escalation to a human

→ unclear → a clarification sub-flow that asks one focused question

The mental model: routing is the switch statement of agent design. A plain program reads a value and jumps to the matching case; an agent reads a fuzzy natural-language situation, infers the case, and jumps. Everything hard about routing lives in that word "infers" — the decision is now probabilistic, so the router must be honest about how sure it is.

The router's real contract

A naive router returns a single label: "product_info". That is not enough to operate or debug. A production router returns four things: the destination, a confidence in [0,1], a short rationale, and the minimal context the chosen branch needs. Without confidence you cannot abstain; without rationale you cannot tell why a legal question landed in the general-chat branch when you read the logs three weeks later.
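A minimal sketch of that four-field contract as a Python dataclass. The names RouteDecision and can_commit, and the sample values, are illustrative assumptions rather than anything from the book.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class RouteDecision:
    """What a router hands back: more than a bare label."""
    destination: str      # which branch should run next
    confidence: float     # router's certainty, in [0, 1]
    rationale: str        # short human-readable reason, kept for the logs
    context: dict[str, Any] = field(default_factory=dict)  # minimal input for the branch

    def __post_init__(self) -> None:
        # Reject malformed decisions early instead of dispatching on them.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    def can_commit(self, tau: float) -> bool:
        """True if the router is sure enough to route rather than abstain."""
        return self.confidence >= tau


# Hypothetical decision for a product question.
decision = RouteDecision(
    destination="product_info",
    confidence=0.82,
    rationale="asks about a catalog item's specifications",
    context={"query": "Does the X200 support USB-C charging?"},
)
```

Keeping the rationale and context on the same object as the label means one log line captures everything needed to replay or audit the decision later.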

2 · Four ways to decide — and what each costs

The book lists four mechanisms for the decision component. They are not ranked; they trade accuracy, latency, cost, and flexibility differently, and mature systems layer them.

Rule-based

if/else, switch, regex, keyword match. Fastest and fully deterministic (~0 ms, $0, no model call). Brittle on novel or paraphrased input. Use for hard constraints and obvious cases.
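A rule-based router can be as small as an ordered list of patterns. The sketch below is illustrative: the patterns and the refund_order label are assumptions, and returning None is one way to signal "no rule fired, defer to a smarter router".

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; first match wins.
RULES = [
    (re.compile(r"\border\s*#?\d+\b", re.IGNORECASE), "order_status"),
    (re.compile(r"\b(refund|return)\b", re.IGNORECASE), "refund_order"),
]


def rule_route(text: str) -> Optional[str]:
    """Return a route label if a rule fires, else None (defer to another router)."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None
```

A paraphrase such as "my parcel never arrived" matches no pattern and falls through, which is the brittleness described above.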

LLM-based

Prompt a model to emit one label: "output only one of: order_status, product_info, technical_support, other". Most flexible, handles paraphrase and nuance. Costs a full inference (~200-800 ms, real $/call). The chapter's coordinator uses this.

Embedding-based

Embed the query, compare (cosine) to vectors representing each route, pick the nearest. Semantic — routes on meaning, not keywords. Cheaper than an LLM call, great for many routes. See RAG, lesson 16.
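The nearest-route lookup can be sketched as follows. To keep the example self-contained, embed() is a toy word-count stand-in and the route descriptions are made up; only a real embedding model would route on meaning rather than shared words, but the cosine comparison and argmax have the same shape.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical route descriptions; each is embedded once, up front.
ROUTE_EXAMPLES = {
    "order_status": "where is my order shipping delivery tracking",
    "product_info": "what features specs does this product have",
    "technical_support": "error crash not working broken fix",
}
ROUTE_VECTORS = {label: embed(text) for label, text in ROUTE_EXAMPLES.items()}


def embedding_route(query: str) -> tuple[str, float]:
    """Return the nearest route and its similarity score."""
    q = embed(query)
    scores = {label: cosine(q, vec) for label, vec in ROUTE_VECTORS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because the route vectors are computed once, adding a route is a matter of adding one description, which is why this mechanism scales well to many routes.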

Trained classifier

A small discriminative model fine-tuned on labeled routing data; logic lives in weights, not a prompt. Fast and cheap at inference like embeddings, but supervised. An LLM may generate synthetic training data — it does not make the live decision.

The crucial distinction the book draws is between the LLM-based and the trained-classifier approaches: both can be accurate, but only the LLM runs a generative model at decision time. The classifier has already baked its logic into weights, so it is cheap and deterministic per call — at the price of needing a labeled dataset and re-training when the route set changes.

Worked cost number — layer the cheap mechanism first

Suppose 60% of incoming requests are unambiguous ("where is my order #48213") and 40% are nuanced. An LLM router at, say, $0.15 per call across 1,000,000 requests/month costs 1,000,000 × $0.15 = $150,000. Put a rule-based pre-filter in front that catches the 60% obvious ones for free, and only the remaining 400,000 hit the model: 400,000 × $0.15 = $60,000. Same routing quality, $90k/month saved, and the deterministic cases also got faster (0 ms vs ~400 ms). This is the standard hybrid: rules for the certain, model for the ambiguous. We will formalize this kind of budget thinking in lesson 18.
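The same arithmetic as a small function, using integer cents to avoid floating-point rounding. The function name and parameters are illustrative; the inputs are the figures from the worked example.

```python
def monthly_router_cost_cents(requests: int, prefilter_share_pct: int, cents_per_call: int) -> int:
    """LLM routing cost when a free pre-filter absorbs prefilter_share_pct of traffic."""
    llm_calls = requests * (100 - prefilter_share_pct) // 100
    return llm_calls * cents_per_call


llm_only = monthly_router_cost_cents(1_000_000, 0, 15)   # every request hits the model
hybrid = monthly_router_cost_cents(1_000_000, 60, 15)    # rules catch the obvious 60%
savings = llm_only - hybrid

print(f"LLM only: ${llm_only // 100:,}")   # 15,000,000 cents
print(f"Hybrid:   ${hybrid // 100:,}")     # 6,000,000 cents
print(f"Saved:    ${savings // 100:,}")    # 9,000,000 cents
```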

Routing is not only a front-door classifier. The book stresses it can fire at any stage of the loop: as initial task classification, as a mid-chain decision about what to do next given accumulated state, or as tool selection inside a sub-flow. A research system might use one router to assign work among retrieval, summarization, and analysis agents; a coding assistant first identifies the language and the user's intent (debug, explain, translate) before handing the snippet to the matching tool.

3 · The book's coordinator, made concrete

The chapter's running code builds a "coordinator" that routes a user request to one of three handlers. In LangChain it is an LLM classifier piped into a RunnableBranch; in Google ADK it is a parent Agent with sub_agents that the framework's Auto-Flow delegates to automatically. Two framework styles, one idea: classify, then dispatch to a specialist, and keep the decision visible.

LangChain style (explicit branch). A prompt forces a single-word label, then a branch maps the label to a handler:

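# imports; `llm` and the three *_branch runnables are assumed to be defined elsewhere
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch, RunnablePassthrough

# 1) the decision component: LLM emits exactly one label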
coordinator_router_prompt = ChatPromptTemplate.from_messages([
  ("system",
   "Analyze the request and decide which handler should take it.\n"
   "- flights or hotels  -> output 'booker'\n"
   "- general questions  -> output 'info'\n"
   "- otherwise/unclear  -> output 'unclear'\n"
   "Output ONLY one word: 'booker', 'info', or 'unclear'."),
  ("user", "{request}"),
])
router_chain = coordinator_router_prompt | llm | StrOutputParser()

# 2) the dispatch: map the label to a specialist branch
delegation = RunnableBranch(
  (lambda x: x["decision"].strip() == "booker", booking_branch),
  (lambda x: x["decision"].strip() == "info",   info_branch),
  unclear_branch,            # default / fallback branch
)

# 3) compose: route first, then run the chosen branch
coordinator = ({"decision": router_chain, "request": RunnablePassthrough()}
               | delegation | (lambda x: x["output"]))

Note three things the book's code embodies. The router emits a closed vocabulary of labels, not free text. The branch has an explicit default (unclear), so there is always a fallback. And the handlers behind the three branches (booking_handler, info_handler, and unclear_handler in the book) are isolated functions, so each branch can be tested and changed independently.

Google ADK style (capability/tool routing). Instead of an explicit graph, ADK gives the coordinator sub_agents and lets the framework's Auto-Flow match the request to the right one based on each sub-agent's description:

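from google.adk.agents import Agent  # booking_tool and info_tool are assumed to be defined elsewhere

booking_agent = Agent(name="Booker", model="gemini-2.0-flash",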
    description="Handles flight and hotel bookings via the booking tool.",
    tools=[booking_tool])
info_agent = Agent(name="Info", model="gemini-2.0-flash",
    description="Answers general information questions via the info tool.",
    tools=[info_tool])

coordinator = Agent(name="Coordinator", model="gemini-2.0-flash",
    instruction=("You ONLY analyze the request and delegate; never answer directly.\n"
                 "- booking of flights/hotels -> delegate to 'Booker'\n"
                 "- general information       -> delegate to 'Info'"),
    sub_agents=[booking_agent, info_agent])
# runner.run(...) — Auto-Flow routes to a sub_agent based on its description

The book's contrast matches the index's "framework choices" entry. LangGraph's state-graph architecture suits complex routing where the decision depends on accumulated system state: you draw nodes, and functions or model evaluations govern the transitions between them. Google ADK routes implicitly via tool and capability descriptions, which suits agents whose actions are clearly named. The book lists both alongside LangChain as the frameworks that give routing explicit structure.
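To make "nodes plus transition functions over accumulated state" concrete, here is a sketch in plain Python. It deliberately does not use the LangGraph API; the node names, state keys, and the run() loop are assumptions chosen for illustration.

```python
from typing import Callable

State = dict


def retrieve(state: State) -> State:
    """Node: pretend to fetch documents and record them in the state."""
    return {**state, "documents": ["doc-a", "doc-b"]}


def summarize(state: State) -> State:
    """Node: summarize whatever documents the state already holds."""
    return {**state, "summary": f"{len(state['documents'])} documents summarized"}


def next_node(state: State) -> str:
    """Transition function: choose the next node from accumulated state."""
    if "documents" not in state:
        return "retrieve"
    if "summary" not in state:
        return "summarize"
    return "END"


NODES: dict[str, Callable[[State], State]] = {"retrieve": retrieve, "summarize": summarize}


def run(state: State, max_steps: int = 10) -> tuple[State, list[str]]:
    """Repeatedly route on the current state until the transition function says END."""
    path: list[str] = []
    for _ in range(max_steps):
        name = next_node(state)
        if name == "END":
            break
        path.append(name)
        state = NODES[name](state)
    return state, path
```

The routing decision here reads the state, not the original request: a run that starts with documents already present skips retrieval entirely.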

Re-skinned as the running coding/research assistant

Our track's running example is a coding and research assistant. The same coordinator pattern, with the book's lesson about naming routes by their action rather than a vague category:

ROUTES = {
  "explain_code":    read_only_explanation_flow,   # "what does this module do?"
  "fix_failing_test": code_edit_flow,              # "make test_auth pass"
  "research_lookup":  rag_retrieval_flow,          # "how does asyncio.gather schedule?"
  "destructive_op":   approval_gated_flow,         # "delete the generated files"
  "clarify":          ask_one_question_flow,       # fallback when intent is unclear
}

decision = router(context)          # -> {route, confidence, rationale, ctx}
if decision.confidence < TAU:       # abstain threshold (next section)
    branch = ROUTES["clarify"]
elif decision.route == "destructive_op":
    branch = ROUTES["destructive_op"]   # rule overrides model: always gate deletes
else:
    branch = ROUTES[decision.route]
log_route(decision)                 # route, confidence, rationale, outcome

The destructive_op line shows the most important guardrail in routing: a deterministic rule must override model judgment for anything dangerous. You never let an LLM's confidence alone decide whether to run a delete. The model can suggest the destructive route, but a rule, not the model, gates it (this becomes human-in-the-loop, lesson 15).
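One possible arrangement of that guardrail, sketched so that the gate does not depend on the model's label at all. The DESTRUCTIVE pattern, the Decision class, and the TAU value are assumptions for illustration; this variant checks the rule before the confidence threshold.

```python
import re
from dataclasses import dataclass

TAU = 0.65  # abstain threshold; illustrative value

# Hypothetical deny-pattern: a deterministic check on the raw request text.
DESTRUCTIVE = re.compile(r"\b(delete|remove|drop|wipe|rm)\b", re.IGNORECASE)


@dataclass
class Decision:
    route: str
    confidence: float
    rationale: str


def choose_branch(request: str, decision: Decision) -> str:
    """Pick a branch name: rule gate first, then abstain, then the model's route."""
    # 1) Rule first: gate dangerous requests whatever the model's label or confidence.
    if DESTRUCTIVE.search(request) or decision.route == "destructive_op":
        return "destructive_op"
    # 2) Abstain when the router is not sure enough.
    if decision.confidence < TAU:
        return "clarify"
    # 3) Otherwise follow the model's route.
    return decision.route
```

The pattern will sometimes gate a harmless request that merely mentions deletion; over-gating is the safe direction, since the approval branch can wave it through.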

4 · The abstain threshold is a knob you must set

Because the routing decision is probabilistic, the single most consequential design choice is when to refuse to route and ask for clarification instead. That is a confidence threshold τ. Set it too low and the agent confidently sends ambiguous requests to the wrong specialist (a misroute — a legal question answered by a generic chatbot). Set it too high and the agent pesters users for clarification on requests it could have handled fine, hurting the experience and adding a round-trip.

This is a precision/recall tradeoff in disguise, and it has a number. Suppose your router, on a labeled validation set, produces a confidence score and you measure: when it routes (does not abstain), how often is the route correct? And of the requests it could have answered, how many did it needlessly punt to clarification? Sweep τ and you trace a curve. The widget below lets you do exactly that.

Abstain threshold — trade misroutes against needless clarifications

Each dot is a request the router scored. Its horizontal position is the router's confidence in its top choice; color is whether that choice was actually correct (green) or wrong (red). The vertical line is the abstain threshold τ: anything to its left is sent to clarify instead of routed. Drag τ and watch the three outcomes trade off — the goal is to abstain on the reds without abstaining on too many greens.

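[Interactive widget: a slider sets the abstain threshold τ (initially 0.65); live counters report requests routed correctly, misrouted (wrong, not caught), sent to clarify, and, of those sent to clarify, how many were answerable.]

The core JS: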

// each request has a confidence in [0,1] and a ground-truth correct flag.
// correct routes tend to score high; wrong routes tend to score low — but they overlap.
const TAU = 0.65; // abstain threshold (the widget's τ)
for (const r of requests) {
  if (r.confidence < TAU) r.outcome = "clarify";          // abstain
  else r.outcome = r.correct ? "routed_ok" : "misrouted"; // commit to the route
}
// raising TAU moves more low-confidence (mostly wrong) requests into clarify,
// cutting misroutes — but also abstains on some correct ones (wasted clarifications).

Worked number — picking τ

Say a batch of 100 requests scores like this: of the 100, 80 would route correctly and 20 incorrectly. The wrong ones cluster at low confidence: 15 of the 20 wrong routes score below 0.65, but so do 8 of the 80 correct ones. Set τ = 0.65 and you abstain on 23 requests (15 wrong + 8 right): you caught 15 of 20 misroutes (only 5 slip through) at the cost of 8 needless clarifications. Drop τ to 0.40 and maybe only 6 wrong routes fall below it — fewer clarifications, but now 14 misroutes ship. The right τ depends on the cost asymmetry: if a misroute to the destructive_op branch is catastrophic, you raise τ high for that route specifically; for a harmless FAQ misroute, a low τ is fine.
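The worked numbers can be reproduced with a small sweep. The batch below is synthetic and built to match the counts in the text; the exact confidence values, and the assumption that 2 of the correct routes score below 0.40, are not given in the text and are made up for illustration.

```python
# Synthetic batch: (confidence, was_correct, count).
BATCH = [
    (0.30, False, 6), (0.55, False, 9), (0.80, False, 5),   # 20 wrong routes
    (0.35, True, 2), (0.60, True, 6), (0.90, True, 72),     # 80 correct routes
]


def outcomes(tau: float) -> dict[str, int]:
    """Count what happens to the batch at a given abstain threshold."""
    counts = {"routed_ok": 0, "misrouted": 0, "clarify": 0, "needless_clarify": 0}
    for confidence, correct, n in BATCH:
        if confidence < tau:
            counts["clarify"] += n
            if correct:
                counts["needless_clarify"] += n  # would have routed correctly
        elif correct:
            counts["routed_ok"] += n
        else:
            counts["misrouted"] += n
    return counts


for tau in (0.40, 0.65):
    print(tau, outcomes(tau))
```

At τ = 0.65 this gives 23 clarifications (8 of them needless) and 5 misroutes; at τ = 0.40 it gives 14 misroutes, matching the text.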

Two refinements the book's spirit implies. First, τ can be per-route, not global — a high bar for dangerous or expensive branches, a low bar for cheap reversible ones. Second, you do not have to fully commit at the threshold: a near-tie between two routes can fan out to both (lesson 05) or escalate to a human (lesson 15) rather than guess.
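Both refinements fit in a few lines. This is a sketch under stated assumptions: the router exposes a score per route (at least two routes), and the thresholds and TIE_MARGIN are invented values, not recommendations.

```python
DEFAULT_TAU = 0.65
# Hypothetical per-route thresholds: strict for dangerous, lenient for cheap and reversible.
ROUTE_TAU = {"destructive_op": 0.95, "explain_code": 0.50}
TIE_MARGIN = 0.05  # top two scores closer than this count as a near-tie


def resolve(scores: dict[str, float]) -> tuple[str, list[str]]:
    """Return (action, routes) where action is 'route', 'fan_out', or 'clarify'."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_score), (second, second_score) = ranked[0], ranked[1]
    # Per-route bar: the threshold depends on which route is about to win.
    if top_score < ROUTE_TAU.get(top, DEFAULT_TAU):
        return "clarify", []
    # Near-tie: hand both candidates onward instead of guessing.
    if top_score - second_score < TIE_MARGIN:
        return "fan_out", [top, second]
    return "route", [top]
```

A fan-out result should still pass through any deterministic gate before a dangerous branch runs; the near-tie logic only decides which candidates are considered.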

5 · Where this routes (no pun) wrong

Failure modes

Overlapping or vague labels. If "product_info" and "technical_support" can both describe the same query, the router cannot make a stable decision and confidence collapses. Routes must be mutually distinguishable and named by action.
No low-confidence fallback. Without an unclear/clarify default, an uncertain router still has to pick — so it guesses, confidently, on exactly the inputs it understands least.
Model routing bypassing policy. Letting LLM confidence alone trigger a destructive or privileged path. A rule must gate anything irreversible, regardless of how sure the model claims to be.
Closed-vocabulary leak. An LLM router emits free text ("this looks like a product question, maybe?") instead of one label, breaking the downstream switch. Constrain the output and validate it; fall back to unclear on any unparseable label.
Silent route drift. Input distribution shifts (a new product line) but the route set and classifier do not, so a growing slice quietly lands in the wrong branch. Without logged route + outcome you never see it.
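Guarding against the closed-vocabulary leak takes one small function between the model and the switch. A minimal sketch, assuming the label set from section 2 with unclear as the fallback; parse_label is a hypothetical helper name.

```python
ALLOWED_LABELS = {"order_status", "product_info", "technical_support", "unclear"}


def parse_label(raw: str) -> str:
    """Normalize a model's raw output to a known label, or fall back to 'unclear'."""
    # Trim whitespace, then stray quotes, backticks, or a trailing period.
    label = raw.strip().strip("'\"`.").lower()
    return label if label in ALLOWED_LABELS else "unclear"
```

Anything that is not exactly one known label after normalization is treated as unclear, so free-text output can never reach the dispatch step.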

Implementation checklist

Are the route labels mutually exclusive and named by the action they trigger, not a category?
Does every route — including the default — have explicit input and output contracts (lesson 02)?
What deterministic rule overrides model routing for dangerous/expensive branches?
Is there a fallback route, and a confidence threshold τ (global or per-route) that triggers it?
Does the router return destination + confidence + rationale + minimal context — not just a label?
Are route, confidence, evidence, and final outcome logged so misroutes can be sampled and fixed?
Could a cheap rule/embedding pre-filter handle the obvious cases before the LLM call?
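The logging item pays off once the records are aggregated. The sketch below assumes each log record carries a route and a reviewed outcome; the field names and the sample records are hypothetical, and in practice the outcome would come from sampled review or downstream signals.

```python
from collections import defaultdict


def misroute_rates(log: list[dict]) -> dict[str, float]:
    """Fraction of logged decisions per route whose reviewed outcome was a misroute."""
    totals: dict[str, int] = defaultdict(int)
    wrong: dict[str, int] = defaultdict(int)
    for rec in log:
        totals[rec["route"]] += 1
        if rec["outcome"] == "misrouted":
            wrong[rec["route"]] += 1
    return {route: wrong[route] / totals[route] for route in totals}


# Hypothetical reviewed sample.
sample_log = [
    {"route": "product_info", "confidence": 0.91, "outcome": "routed_ok"},
    {"route": "product_info", "confidence": 0.72, "outcome": "misrouted"},
    {"route": "order_status", "confidence": 0.88, "outcome": "routed_ok"},
    {"route": "order_status", "confidence": 0.95, "outcome": "routed_ok"},
]
rates = misroute_rates(sample_log)
```

Tracking these rates per route over time is one way to notice silent drift: a route whose rate climbs after a product launch is a candidate for new labels or retraining.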

Checkpoint exercise

Try it

Design five route labels for a domain assistant. Make each label an action ("refund_order", not "billing"), so it maps to a concrete branch. Add the fallback (clarify) route, decide which one rule must hard-gate, and pick an initial τ — then justify whether that τ should be the same for the dangerous route as for the harmless ones.

Where this points next

Routing answers "given this input, which one branch runs?" But sometimes the honest answer is "several." A research request might need retrieval and summarization and analysis at once; a near-tie between two routes might be best resolved by trying both and comparing. The next pattern — parallelization — keeps the router's classify-then-dispatch shape but fans control out to multiple branches that run concurrently, then fans the results back into one state. Lesson 05 builds the fan-out / fan-in machinery, the latency math that makes it worth doing, and the reducer that merges the branches the router lit up.

Takeaway

Routing is the move from a fixed control flow to a decided one: classify the situation, then dispatch to a specialized branch. A router must return more than a destination (destination, confidence, rationale, and minimal context) so it can abstain and be audited. The decision can be made by rules (fast, brittle), an LLM (flexible, costly), embeddings (semantic), or a trained classifier (cheap at inference, supervised), and good systems layer them: cheap rules for the certain, a model for the ambiguous. Because the decision is probabilistic, the abstain threshold τ is a real knob trading misroutes against needless clarifications, and a deterministic rule, never raw model confidence, must gate anything irreversible. The book's coordinator (LangChain RunnableBranch, Google ADK Auto-Flow) and its discussion of LangGraph state graphs express this one idea in three framework dialects.

Interview prompts

What does routing add over a prompt chain? (§1 — conditional control flow: the agent classifies the situation and dispatches to a specialized branch instead of forcing every input through one fixed pipeline.)
Name the four routing mechanisms and when each wins. (§2 — rule-based: fast/deterministic/brittle, for hard constraints; LLM-based: flexible, costs an inference; embedding: semantic, cheap, many routes; trained classifier: cheap+fast at inference but needs labeled data. Layer cheap rules in front of the LLM.)
Why must a router return more than a label? (§1 — without confidence you cannot abstain; without rationale and logged outcome you cannot debug a misroute or detect route drift.)
How do you set the abstain threshold τ? (§4 — sweep it on a labeled set; raising τ catches more misroutes but causes more needless clarifications. Choose by the cost asymmetry — high τ for dangerous/expensive routes, low for harmless reversible ones; τ can be per-route.)
Should the LLM's confidence be allowed to trigger a file deletion? (§3, §5 — no; a deterministic rule must hard-gate irreversible/privileged actions regardless of model confidence, then escalate to human approval.)
Contrast LangGraph and Google ADK routing styles. (§3 — LangGraph: explicit state graph with nodes and transition functions, suits state-dependent multi-step routing; ADK: implicit Auto-Flow that matches the request to a sub-agent/tool by its description, suits clearly-named actions.)
How would you cut the cost of an expensive LLM router by half without losing quality? (§2 — put a rule/embedding pre-filter in front that handles the unambiguous majority for ~$0 and ~0 ms, and only send the nuanced remainder to the model.)