Part IV - State, collaboration, and protocols
Multi-agent collaboration - roles, handoffs, synthesis
Up to now we have hardened one control loop: it routes, parallelizes its own subtasks, reflects on its own output, calls tools, and plans. This lesson asks the opposite question — when is it better to stop making one loop smarter and instead split the work across several specialized loops that talk to each other? The honest answer is "rarely, and only for specific reasons," and most of this lesson is about earning the right to add a second agent.
New capability: coordinating several role-specialized loops through typed artifacts and an explicit communication structure, so that division of expertise reduces complexity instead of multiplying it.
Not yet: remote agents that don't share a process or a trust boundary — that's lesson 17 (A2A). Here, all agents live in one orchestrated system.
1 · The one rule: add an agent only when it owns something
A single agent is efficient on well-scoped problems. The book is blunt that it gets worse, not better, as the task spreads across domains: one loop juggling research, statistics, and report-writing becomes the bottleneck — its context fills with unrelated material, its tool list balloons, and its single prompt has to be good at everything at once. Multi-agent collaboration is the structural answer: decompose the high-level goal into sub-problems and assign each to an agent that has the right tools, data access, or reasoning role for it.
The operative word is owns. A new agent must own at least one of four distinct things, or it is pure overhead:
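- Skill. A specialized prompt and expertise the others lack, such as researching versus writing.
- Context. Its own context window, so the material it reads does not crowd out anyone else's.
- Permission. Tools or access the others must not have; for example, only one agent may call apply_patch or touch production.
- Evaluation. Independent judgment of another agent's output, as a reviewer provides.
Define an agent's role as the pair (responsibility, tool set). If two proposed agents share both, they are the same agent wearing two hats: merge them. This is the multi-agent analogue of the lesson-08 rule that every plan step must change state: every agent must change who is accountable for some artifact.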
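The merge test can be written down directly. This is a minimal sketch, assuming a role is nothing more than a responsibility string plus a set of tool names; the `Role` type, the `redundant_pairs` helper, and the `fact_finder` role are hypothetical, and exact string equality stands in for what is really a design-review judgment.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Role:
    """An agent's role: what it is accountable for, plus the tools it may call."""
    name: str
    responsibility: str
    tools: frozenset[str]


def redundant_pairs(roles: list[Role]) -> list[tuple[str, str]]:
    """Return pairs of roles that share BOTH responsibility and tool set.

    Such a pair is one agent wearing two hats and should be merged.
    """
    return [
        (a.name, b.name)
        for a, b in combinations(roles, 2)
        if a.responsibility == b.responsibility and a.tools == b.tools
    ]


roles = [
    Role("researcher", "find & summarize context", frozenset({"search", "read_docs"})),
    Role("fact_finder", "find & summarize context", frozenset({"search", "read_docs"})),
    Role("implementer", "write the change", frozenset({"read_file", "apply_patch"})),
]
print(redundant_pairs(roles))  # [('researcher', 'fact_finder')]
```

Sharing only one half of the pair (same tools, different responsibility, or the reverse) is not flagged; that is a deliberate reading of the "share both" rule.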
2 · The six forms and the six topologies
The book separates two things that beginners conflate: the form of collaboration (how work flows) and the topology (who is allowed to talk to whom). Get both right and the system is debuggable; get either wrong and you have an expensive chat room.
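Collaboration forms (how the work flows):
- Sequential handoff. One agent finishes its piece and passes the result to the next, as in a researcher → writer pipeline.
- Parallel processing. Independent sub-tasks run at the same time and their outputs are merged.
- Debate and consensus. Agents argue different positions until a decision rule settles the outcome.
- Hierarchical structures. A manager agent delegates to workers, possibly through several layers.
- Expert teams. Specialists in different domains each contribute to one shared output.
- Critic-reviewer. One agent produces an artifact, another critiques it, and the producer revises.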
Communication topologies (who talks to whom): the book's "relationship and communication structure" axis, from simplest to most flexible.
| Topology | Structure | Strength | Risk |
|---|---|---|---|
| 1 · Single | One agent, no peers. | Simplest; no coordination cost. | Capability-bound on multi-domain work. |
| 2 · Network | Decentralized peer-to-peer; all share data/tasks. | Resilient, no single point of failure. | Hard to keep decisions consistent; chatty. |
| 3 · Supervisor | One supervisor assigns tasks and resolves conflict. | Clear hierarchy, easy to manage. | Single point of failure / bottleneck. |
| 4 · Tool-supervisor | Supervisor enables (gives resources/guidance) rather than commands. | More flexible; less rigid control. | Looser coordination; weaker guarantees. |
| 5 · Hierarchical | Multiple supervisor layers over operational agents. | Scales to deep problems; distributed decisions. | Latency and complexity grow with depth. |
| 6 · Custom | Bespoke mix tuned to the domain. | Maximal fit; can exploit emergent behavior. | Needs deep design of protocols and coordination. |
The book's guidance for picking among these: weigh task complexity, number of agents, autonomy needs, robustness, and communication overhead. For most production systems the sweet spot is topology 3 (supervisor) running the sequential-handoff and critic-reviewer forms — it keeps a single owner of state while still isolating expertise. We'll build exactly that in §4.
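One way to make "who is allowed to talk to whom" concrete is to treat a topology as a set of permitted directed channels. The sketch below is illustrative only (the helper names are hypothetical, not from any framework). It shows why a network is chatty: with n agents it permits n(n-1) channels, while a supervisor star permits 2(n-1), and every worker-to-worker message must route through the supervisor.

```python
from itertools import permutations


def network_edges(agents: list[str]) -> set[tuple[str, str]]:
    """Topology 2 (network): every agent may message every other agent."""
    return set(permutations(agents, 2))


def supervisor_edges(supervisor: str, workers: list[str]) -> set[tuple[str, str]]:
    """Topology 3 (supervisor): workers talk only to the supervisor, and back."""
    return {(supervisor, w) for w in workers} | {(w, supervisor) for w in workers}


def may_send(edges: set[tuple[str, str]], sender: str, receiver: str) -> bool:
    """True if the topology permits a direct message from sender to receiver."""
    return (sender, receiver) in edges


workers = ["researcher", "implementer", "reviewer"]
net = network_edges(["coordinator", *workers])
sup = supervisor_edges("coordinator", workers)

print(len(net), len(sup))  # 12 6
print(may_send(sup, "researcher", "reviewer"))  # False: must go via the coordinator
```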
3 · The trade-off, made numeric
Every agent you add buys specialization and pays coordination. The coordination tax is real and easy to under-count: each handoff is an extra LLM round-trip with its own latency, its own token cost, and its own chance to garble the artifact. Make it concrete.
Worked example — a research blog, one agent vs a crew. Suppose one strong generalist agent can produce a 500-word AI-trends blog in a single pass: ~6,000 input tokens (instructions + retrieved context) and ~1,200 output tokens, about 9 s of wall-clock, at a model priced \$3 / 1M input and \$15 / 1M output. That is (6000·3 + 1200·15)/1e6 = \$0.018 + \$0.018 = \$0.036 and ~9 s.
Now split it the way the book's CrewAI example does — a researcher agent and a writer agent in sequential handoff:
- Researcher: 5,000 in / 1,500 out → (5000·3 + 1500·15)/1e6 = \$0.0375, ~7 s.
- Handoff: the writer receives the researcher's 1,500-token summary as context — not the raw 40 sources. Writer: 2,500 in / 1,200 out → (2500·3 + 1200·15)/1e6 = \$0.0255, ~6 s.
- Total: \$0.063 and, because it is sequential, ~13 s.
So the crew costs 1.75× the money and 1.4× the latency. That is the coordination tax. It is only worth paying if the split buys back more than it costs — here, if the researcher's isolated context lets it ground 40 sources without blowing the writer's window, and the writer produces noticeably better prose because its prompt is about writing and nothing else. If the generalist's single pass is already good enough, the crew is strictly worse. The lesson: measure the quality lift before you ship the topology.
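The arithmetic gets error-prone once a crew grows past two roles, so it is worth scripting. A minimal sketch that reproduces the worked example; the prices, token counts, and latencies are the example's assumptions, not quotes for any real model.

```python
# Prices from the worked example: $3 per 1M input tokens, $15 per 1M output tokens.
PRICE_IN_PER_M = 3.0
PRICE_OUT_PER_M = 15.0


def call_cost(tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one LLM call."""
    return (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000


# One (tokens_in, tokens_out, seconds) tuple per LLM call.
generalist = [(6000, 1200, 9)]
crew = [(5000, 1500, 7), (2500, 1200, 6)]  # researcher, then writer


def totals(calls: list[tuple[int, int, int]]) -> tuple[float, int]:
    cost = sum(call_cost(i, o) for i, o, _ in calls)
    latency = sum(s for _, _, s in calls)  # sequential handoff: latencies add
    return cost, latency


g_cost, g_lat = totals(generalist)
c_cost, c_lat = totals(crew)
print(f"generalist ${g_cost:.4f} {g_lat}s | crew ${c_cost:.4f} {c_lat}s")
print(f"cost x{c_cost / g_cost:.2f}, latency x{c_lat / g_lat:.2f}")
# generalist $0.0360 9s | crew $0.0630 13s
# cost x1.75, latency x1.44
```

To price another split, append its tuples and rerun; the latency line assumes strictly sequential calls.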
When parallelism flips the latency math. If the two roles were independent (say a weather-fetcher and a news-fetcher, the book's ADK ParallelAgent example), running them concurrently makes wall-clock ≈ max(7, 6) = 7 s instead of 13 s — now the crew can be faster than the generalist while still paying the extra tokens. Sequential handoff adds latency; parallel fan-out hides it. Choosing the form is choosing your latency profile.
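The latency difference between the two forms shows up in a toy timing harness. This sketch assumes asyncio-based agents and replaces each LLM call with `asyncio.sleep` (the `fake_agent` helper is hypothetical). Sequential handoff awaits each call in turn; fan-out uses `asyncio.gather`, so wall-clock tracks the slowest call rather than the sum.

```python
import asyncio
import time


async def fake_agent(name: str, seconds: float) -> str:
    """Stand-in for an LLM call; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def sequential(specs: list[tuple[str, float]]) -> list[str]:
    # Handoff: each agent starts only after the previous one returns.
    return [await fake_agent(n, s) for n, s in specs]


async def parallel(specs: list[tuple[str, float]]) -> list[str]:
    # Fan-out: independent agents run concurrently; gather keeps input order.
    return await asyncio.gather(*(fake_agent(n, s) for n, s in specs))


async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def main() -> tuple[float, float]:
    # The example's 7 s and 6 s, scaled down 100x so the demo runs quickly.
    specs = [("weather", 0.07), ("news", 0.06)]
    _, seq_t = await timed(sequential(specs))
    _, par_t = await timed(parallel(specs))
    return seq_t, par_t


seq_t, par_t = asyncio.run(main())
print(f"sequential ~{seq_t:.2f}s, parallel ~{par_t:.2f}s")
```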
4 · Running example — the coding/research assistant as a supervised crew
Take our running coding/research assistant and split a code-change task into a supervised expert team. Four roles, each owning a skill, a context, a permission, and (for the reviewer) an evaluation:
| Role | Owns | Tools | Produces (typed artifact) |
|---|---|---|---|
| Researcher | finding & summarizing context | search, read_docs | ResearchBrief{findings[], sources[]} |
| Implementer | writing the change | read_file, apply_patch | Patch{diff, rationale} |
| Reviewer | independent judgment | diff, run_tests | Review{verdict, issues[]} |
| Coordinator | state & final synthesis | (delegation only) | Result{patch, review, decision} |
The coordinator is the supervisor (topology 3). The flow is sequential handoff with a critic-reviewer loop bolted on. Crucially, roles do not freely overwrite one another — they emit artifacts; the coordinator owns the shared state and decides when to loop back.
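A minimal sketch of that coordinator, assuming each role is a plain function that returns its typed artifact. The dataclasses mirror the table's artifact names; `coordinate`, `max_rounds`, the `"ship"`/`"escalate"` decisions, and the stub roles are hypothetical, and in a real system each stub would wrap an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal


@dataclass
class ResearchBrief:
    findings: list[str]
    sources: list[str]


@dataclass
class Patch:
    diff: str
    rationale: str


@dataclass
class Review:
    verdict: Literal["approve", "revise"]
    issues: list[str] = field(default_factory=list)


@dataclass
class Result:
    patch: Patch
    review: Review
    decision: Literal["ship", "escalate"]


def coordinate(task: str,
               research: Callable[[str], ResearchBrief],
               implement: Callable[[str, ResearchBrief, list[str]], Patch],
               review: Callable[[Patch], Review],
               max_rounds: int = 3) -> Result:
    """Supervisor: owns the state, passes only typed artifacts, bounds the review loop."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    brief = research(task)
    issues: list[str] = []
    for _ in range(max_rounds):
        patch = implement(task, brief, issues)  # gets the brief and open issues, no transcripts
        verdict = review(patch)
        if verdict.verdict == "approve":
            return Result(patch, verdict, "ship")
        issues = verdict.issues  # loop back carrying only the reviewer's issues
    return Result(patch, verdict, "escalate")  # bound reached: hand off to a human


# Hypothetical stub roles; each would wrap an LLM call in a real system.
def stub_research(task: str) -> ResearchBrief:
    return ResearchBrief(findings=["parse() drops empty lines"], sources=["docs/parse.md"])


def stub_implement(task: str, brief: ResearchBrief, issues: list[str]) -> Patch:
    diff = "- skip empty lines\n+ keep empty lines"
    if issues:  # respond to reviewer feedback by adding a regression test
        diff += "\n+ def test_keeps_empty_lines(): ..."
    return Patch(diff=diff, rationale=brief.findings[0])


def stub_review(patch: Patch) -> Review:
    if "test_" in patch.diff:
        return Review("approve")
    return Review("revise", ["add a regression test"])


result = coordinate("fix parse()", stub_research, stub_implement, stub_review)
print(result.decision)  # ship
```

Here the first patch is sent back, the second carries a test and is approved. Only the coordinator decides between shipping and escalating; the roles never edit one another's artifacts.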
How the book's two frameworks express this. CrewAI models it as a Crew of Agents (each with a role, goal, backstory) and Tasks with explicit context=[...] dependencies, run with Process.sequential. The book's worked CrewAI sample builds exactly a two-agent researcher → writer blog crew on Gemini 2.0 Flash, where the writing task declares context=[research_task] so the handoff is a typed dependency, not a chat:
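```python
# CrewAI: the book's researcher → writer crew (paraphrased)
from crewai import Agent, Crew, Process, Task

# backstory is a required Agent field; these strings are placeholders.
# The book runs this crew on Gemini 2.0 Flash; configure a model
# (e.g. Agent(llm=...)) and its API key before calling kickoff().
researcher = Agent(role="Senior Research Analyst",
                   goal="Find and summarize the latest AI trends.",
                   backstory="An analyst who distills many sources into key points.",
                   allow_delegation=False)
writer = Agent(role="Technical Content Writer",
               goal="Write a clear blog from the research.",
               backstory="A writer who turns research notes into readable prose.",
               allow_delegation=False)
research_task = Task(description="Research 3 emerging AI trends (2024-25).",
                     expected_output="Detailed summary w/ key points + sources.",
                     agent=researcher)
writing_task = Task(description="Write a 500-word blog from the research.",
                    expected_output="A complete 500-word blog.",
                    agent=writer,
                    context=[research_task])  # <- typed handoff
crew = Crew(agents=[researcher, writer],
            tasks=[research_task, writing_task],
            process=Process.sequential)
result = crew.kickoff()
print(result)
```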
Google ADK exposes the topologies directly as composable agents. The same patterns from §2 map onto concrete classes:
| ADK construct | Realizes | Behavior |
|---|---|---|
| LlmAgent(sub_agents=[...]) | Hierarchy / supervisor | A coordinator delegates to children (e.g. Greeter, TaskExecutor); parent/child links are explicit. |
| SequentialAgent | Sequential handoff | Runs children in order; step1 writes session.state["data"], step2 reads it. |
| ParallelAgent | Parallel processing | Runs children concurrently; each writes its own output_key into shared state. |
| LoopAgent(max_iterations=N) | Critic / reviewer loop | Repeats a step + a ConditionChecker until status="completed" or N hit (bounded, by design). |
| AgentTool(agent=...) | Agent-as-tool | One agent is wrapped as a callable tool for another (e.g. an artist_agent invoking an ImageGen agent). |
Two design points the ADK examples make sharp. First, shared state is the communication channel — agents read/write keyed entries in session.state rather than passing free-form chat, which keeps the trace inspectable (who wrote data? who read it?). Second, the LoopAgent always carries max_iterations: the book never lets a critic loop run unbounded — there is always a verifier with a stopping rule. That is the antidote to the "debate that never converges" failure in §5.
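The shared-state idea can be sketched without ADK. This is not ADK's API; it is a plain-Python illustration of a keyed store in the spirit of `session.state`, with hypothetical names (`SharedState`, `step1`, `step2`). Every write and read is logged with the agent that made it, so "who wrote `data`? who read it?" becomes a query over the trace instead of a re-read of a transcript.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SharedState:
    """Keyed store agents use instead of free-form chat; logs every access."""
    data: dict[str, Any] = field(default_factory=dict)
    trace: list[tuple[str, str, str]] = field(default_factory=list)  # (agent, op, key)

    def write(self, agent: str, key: str, value: Any) -> None:
        self.trace.append((agent, "write", key))
        self.data[key] = value

    def read(self, agent: str, key: str) -> Any:
        self.trace.append((agent, "read", key))
        return self.data[key]

    def writers(self, key: str) -> list[str]:
        """Provenance query: which agents wrote this key, in order."""
        return [a for a, op, k in self.trace if op == "write" and k == key]


def step1(state: SharedState) -> None:
    state.write("step1", "data", {"city": "Paris"})


def step2(state: SharedState) -> None:
    data = state.read("step2", "data")
    state.write("step2", "summary", f"Report for {data['city']}")


state = SharedState()
for step in (step1, step2):  # sequential: run the children in order
    step(state)

print(state.data["summary"])  # Report for Paris
print(state.writers("data"))  # ['step1']
```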
The book also notes the "agent-as-tool" pattern blurs lesson 07 (tool use) and this one: from the caller's side a sub-agent is just a tool with a typed signature. That is the cleanest way to add an expert without inventing a whole new protocol — and it previews lesson 17, where the tool happens to live on another machine.
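A plain-Python sketch of the idea, not ADK's `AgentTool` API: the `Tool` wrapper and `summarizer_agent` are hypothetical, and the "agent" is a stub. The point is that the caller invokes a sub-agent and an ordinary function through the same typed signature and cannot tell them apart.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    fn: Callable[[str], str]

    def __call__(self, arg: str) -> str:
        return self.fn(arg)


def summarizer_agent(text: str) -> str:
    """Hypothetical sub-agent; in practice this would run its own LLM loop."""
    first_sentence = text.split(".")[0].strip()
    return f"Summary: {first_sentence}."


def word_count(text: str) -> str:
    """An ordinary function tool, for contrast."""
    return str(len(text.split()))


# The caller sees one registry; nothing marks which entries are agents.
tools = {
    t.name: t
    for t in (
        Tool("summarize", "Summarize a passage.", summarizer_agent),
        Tool("word_count", "Count words in a passage.", word_count),
    )
}

passage = "Agents can call agents. That keeps interfaces small."
print(tools["summarize"](passage))   # Summary: Agents can call agents.
print(tools["word_count"](passage))  # 8
```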
5 · Failure modes and the checklist
Failure modes
- No owner of final state. A "chat room" of peers (topology 2 done badly) where everyone edits and nobody decides — the output is whatever the last agent happened to say.
- Overlapping roles. Two agents share responsibility and tools, so they redo each other's work and disagree on what's done. Symptom: cost doubles, quality doesn't move.
- Unbounded debate. A debate/critic form with no decision rule or `max_iterations` loops forever or oscillates. The fix is ADK's bounded `LoopAgent` + an explicit verdict.
- Conversational noise as handoff. Agents pass long transcripts instead of typed artifacts, so each downstream window fills with the upstream agent's thinking-out-loud, costing tokens and inviting drift.
- Supervisor bottleneck / SPOF. The topology-3 supervisor is a single point of failure; if it stalls, the whole crew stalls. Budget a timeout and a fallback.
- Lost provenance. The trace doesn't record who decided what on which evidence, so a wrong final answer is undebuggable.
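For the supervisor-bottleneck failure, here is a minimal sketch of "budget a timeout and a fallback", assuming an asyncio-based orchestrator. `supervisor_step`, its `delay` parameter, and the fallback string are hypothetical placeholders; `asyncio.wait_for` cancels the slow call once the budget is spent.

```python
import asyncio


async def supervisor_step(task: str, delay: float) -> str:
    """Hypothetical supervisor call; `delay` stands in for model latency."""
    await asyncio.sleep(delay)
    return f"plan for {task}"


async def with_fallback(task: str, delay: float, timeout: float) -> str:
    """Bound the supervisor's time; degrade instead of stalling the whole crew."""
    try:
        return await asyncio.wait_for(supervisor_step(task, delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Placeholder fallback: a cached plan, a simpler route, or a human handoff.
        return f"fallback plan for {task}"


fast = asyncio.run(with_fallback("fix parse()", delay=0.01, timeout=0.5))
slow = asyncio.run(with_fallback("fix parse()", delay=1.0, timeout=0.05))
print(fast)  # plan for fix parse()
print(slow)  # fallback plan for fix parse()
```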
Implementation checklist
- For each agent: which of skill / context / permission / evaluation does it own? If none, delete it.
- What typed artifact does each role produce, and what's the schema?
- Who owns shared state and final synthesis? (Name one coordinator.)
- Handoffs pass artifacts and decisions, never raw transcripts.
- Every loop (debate, critic) has a decision rule and a `max_iterations` bound.
- Does the trace record decider, evidence, and handoff points for each step?
- Did you measure the quality lift against the single-agent baseline before shipping?
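Several checklist items can be enforced mechanically when the crew is declared as data. This is a hypothetical lint pass; the `AgentSpec`/`LoopSpec` shapes and messages are illustrative, not CrewAI or ADK structures. The trace and quality-lift items still need runtime measurement.

```python
from dataclasses import dataclass, field
from typing import Optional

OWNABLE = {"skill", "context", "permission", "evaluation"}


@dataclass
class AgentSpec:
    name: str
    owns: set[str] = field(default_factory=set)
    artifact: Optional[str] = None  # name of the typed artifact it emits
    is_coordinator: bool = False


@dataclass
class LoopSpec:
    name: str
    decision_rule: Optional[str] = None
    max_iterations: Optional[int] = None


def lint_crew(agents: list[AgentSpec], loops: list[LoopSpec]) -> list[str]:
    """Flag checklist violations that are visible in the crew's declaration."""
    problems = []
    for a in agents:
        # The coordinator is judged by the single-owner check below instead.
        if not a.is_coordinator and not (a.owns & OWNABLE):
            problems.append(f"{a.name}: owns nothing; delete it")
        if a.artifact is None:
            problems.append(f"{a.name}: no typed artifact")
    coordinators = [a.name for a in agents if a.is_coordinator]
    if len(coordinators) != 1:
        problems.append(f"expected 1 coordinator, found {len(coordinators)}")
    for loop in loops:
        if loop.decision_rule is None or loop.max_iterations is None:
            problems.append(f"{loop.name}: needs a decision rule and a max_iterations bound")
    return problems


crew = [
    AgentSpec("researcher", {"skill", "context"}, "ResearchBrief"),
    AgentSpec("implementer", {"permission"}, "Patch"),
    AgentSpec("reviewer", {"evaluation"}, "Review"),
    AgentSpec("coordinator", artifact="Result", is_coordinator=True),
    AgentSpec("helper"),  # an extra role with nothing to own
]
loops = [LoopSpec("review_loop", decision_rule="verdict == 'approve'")]
for problem in lint_crew(crew, loops):
    print(problem)
# helper: owns nothing; delete it
# helper: no typed artifact
# review_loop: needs a decision rule and a max_iterations bound
```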
Where this points next
We can now coordinate several role-specialized loops through typed handoffs and shared state. But notice what made the handoffs clean: the researcher's brief, the implementer's patch, the reviewer's verdict all had to be stored somewhere the coordinator could read later — ADK's session.state was doing quiet, load-bearing work. That shared store is not just a pass-through buffer; deciding what to keep, for how long, and at what scope is its own discipline. Lesson 10 (Memory management) treats memory as scoped storage with explicit write and read policies — session, state, and long-term knowledge — rather than an ever-growing transcript. Multi-agent systems make that need acute, because now multiple writers contend for the same memory.
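In short: add an agent only when it owns a skill, context, permission, or evaluation; name one coordinator that owns shared state and final synthesis; hand off typed artifacts, never transcripts; give every debate or critic loop a decision rule and a max_iterations bound; and keep a trace of who decided what on which evidence. CrewAI (Crew + Task.context) and Google ADK (Sequential/Parallel/LoopAgent/AgentTool) are just typed spellings of these same structures.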
Interview prompts
- When does adding a second agent help, and when is it pure overhead? (§1: only when it owns a distinct skill, context, permission, or evaluation; if two agents share both responsibility and tools, merge them.)
- What is the coordination tax, and how do you decide it's worth paying? (§3: each handoff adds an LLM round-trip's latency and tokens; the sequential two-agent split cost ~1.75× as much and took ~1.4× as long, so you ship it only if the measured quality lift beats that premium.)
- Distinguish a collaboration form from a topology. (§2: form = how work flows, e.g. sequential handoff vs debate vs hierarchy; topology = who may talk to whom, e.g. single / network / supervisor / hierarchical / custom.)
- How do you stop a debate or critic loop from running forever? (§4/§5: give it an explicit decision rule plus a bound, exactly like ADK's `LoopAgent(max_iterations)` with a `ConditionChecker` verdict.)
- Why pass typed artifacts instead of conversation transcripts between agents? (§4/§5: artifacts keep downstream context small and the trace inspectable; CrewAI's `Task(context=[research_task])` makes the handoff a typed dependency, not chat.)
- Your multi-agent system is slower and pricier than the old single agent but no more accurate. What's likely wrong? (§5: overlapping roles redoing work, transcript-as-handoff bloating context, or a split that never bought a quality lift; re-check ownership and measure against the single-agent baseline.)
- How does "agent-as-tool" relate to plain tool use? (§4: wrapping a sub-agent as an `AgentTool` gives the caller a typed callable; from the caller's side an expert agent is just a tool, which also previews remote A2A in lesson 17.)