Part VI - Knowledge, communication, and optimization
A2A - agent-to-agent communication
For sixteen lessons everything lived inside one process: one control loop, one memory, tools the agent could call directly. Real systems are not one process. The cohort-analysis specialist runs on another team's stack, the report writer is a CrewAI agent, the calendar checker is a Google ADK service on its own port. To compose them, an agent must be able to find a peer it has never met, hand it a task, and trust the result. That is what the Agent-to-Agent (A2A) protocol standardizes, and it is the moment our agent stops being a program and becomes a node in a network.
contextId threading multi-turn state. (4) Choose among the four interaction mechanisms (sync, async polling, SSE streaming, push webhook) with a worked latency/cost calculation and an interactive widget. (5) Lock down identity, security, and provenance, then contrast A2A with MCP so you never confuse the two protocols again.
New capability: networked task delegation across process, framework, and organizational boundaries. You discover a remote agent you did not write, send it a typed task, and merge its result into your trace with provenance intact.
1 · Why a remote agent is not just another tool
In lesson 07 a tool was a function the agent called and got an answer from in one shot: get_weather("Paris") → "18°C". In lesson 09 multiple agents collaborated, but they shared one runtime, one memory store, one trace. The book's key observation in Chapter 15 is that a single agent — however capable — hits a ceiling on complex, multi-layered problems, and the way past that ceiling is to let agents built on different frameworks collaborate: a LangGraph planner, a CrewAI writer, a Google ADK calendar service. The instant a collaborator lives in another process you inherit every hard problem of distributed systems at once:
- You did not write it. It is a black box exposing an HTTP endpoint. You cannot read its memory or step its loop; you only see what it chooses to return. The book is explicit: the remote agent is a "black box" and the client need not know its internals.
- It can take minutes, not milliseconds. A "cohort analysis" might run SQL over a warehouse for two minutes. A synchronous request that blocks for two minutes will time out at every proxy in between.
- It can ask you a question back. Real delegation is multi-turn: the remote agent may need a clarification ("which date range?") before it can finish. A plain RPC has no slot for that.
- It can fail halfway, and you might retry. Retrying a non-idempotent remote task can double-charge a customer or double-write a report.
- It belongs to another team or company. So identity, authentication, and audit are not optional niceties — they are the contract.
"Just call its REST API" fails because every pair of agents would then invent its own ad-hoc shape for capability description, task IDs, streaming, multi-turn state, and auth — the integration cost the book calls out as the core problem (high cost, long cycles, siloed agents). A2A is the open, HTTP-based standard that fixes the shape once so any compliant agent can talk to any other. It is backed by a broad set of vendors — Atlassian, Box, LangChain, MongoDB, Salesforce, SAP, ServiceNow, with Microsoft integrating it into Azure AI Foundry and Copilot Studio — which is precisely what makes "an agent I have never met" a tractable thing to call.
2 · The four actors and the Agent Card
A2A names four entities. Hold them straight and the rest of the protocol falls out:
The Agent Card is the keystone. It is what makes a stranger usable: before the client sends a single task it reads the card to learn what the agent can do and how to talk to it. Here is the book's WeatherBot card, trimmed to its load-bearing fields:
agent_card = {
    "name": "WeatherBot",
    "description": "Accurate weather forecasts and historical data.",
    "url": "http://weather-service.example.com/a2a",  # the endpoint
    "version": "1.0.0",
    "capabilities": {                 # what interaction styles it supports
        "streaming": True,            # can push SSE incremental results
        "pushNotifications": False,   # cannot call back a webhook
        "stateTransitionHistory": True
    },
    "authentication": { "schemes": ["apiKey"] },
    "defaultInputModes": ["text"],
    "defaultOutputModes": ["text"],
    "skills": [
        { "id": "get_current_weather", "name": "Get current weather",
          "description": "Real-time weather for any location.",
          "examples": ["What's the weather in Paris now?"],
          "tags": ["weather", "current", "real-time"] },
        { "id": "get_forecast", "name": "Get forecast",
          "description": "5-day weather prediction.",
          "examples": ["Will it rain in London this weekend?"],
          "tags": ["weather", "forecast", "prediction"] }
    ]
}
Two design choices are worth dwelling on. First, capabilities are declared: the card tells the client up front whether it may subscribe to a stream or register a webhook, so the client picks an interaction style it knows will work. Second, skills carry examples and tags — these are not decoration. A planner agent uses them to match a sub-goal ("I need 5-day weather") to a capability (get_forecast) the same way a developer reads a function's docstring before calling it.
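The matching step described above can be sketched with plain keyword overlap. This is only a stand-in: a real planner would more likely use an LLM or embeddings, and the helper names `match_skill` and `_tokens` are hypothetical, not part of A2A or the book. The card literal is a trimmed copy of the WeatherBot skills.

```python
import re
from typing import Optional

# Trimmed copy of the WeatherBot skills from the card above.
CARD = {
    "skills": [
        {"id": "get_current_weather",
         "description": "Real-time weather for any location.",
         "examples": ["What's the weather in Paris now?"],
         "tags": ["weather", "current", "real-time"]},
        {"id": "get_forecast",
         "description": "5-day weather prediction.",
         "examples": ["Will it rain in London this weekend?"],
         "tags": ["weather", "forecast", "prediction"]},
    ]
}


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens; deliberately crude."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_skill(card: dict, sub_goal: str) -> Optional[str]:
    """Return the id of the skill whose description, tags, and examples
    share the most words with the sub-goal, or None if nothing overlaps."""
    goal_words = _tokens(sub_goal)
    best_id, best_score = None, 0
    for skill in card.get("skills", []):
        skill_text = " ".join(
            [skill.get("description", "")]
            + skill.get("tags", [])
            + skill.get("examples", [])
        )
        score = len(goal_words & _tokens(skill_text))
        if score > best_score:
            best_id, best_score = skill["id"], score
    return best_id
```

Calling `match_skill(CARD, "I need 5-day weather")` selects `get_forecast`, because "5", "day", and "weather" all appear in that skill's text while only "weather" appears in the other.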
How a client finds the card (discovery)
The book lists three discovery mechanisms, trading openness against control:
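- Well-known URI: /.well-known/agent.json. Public, zero-coordination discovery: point a client at a domain and it can self-configure. Best for open ecosystems.
- Managed registry: a curated catalog of cards, suited to enterprise governance.
- Direct configuration: the card or endpoint is supplied to the client ahead of time, suited to fixed or private peers.
Whichever you use, the card endpoint itself must be secured (access control, mTLS, or network restriction), because even without secrets a card leaks your capability surface.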
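A minimal sketch of well-known-URI discovery, using only the Python standard library. The function names are hypothetical, and the set of fields checked after download is an assumption chosen for illustration rather than a list taken from the specification. A secured card endpoint would also need credentials or a client certificate, which this sketch leaves out.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

WELL_KNOWN_PATH = "/.well-known/agent.json"


def well_known_card_url(base_url: str) -> str:
    """Resolve the well-known card location against the agent's origin.
    The leading slash makes urljoin replace any path in base_url."""
    return urljoin(base_url, WELL_KNOWN_PATH)


def fetch_agent_card(base_url: str, timeout: float = 5.0) -> dict:
    """Download and parse the card, then check a few fields the client
    relies on (which fields to require is an assumption of this sketch)."""
    with urlopen(well_known_card_url(base_url), timeout=timeout) as resp:
        card = json.load(resp)
    missing = [key for key in ("name", "url", "skills") if key not in card]
    if missing:
        raise ValueError(f"agent card is missing fields: {missing}")
    return card
```

Resolving against the origin rather than appending to the task endpoint means a client that only knows `http://weather-service.example.com/a2a` still looks for the card at the domain root.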
3 · The task lifecycle — submit, work, ask, finish
A2A communication is organized around the task: the fundamental unit of work for a long-running process. Crucially, tasks are asynchronous — designed for operations that may take real time — and each one carries a unique ID and moves through a small set of states. Agents exchange messages (metadata like priority/creation-time plus one or more content parts: text, files, or structured JSON), and the actual output the remote agent produces is an artifact (also part-based, streamable). All A2A traffic is HTTP(S) carrying JSON-RPC 2.0, and to keep context across multiple related tasks the server issues a contextId that ties them together.
The input-required state is what separates A2A from a dumb RPC. The remote agent can pause, ask the client for a missing parameter, and resume — all under one contextId, so neither side loses the thread. This is the networked analogue of the human-in-the-loop pause you built in lesson 15, except the "human" asking the clarifying question is another agent.
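The lifecycle can be sketched as a small state machine. The state names are the ones this lesson uses (submitted, working, input-required, completed, failed). The transition table is an assumption made for illustration, and the `Task` class is hypothetical rather than an SDK type.

```python
from dataclasses import dataclass, field

# Assumed transition table: which states may follow which.
ALLOWED_TRANSITIONS = {
    "submitted": {"working", "failed"},
    "working": {"input-required", "completed", "failed"},
    "input-required": {"working", "failed"},  # resumes once the client answers
    "completed": set(),                       # terminal
    "failed": set(),                          # terminal
}


@dataclass
class Task:
    task_id: str
    context_id: str  # shared across the related turns of one conversation
    state: str = "submitted"
    history: list = field(default_factory=lambda: ["submitted"])

    def transition(self, new_state: str) -> None:
        """Move to new_state, rejecting moves the table does not allow."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def is_terminal(self) -> bool:
        return not ALLOWED_TRANSITIONS[self.state]
```

A clarification round trip then reads as working → input-required → working under one `context_id`, and any attempt to move a completed task is rejected instead of silently restarting it.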
4 · Choosing an interaction mechanism — a latency budget
The card declares which styles an agent supports; the client picks the cheapest one that fits the job. A2A offers four, and the choice is a real engineering decision driven by how long the task runs and how fresh the client needs intermediate results:
| Mechanism | Method | Connection | Best when |
|---|---|---|---|
| Sync request/response | sendTask / tasks/send | one request, blocks for the full answer | fast ops (sub-second), client wants the whole answer at once |
| Async polling | sendTask → returns working + taskId; client polls | many short requests | long jobs, client checks status on an interval |
| Streaming (SSE) | sendTaskSubscribe / tasks/sendSubscribe | one persistent server→client stream | real-time incremental results (tokens, progress) |
| Push (webhook) | client registers a webhook URL | server calls client back on change | very long / resource-heavy jobs; client shouldn't hold a connection |
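The two long-job options (polling and push) are not interchangeable. Polling is simple but wasteful: every poll is a round trip that usually returns "still working." Let us make the cost concrete for a 120 s job, pricing each request at about 0.002 USD (the figures the arithmetic that follows assumes).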
- Poll every 2 s: 120 / 2 = 60 polls. 59 of them return "working", which is pure waste. Cost ≈ 60 × 0.002 = 0.12 USD per delegated task, and the client learns of completion up to 2 s late (average ~1 s).
- Poll every 0.5 s for fresher status: 240 polls ≈ 0.48 USD and ≤0.5 s staleness, which is 4× the cost to shave 1.5 s.
- Push (webhook): 1 register call + 1 callback = 2 messages ≈ 0.004 USD, completion known within network latency (~tens of ms), and the client holds no connection for 2 minutes.
At 10,000 delegated analyses/day that is 0.12 × 10,000 = 1,200 USD/day on 2 s polling versus 0.004 × 10,000 = 40 USD/day on push, a 30× reduction. Streaming (SSE) sits between: one held connection, immediate increments, ideal when you actually want to show progress rather than just learn of completion.
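The arithmetic above can be reproduced with a few lines of Python. The 120 s job length and the 0.002 USD price per request are the figures implied by the numbers in the text; the function names are hypothetical.

```python
import math


def polling_cost(job_seconds: float, interval_seconds: float,
                 usd_per_request: float) -> tuple:
    """Return (number of polls, cost in USD) for polling until the job ends."""
    polls = math.ceil(job_seconds / interval_seconds)
    return polls, polls * usd_per_request


def push_cost(usd_per_request: float) -> tuple:
    """Return (messages, cost in USD): one webhook registration plus one callback."""
    messages = 2
    return messages, messages * usd_per_request


def daily_cost(cost_per_task: float, tasks_per_day: int) -> float:
    """Scale a per-task cost to a daily bill."""
    return cost_per_task * tasks_per_day
```

With these, `polling_cost(120, 2, 0.002)` gives 60 polls, `polling_cost(120, 0.5, 0.002)` gives 240, and scaling both options to 10,000 tasks per day reproduces the 1,200 USD versus 40 USD comparison.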
Rule of thumb that drops out of the math: sub-second job → sync; you need to render progressive output → SSE; minutes-long fire-and-forget → push; push unavailable (card says pushNotifications: false) → fall back to polling at the slowest interval your staleness budget tolerates.
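The rule of thumb translates into a short decision function. This is a sketch under assumptions: the one-second threshold for "sub-second", the `needs_progress` flag, and the function name are illustrative choices, and the `capabilities` dict mirrors the card's `streaming` and `pushNotifications` fields.

```python
def choose_mechanism(expected_seconds: float, needs_progress: bool,
                     capabilities: dict) -> str:
    """Pick an interaction style from the expected run time, whether the
    client wants incremental output, and what the card declares."""
    if expected_seconds < 1.0:
        return "sync"      # fast op: just block for the whole answer
    if needs_progress and capabilities.get("streaming", False):
        return "sse"       # client wants to render increments as they arrive
    if capabilities.get("pushNotifications", False):
        return "push"      # long fire-and-forget job, no held connection
    return "polling"       # fallback when push is not declared
```

For the WeatherBot card (`streaming: True`, `pushNotifications: False`), a two-minute job with no progress display falls through to polling, which is exactly the fallback case the rule of thumb describes.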
The book gives the matching JSON-RPC shapes. A sync request uses sendTask and expects one complete answer; a streaming request uses sendTaskSubscribe to open a persistent connection over which the agent returns increments:
sync_request = {
    "jsonrpc": "2.0", "id": "1",
    "method": "sendTask",
    "params": {
        "id": "task-001",
        "sessionId": "session-001",
        "message": { "role": "user",
                     "parts": [{ "type": "text",
                                 "text": "USD to EUR rate?" }] },
        "acceptedOutputModes": ["text/plain"],
        "historyLength": 5 } }

streaming_request = {
    "jsonrpc": "2.0", "id": "2",
    "method": "sendTaskSubscribe",  # SSE
    "params": {
        "id": "task-002",
        "sessionId": "session-001",
        "message": { "role": "user",
                     "parts": [{ "type": "text",
                                 "text": "JPY to GBP today?" }] },
        "acceptedOutputModes": ["text/plain"],
        "historyLength": 5 } }
5 · Identity, security, and provenance
Because a remote agent may belong to another org, A2A treats security as part of the architecture, not an add-on. The book lists four mechanisms, and each one maps to a failure it prevents:
- Mutual TLS (mTLS): both sides authenticate the connection — prevents an impostor agent from receiving your task or spoofing a result.
- Complete audit logs: every inter-agent message — who, what, when — is recorded. This is also your provenance trail: when the data agent returns a chart, the log says which agent produced it from which inputs.
- Agent Card declarations: auth requirements ("schemes": ["apiKey"], or OAuth 2.0) are stated in the card, so the client knows how to authenticate before it ever connects.
- Credential handling: tokens (OAuth 2.0) or API keys travel in HTTP headers, never in the URL or message body, where they would land in logs and caches.
Provenance is the piece that ties back to lesson 16. RAG taught us that an unsupported claim is a liability; A2A makes that worse, because one agent's conclusion silently becomes another agent's premise. If the data agent returns "churn is up 12%" with no queries, no caveats, no source, the product assistant will repeat it as fact. The defense is the same contract the book pushes: the delegated task declares an output_schema, and the artifact must carry evidence (the queries it ran, the charts, the caveats) that the caller validates before merging. Validation plus the joined audit log is how a multi-agent answer stays traceable.
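The validate-before-merge contract can be sketched as follows. The required field names (`claim`, `queries`, `caveats`) and both function names are hypothetical; they stand in for whatever the declared output_schema actually demands.

```python
from typing import List

# Hypothetical evidence contract standing in for the declared output_schema.
REQUIRED_FIELDS = ("claim", "queries", "caveats")


def validate_artifact(artifact: dict) -> List[str]:
    """Return a list of problems; an empty list means the artifact is acceptable."""
    return [f"missing or empty field: {name}"
            for name in REQUIRED_FIELDS if not artifact.get(name)]


def merge_with_provenance(trace: list, artifact: dict,
                          agent_name: str, audit_log_id: str) -> None:
    """Append the remote result to the caller's trace only if it carries
    evidence, and record who produced it and under which audit-log entry."""
    problems = validate_artifact(artifact)
    if problems:
        raise ValueError("; ".join(problems))
    trace.append({
        "claim": artifact["claim"],
        "evidence": {"queries": artifact["queries"],
                     "caveats": artifact["caveats"]},
        "provenance": {"agent": agent_name, "audit_log_id": audit_log_id},
    })
```

A bare `{"claim": "churn is up 12%"}` is rejected, so the unsupported number never becomes a premise in the caller's trace.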
6 · A2A vs MCP — two protocols, two jobs
This is the single most-tested distinction in the chapter, and it is easy to get backwards. The book states it cleanly: the two are complementary.
| | MCP (lesson 12) | A2A (this lesson) |
|---|---|---|
| Connects | an agent ⟷ tools, data, resources | an agent ⟷ another agent |
| Standardizes | structured access to context and tools | coordination, delegation, communication |
| The other party is | a tool/resource server (passive, you drive it) | an autonomous agent (it reasons, may ask back) |
| Unit of interaction | a tool call / resource read | a task with a lifecycle and artifacts |
| One-liner | how an agent reaches into the world | how an agent talks to a peer |
In our running system the product assistant uses MCP to read the warehouse schema and call a SQL tool, and uses A2A to delegate the whole cohort-analysis job to a specialized data agent that itself uses MCP internally. Same system, both protocols, no overlap: MCP is the tool plug, A2A is the agent-to-agent phone line.
The ADK worked example — standing up an A2A server
The book's concrete code builds a Google ADK "Calendar Agent" and exposes it over A2A. The shape generalizes to any remote agent you would delegate to:
import asyncio
import uvicorn
from starlette.applications import Starlette
# The ADK and A2A SDK imports (LlmAgent, CalendarToolset, Runner, AgentSkill,
# AgentCard, AgentCapabilities, ADKAgentExecutor, ...) are omitted here, as in
# the excerpt. host, port, client_id, and client_secret are assumed to be defined.

# 1. Build the agent (ADK LlmAgent over a tool)
async def create_agent(client_id, client_secret) -> LlmAgent:
    toolset = CalendarToolset(client_id=client_id, client_secret=client_secret)
    return LlmAgent(model="gemini-2.0-flash-001", name="calendar_agent",
                    description="Helps manage the user's calendar.",
                    instruction="...use the tools to read/modify the calendar...",
                    tools=await toolset.get_tools())

# 2. Declare identity: a skill + an Agent Card
skill = AgentSkill(id="check_availability", name="Check availability",
                   description="Check if the user is free in a time window",
                   tags=["calendar"], examples=["Am I free 10-11am tomorrow?"])
agent_card = AgentCard(name="Calendar Agent", url=f"http://{host}:{port}/",
                       version="1.0.0", defaultInputModes=["text"],
                       defaultOutputModes=["text"],
                       capabilities=AgentCapabilities(streaming=True),
                       skills=[skill])

# 3. Wire executor + task store, mount on Starlette, serve over HTTP
# create_agent is a coroutine, so resolve it before handing the agent to the Runner.
adk_agent = asyncio.run(create_agent(client_id, client_secret))
runner = Runner(app_name=agent_card.name, agent=adk_agent,
                artifact_service=InMemoryArtifactService(),
                session_service=InMemorySessionService(),
                memory_service=InMemoryMemoryService())
agent_executor = ADKAgentExecutor(runner, agent_card)
request_handler = DefaultRequestHandler(agent_executor, task_store=InMemoryTaskStore())
a2a_app = A2AStarletteApplication(agent_card=agent_card, http_handler=request_handler)
uvicorn.run(Starlette(routes=a2a_app.routes()), host=host, port=port)
Read it as three layers: capability (the LlmAgent + its tools), identity (the AgentCard declaring streaming and the check_availability skill), and service (a task store, an executor, and a Starlette/Uvicorn HTTP surface). That separation is exactly why a CrewAI agent and an ADK agent can talk: they agree on the card and the task lifecycle, and disagree freely about everything inside the box. The book notes the official samples cover LangGraph, CrewAI, Azure AI Foundry, and AG2; tools like Trickle AI visualize and trace A2A traffic for debugging and optimization — the cross-agent analogue of the trace inspection you'll build in lesson 21.
Running example — the product assistant delegates cohort analysis
Threading it all together. The product assistant (client agent) gets "why did Pro churn jump last month?"
- cohort-analysis; read the data agent's card: it supports streaming and pushNotifications, auth is OAuth 2.0.
- sendTask with parts = {goal, date range, segment="Pro"}, output_schema="analysis_report_v1", deadline 10 min, idempotency_key=task_id. Estimated 120 s ⇒ register a webhook rather than block.
- input-required: "include trialists?" Client replies under the same contextId; state resumes.
- completed; artifact carries the SQL, two charts, and caveats. Validate against the schema; merge into the trace with the data agent's audit-log IDs as provenance.
Checkpoint exercise
name, url, version, capabilities (which of streaming/push do you support, and why?), authentication schemes, and one skill with an id, description, two example prompts, and tags. Then justify, using the §4 latency math, whether a 120 s job should be polled or pushed.
Failure modes
- No clear owner on remote failure. The remote task dies and neither side is responsible for retry or cleanup. Fix: the task contract names the owner and the failure semantics up front.
- Retrying non-idempotent work. A timed-out "generate report" gets retried and the customer is billed twice. Fix: an idempotency_key the server dedups on.
- Trusting remote output blindly. The data agent's "churn +12%" becomes premise with no evidence. Fix: validate against output_schema; require evidence parts; merge provenance.
- Holding a 2-minute synchronous connection. Proxies time it out; the result is lost. Fix: async + push/SSE for long jobs (§4).
- Leaked credentials. API key in the URL lands in access logs. Fix: tokens in HTTP headers only; mTLS on the channel.
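The idempotency fix from the list above can be sketched as a server-side wrapper that remembers results by key. The class name is hypothetical and the in-memory dict is an assumption made for brevity: a production server would need a persistent store and would have to handle two retries arriving concurrently.

```python
from typing import Any, Callable, Dict


class IdempotentTaskServer:
    """Run each idempotency_key at most once; replay the stored result on retry."""

    def __init__(self, handler: Callable[[dict], Any]) -> None:
        self._handler = handler
        self._results: Dict[str, Any] = {}  # idempotency_key -> stored result

    def submit(self, idempotency_key: str, payload: dict) -> Any:
        if idempotency_key in self._results:
            # Retry of work already done: return the same answer and
            # skip the side effects (billing, report writes).
            return self._results[idempotency_key]
        result = self._handler(payload)
        self._results[idempotency_key] = result
        return result
```

A client that times out and resends with the same key gets the original report back, and the handler that bills the customer runs only once.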
Implementation checklist
- How does discovery work — well-known URI, registry, or direct config? Is the card endpoint secured?
- What task states exist (submitted / working / input-required / completed / failed) and who handles each?
- Can the caller cancel, and what happens to in-flight work?
- Which interaction mechanism, and does the card actually declare it?
- Are retries idempotent (idempotency_key)?
- Is there a contextId threading multi-turn state?
- How are traces joined across agents (audit-log IDs, contextId in provenance)?
- Auth scheme stated in the card; credentials in headers; mTLS on the wire?
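The "credentials in headers" item can be sketched with the standard library's `urllib.request.Request`. The function name is hypothetical and the `X-API-Key` header name is an assumption, since the actual header an agent expects depends on the scheme its card declares. mTLS is configured on the connection itself and is not shown here.

```python
import json
from typing import Optional
from urllib.request import Request


def build_authenticated_request(endpoint: str, payload: dict, *,
                                bearer_token: Optional[str] = None,
                                api_key: Optional[str] = None) -> Request:
    """Build a JSON POST whose credential travels only in an HTTP header,
    never in the URL or the message body."""
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"  # OAuth 2.0 token
    elif api_key:
        headers["X-API-Key"] = api_key  # assumed header name for apiKey schemes
    else:
        raise ValueError("a bearer token or an API key is required")
    body = json.dumps(payload).encode("utf-8")
    return Request(endpoint, data=body, headers=headers, method="POST")
```

Because the endpoint URL and the body are built without the secret, access logs and caches that record either one never see the credential.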
Where this points next
A2A gave the agent reach — it can now compose specialists across processes and organizations. But reach is expensive: every delegated task is more tokens, more latency, more dollars (recall the polling bill — 1,200 USD/day from one careless interval). Lesson 18, Resource-aware optimization, treats compute, time, money, context window, tools, and human attention as scarce budgets the agent must spend deliberately — which mechanism to choose, which model tier to route to, when delegation is worth its network cost. The latency/cost calculation you just did for poll-vs-push is the first instance of the general discipline that lesson formalizes.
contextId, returning a part-based artifact. Pick an interaction mechanism by the latency budget: sync for fast ops, SSE to stream progress, push (webhook) for long jobs, polling as the fallback. The poll-vs-push math is real money. Security is built in: mTLS, OAuth/API-key in headers, audit logs that double as provenance. And never confuse the two protocols: MCP connects an agent to tools and resources; A2A connects an agent to another agent.
Interview prompts
- What problem does A2A solve that calling a tool (or a plain REST API) does not? (§1 — a remote agent is a long-running, black-box, multi-turn, possibly-third-party peer; A2A standardizes capability discovery, task lifecycle, streaming, multi-turn input-required state, and auth so any compliant agents interoperate without bespoke integration.)
- What is an Agent Card and what does it contain? (§2 — the remote agent's JSON identity: name, endpoint URL, version, declared capabilities (streaming/push), skills with examples/tags, default I/O modes, and authentication schemes; clients read it before sending any task.)
- How does a client discover a remote agent? (§2 — well-known URI (/.well-known/agent.json) for open discovery, a managed registry for enterprise governance, or direct configuration for fixed/private peers; secure the card endpoint regardless.)
- Walk the task lifecycle and explain input-required and contextId. (§3: submitted → working → input-required → completed/failed; input-required lets the remote agent pause and ask the client for a missing parameter, and contextId threads multi-turn/related tasks so context survives.)
- A delegated job takes ~2 minutes. Which interaction mechanism, and why not synchronous? (§4: push (webhook) or SSE; sync blocks and times out at proxies; vs polling, push is one register + one callback instead of dozens of wasted round-trips, roughly a 30× cost cut at 2 s intervals, and learns of completion immediately.)
- How is A2A different from MCP? (§6 — MCP connects an agent to tools/resources/context (passive servers you drive); A2A connects an agent to another autonomous agent for delegation and coordination via tasks; they are complementary and often used together.)
- How do you keep a multi-agent answer trustworthy and traceable? (§5 — declare an output_schema and require evidence parts in the artifact, validate before merging, and join audit-log IDs / contextId into your trace so one agent's conclusion isn't silently accepted as another's premise.)