Part VI - Knowledge, communication, and optimization

A2A - agent-to-agent communication

For sixteen lessons everything lived inside one process: one control loop, one memory, tools the agent could call directly. Real systems are not one process. The cohort-analysis specialist runs on another team's stack, the report writer is a CrewAI agent, the calendar checker is a Google ADK service on its own port. To compose them, an agent must be able to find a peer it has never met, hand it a task, and trust the result. That is what the Agent-to-Agent (A2A) protocol standardizes, and it is the moment our agent stops being a program and becomes a node in a network.

Book source

Chapter 15 — Agent-to-Agent Communication (智能体间通信 / A2A); PDF outline pages 164–174. Google's open A2A standard, the Agent Card, discovery, task lifecycle, the four interaction mechanisms, security, the A2A-vs-MCP contrast, and the ADK calendar-agent worked example.

The plan

Five moves. (1) Pin down why a remote agent is different from a local tool — and why "just call its REST API" is not enough. (2) Meet the four actors and the Agent Card, the JSON identity that makes a stranger discoverable. (3) Walk the task lifecycle: submit → working → input-required → completed/failed, with contextId threading multi-turn state. (4) Choose among the four interaction mechanisms (sync, async polling, SSE streaming, push webhook) with a worked latency/cost calculation and an interactive widget. (5) Lock down identity, security, and provenance, then contrast A2A with MCP so you never confuse the two protocols again.

Linear position

Prerequisites: lesson 09 (multi-agent collaboration: roles, handoffs, synthesis inside one runtime), lesson 12 (MCP: the integration contract for tools and resources), and lesson 16 (RAG: agents grounded in shared evidence).
New capability: networked task delegation across process, framework, and organizational boundaries — discovering a remote agent you did not write, sending it a typed task, and merging its result into your trace with provenance intact.

1 · Why a remote agent is not just another tool

In lesson 07 a tool was a function the agent called once and got a direct answer from: get_weather("Paris") → "18°C". In lesson 09 multiple agents collaborated, but they shared one runtime, one memory store, one trace. The book's key observation in Chapter 15 is that a single agent, however capable, hits a ceiling on complex, multi-layered problems, and the way past that ceiling is to let agents built on different frameworks collaborate: a LangGraph planner, a CrewAI writer, a Google ADK calendar service. The instant a collaborator lives in another process, you inherit the hard problems of distributed systems all at once:

You did not write it. It is a black box exposing an HTTP endpoint. You cannot read its memory or step its loop; you only see what it chooses to return. The book is explicit: the remote agent is a "black box" and the client need not know its internals.
It can take minutes, not milliseconds. A "cohort analysis" might run SQL over a warehouse for two minutes. A synchronous request that blocks that long is likely to hit an idle timeout at some proxy or load balancer along the way.
It can ask you a question back. Real delegation is multi-turn: the remote agent may need a clarification ("which date range?") before it can finish. A plain RPC has no slot for that.
It can fail halfway, and you might retry. Retrying a non-idempotent remote task can double-charge a customer or double-write a report.
It belongs to another team or company. So identity, authentication, and audit are not optional niceties — they are the contract.

"Just call its REST API" fails because every pair of agents would then invent its own ad-hoc shape for capability description, task IDs, streaming, multi-turn state, and auth — the integration cost the book calls out as the core problem (high cost, long cycles, siloed agents). A2A is the open, HTTP-based standard that fixes the shape once so any compliant agent can talk to any other. It is backed by a broad set of vendors — Atlassian, Box, LangChain, MongoDB, Salesforce, SAP, ServiceNow, with Microsoft integrating it into Azure AI Foundry and Copilot Studio — which is precisely what makes "an agent I have never met" a tractable thing to call.

Mental model

A2A is to agents what HTTP + a business card is to companies. The Agent Card is the business card (who I am, what I can do, how to reach me, how to authenticate). The task is the contract you sign (here is the job, the inputs, the deadline, the artifact I expect back). The protocol is the postal system that carries signed messages reliably and logs every one of them.

2 · The four actors and the Agent Card

A2A names four entities. Hold them straight and the rest of the protocol falls out:

User

The human (or upstream system) that initiates the need for help.

A2A client (client agent)

The app or agent acting on the user's behalf, requesting an action or information.

A2A server (remote agent)

An agent exposing an HTTP endpoint that handles requests and returns results. A black box — the client never sees its internals.

Agent Card

The remote agent's digital identity: a JSON document advertising name, endpoint, version, capabilities, skills, I/O modes, and auth.

The Agent Card is the keystone. It is what makes a stranger usable: before the client sends a single task it reads the card to learn what the agent can do and how to talk to it. Here is the book's WeatherBot card, trimmed to its load-bearing fields:

agent_card = {
  "name": "WeatherBot",
  "description": "Accurate weather forecasts and historical data.",
  "url": "http://weather-service.example.com/a2a",   # the endpoint
  "version": "1.0.0",
  "capabilities": {                # what interaction styles it supports
    "streaming": True,             # can push SSE incremental results
    "pushNotifications": False,    # cannot call back a webhook
    "stateTransitionHistory": True
  },
  "authentication": { "schemes": ["apiKey"] },
  "defaultInputModes":  ["text"],
  "defaultOutputModes": ["text"],
  "skills": [
    { "id": "get_current_weather", "name": "Get current weather",
      "description": "Real-time weather for any location.",
      "examples": ["What's the weather in Paris now?"],
      "tags": ["weather", "current", "real-time"] },
    { "id": "get_forecast", "name": "Get forecast",
      "description": "5-day weather prediction.",
      "examples": ["Will it rain in London this weekend?"],
      "tags": ["weather", "forecast", "prediction"] }
  ]
}

Two design choices are worth dwelling on. First, capabilities are declared: the card tells the client up front whether it may subscribe to a stream or register a webhook, so the client picks an interaction style it knows will work. Second, skills carry examples and tags — these are not decoration. A planner agent uses them to match a sub-goal ("I need 5-day weather") to a capability (get_forecast) the same way a developer reads a function's docstring before calling it.
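The planner-side matching can be sketched as tag overlap. This is a minimal sketch, assuming a plain-dict card shaped like the WeatherBot example; `match_skill` is a hypothetical helper, since A2A declares skills and tags but leaves ranking to the client. A production planner might also compare the sub-goal against each skill's `description` and `examples`.

```python
from typing import Optional

def match_skill(card: dict, wanted_tags: set[str]) -> Optional[dict]:
    """Return the skill whose tags overlap most with wanted_tags, or None.

    Hypothetical helper: the card declares skills and tags, but how a
    client ranks them is up to the client.
    """
    best, best_score = None, 0
    for skill in card.get("skills", []):
        score = len(wanted_tags & set(skill.get("tags", [])))
        if score > best_score:          # ties keep the first-listed skill
            best, best_score = skill, score
    return best

card = {
    "name": "WeatherBot",
    "skills": [
        {"id": "get_current_weather", "tags": ["weather", "current", "real-time"]},
        {"id": "get_forecast", "tags": ["weather", "forecast", "prediction"]},
    ],
}

print(match_skill(card, {"weather", "forecast"})["id"])  # get_forecast
```

A sub-goal like "I need 5-day weather" would be reduced to tags first (by the planner's LLM or a keyword pass); the card then does the rest.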

How a client finds the card (discovery)

The book lists three discovery mechanisms, trading openness against control:

A. Well-known URI. The agent hosts its card at a standard path such as /.well-known/agent.json. Public, zero-coordination discovery: point a client at a domain and it can self-configure. Best for open ecosystems.

B. Managed registry. A central catalog where agents publish cards and clients query by capability/tag, with access control. Best for an enterprise with many internal agents and a governance requirement.

C. Direct configuration. The card is embedded or shared privately, with no dynamic lookup. Best for tightly-coupled or private systems where the peers are fixed.

Whichever you use, the card endpoint itself must be secured (access control, mTLS, or network restriction), because even without secrets a card leaks your capability surface.
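Options A and B can be sketched in a few lines. This assumes the `/.well-known/agent.json` path from the text and a hypothetical in-memory `CardRegistry`; a real managed registry would persist cards and authenticate callers rather than trust a `client_id` string. Option C needs no code beyond loading a card you were handed.

```python
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/agent.json"  # path given in the text

def well_known_url(agent_url: str) -> str:
    """Option A: derive the card location from any URL on the agent's host."""
    return urljoin(agent_url, WELL_KNOWN_PATH)

class CardRegistry:
    """Option B (hypothetical): cards are published with an allow-list of clients."""

    def __init__(self) -> None:
        self._entries: list[tuple[dict, set[str]]] = []

    def publish(self, card: dict, allowed_clients: set[str]) -> None:
        self._entries.append((card, allowed_clients))

    def find(self, tag: str, client_id: str) -> list[dict]:
        """Return cards visible to client_id that have a skill tagged `tag`."""
        return [
            card for card, allowed in self._entries
            if client_id in allowed
            and any(tag in skill.get("tags", []) for skill in card.get("skills", []))
        ]

registry = CardRegistry()
registry.publish(
    {"name": "DataAgent", "skills": [{"id": "cohort", "tags": ["cohort-analysis"]}]},
    allowed_clients={"product-assistant"},
)
print(well_known_url("http://weather-service.example.com/a2a"))
# http://weather-service.example.com/.well-known/agent.json
print([c["name"] for c in registry.find("cohort-analysis", "product-assistant")])  # ['DataAgent']
print(registry.find("cohort-analysis", "other-agent"))  # []
```

The allow-list in `find` is the access-control half of option B: an agent that is not permitted never learns the card exists, which also addresses the capability-surface leak noted above.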

3 · The task lifecycle — submit, work, ask, finish

A2A communication is organized around the task: the fundamental unit of work for a long-running process. Crucially, tasks are asynchronous — designed for operations that may take real time — and each one carries a unique ID and moves through a small set of states. Agents exchange messages (metadata like priority/creation-time plus one or more content parts: text, files, or structured JSON), and the actual output the remote agent produces is an artifact (also part-based, streamable). All A2A traffic is HTTP(S) carrying JSON-RPC 2.0, and to keep context across multiple related tasks the server issues a contextId that ties them together.

CLIENT AGENT                                  REMOTE AGENT (black box)
─────────────                                 ────────────────────────
read Agent Card ─────── discovery ──────────▶ /.well-known/agent.json
     │
sendTask(task-001) ──── JSON-RPC ───────────▶ state: submitted
     │                                             │
     │ ◀──── "working" + taskId ──────────────────┤ (long job starts)
     │                                             │
     │ ◀──── "input-required: date range?" ───────┤ (needs more info)
     │                                             │
reply(parts=[...]) ──── same contextId ─────▶     │ (multi-turn, context kept)
     │                                             │
     │ ◀──── artifact: report_v1 ─────────────────┤ state: completed
     │                                               (or: failed + reason)
validate(artifact, output_schema)
merge into trace with provenance

The input-required state is what separates A2A from a dumb RPC. The remote agent can pause, ask the client for a missing parameter, and resume — all under one contextId, so neither side loses the thread. This is the networked analogue of the human-in-the-loop pause you built in lesson 15, except the "human" asking the clarifying question is another agent.
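The lifecycle can be modeled as a small state machine. A minimal sketch: the five states are the ones named in this section, the transition table is an assumption about which moves are legal (cancellation is left out), and `Task` is a hypothetical client-side record, not an SDK type.

```python
from dataclasses import dataclass, field
from enum import Enum

class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"

# Assumed legal moves between the five states named in the lesson.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED, TaskState.FAILED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.COMPLETED: set(),   # terminal
    TaskState.FAILED: set(),      # terminal
}

@dataclass
class Task:
    task_id: str
    context_id: str                        # shared by every turn of the conversation
    state: TaskState = TaskState.SUBMITTED
    history: list = field(default_factory=list)

    def move(self, new_state: TaskState, note: str = "") -> None:
        """Apply a transition, rejecting moves the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.history.append((self.state.value, new_state.value, note))
        self.state = new_state

task = Task("task-001", context_id="ctx-42")
task.move(TaskState.WORKING)
task.move(TaskState.INPUT_REQUIRED, "date range?")
task.move(TaskState.WORKING, "client replied under ctx-42")
task.move(TaskState.COMPLETED, "artifact: report_v1")
print(task.state.value)  # completed
```

Making completed and failed terminal is what lets a client treat a late or duplicated status message as an error instead of silently reopening finished work.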

4 · Choosing an interaction mechanism — a latency budget

The card declares which styles an agent supports; the client picks the cheapest one that fits the job. A2A offers four, and the choice is a real engineering decision driven by how long the task runs and how fresh the client needs intermediate results:

Mechanism	Method	Connection	Best when
Sync request/response	`sendTask` / `tasks/send`	one request, blocks for the full answer	fast ops (sub-second), client wants the whole answer at once
Async polling	`sendTask` → returns working + taskId; client polls	many short requests	long jobs, client checks status on an interval
Streaming (SSE)	`sendTaskSubscribe` / `tasks/sendSubscribe`	one persistent server→client stream	real-time incremental results (tokens, progress)
Push (webhook)	client registers a webhook URL	server calls client back on change	very long / resource-heavy jobs; client shouldn't hold a connection

The two long-job options — polling and push — are not interchangeable. Polling is simple but wasteful: every poll is a round trip that usually returns "still working." Let us make the cost concrete.

Worked example — poll vs push for the cohort-analysis task

Our product assistant delegates a cohort analysis that takes the remote data agent T = 120 s. Each status poll is an HTTP round trip costing ~40 ms of wall time and, say, 0.002 USD in egress + handler invocation on a busy fleet.

Poll every 2 s: 120 / 2 = 60 polls. 59 of them return "working" — pure waste. Cost ≈ 60 × 0.002 = 0.12 USD per delegated task, and the client learns of completion up to 2 s late (average ~1 s).

Poll every 0.5 s for fresher status: 240 polls ≈ 0.48 USD and ≤0.5 s staleness. That is 4× the cost to cut worst-case staleness by 1.5 s (from 2 s to 0.5 s).

Push (webhook): 1 register call + 1 callback = 2 messages ≈ 0.004 USD, completion known within network latency (~tens of ms), and the client holds no connection for 2 minutes. At 10,000 delegated analyses/day that is 0.12 × 10,000 = 1,200 USD/day on 2 s polling versus 40 USD/day on push — a 30× reduction. Streaming (SSE) sits between: one held connection, immediate increments, ideal when you actually want to show progress rather than just learn of completion.

Rule of thumb that drops out of the math: sub-second job → sync; you need to render progressive output → SSE; minutes-long fire-and-forget → push; push unavailable (card says pushNotifications: false) → fall back to polling at the slowest interval your staleness budget tolerates. The widget below lets you feel that trade-off directly.

Interaction-mechanism cost — push vs poll for a long task

A remote task runs for T seconds. Polling sends one status round trip every interval seconds, and most return "working"; push registers once and gets one callback. The widget compares wasted round trips and cost, showing how completion staleness (how late you learn the task finished) trades against polling cost and how push collapses both.

Defaults: task duration T = 120 s, poll interval 2.0 s, cost per message 0.0020 USD. At those settings: poll cost $0.120, push cost $0.004, average staleness 1.0 s, push saves 30×.

The core JS:

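function pollVsPush(T, interval, costPerMsg) {
  const polls    = Math.ceil(T / interval);  // one round trip per interval
  const wasted   = Math.max(0, polls - 1);   // all but the final poll say "working"
  const pollCost = polls * costPerMsg;
  const pushCost = 2 * costPerMsg;           // 1 register + 1 callback
  const avgStale = interval / 2;             // expected delay learning of completion
  const factor   = pollCost / pushCost;      // how many times cheaper push is
  return { polls, wasted, pollCost, pushCost, avgStale, factor };
}
// e.g. pollVsPush(120, 2, 0.002): 60 polls, 59 wasted, pollCost ≈ 0.12, pushCost ≈ 0.004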
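The rule of thumb above can be written as a decision function. A sketch under stated assumptions: `choose_mechanism` and `Capabilities` are hypothetical, "sub-second" is taken as under 1 s, and the polling interval is set equal to the staleness budget because worst-case staleness equals the interval (average staleness is half of it, matching `avgStale` in the JS).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Capabilities:
    streaming: bool
    push_notifications: bool   # mirrors the card's "pushNotifications" flag

def choose_mechanism(expected_s: float, caps: Capabilities,
                     want_progress: bool, max_staleness_s: float
                     ) -> tuple[str, Optional[float]]:
    """Return (mechanism, poll_interval_s) following the rule of thumb."""
    if expected_s < 1.0:                      # sub-second job: just block
        return ("sync", None)
    if want_progress and caps.streaming:      # caller renders progressive output
        return ("sse", None)
    if caps.push_notifications:               # long fire-and-forget job
        return ("push", None)
    # Push unavailable: poll at the slowest interval the staleness budget allows.
    return ("poll", max_staleness_s)

weatherbot = Capabilities(streaming=True, push_notifications=False)
print(choose_mechanism(120, weatherbot, want_progress=False, max_staleness_s=2.0))
# ('poll', 2.0)
```

Reading capabilities from the card before choosing is the point: a client that assumes push against an agent declaring `pushNotifications: false` would wait for a callback that never comes.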

The book gives the matching JSON-RPC shapes. A sync request uses sendTask and expects one complete answer; a streaming request uses sendTaskSubscribe to open a persistent connection over which the agent returns increments:

sync_request = {
  "jsonrpc": "2.0", "id": "1",
  "method": "sendTask",
  "params": {
    "id": "task-001",
    "sessionId": "session-001",
    "message": { "role": "user",
      "parts": [{ "type": "text", "text": "USD to EUR rate?" }] },
    "acceptedOutputModes": ["text/plain"],
    "historyLength": 5 } }

streaming_request = {
  "jsonrpc": "2.0", "id": "2",
  "method": "sendTaskSubscribe",   # SSE
  "params": {
    "id": "task-002",
    "sessionId": "session-001",
    "message": { "role": "user",
      "parts": [{ "type": "text", "text": "JPY to GBP today?" }] },
    "acceptedOutputModes": ["text/plain"],
    "historyLength": 5 } }
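On the client side, the `sendTaskSubscribe` response arrives as a Server-Sent Events stream. A minimal sketch of decoding it, assuming each event's `data:` field carries one JSON-RPC response; `parse_sse` is a hypothetical helper, and the sample payloads are illustrative rather than the exact A2A event schema.

```python
import json
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per Server-Sent Event.

    Basic SSE framing: "data:" lines accumulate, a blank line dispatches
    the event, other fields and ":" comments are skipped, and an
    unterminated event at end of stream is discarded.
    """
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # blank line ends the event
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):        # SSE drops one leading space
                value = value[1:]
            data.append(value)
        # "event:", "id:", "retry:" fields and ":" comments are ignored here

stream = [
    'data: {"jsonrpc": "2.0", "id": "2", "result": {"status": {"state": "working"}}}',
    "",
    ": keep-alive comment",
    'data: {"jsonrpc": "2.0", "id": "2", "result": {"status": {"state": "completed"}, "final": true}}',
    "",
]
states = [event["result"]["status"]["state"] for event in parse_sse(stream)]
print(states)  # ['working', 'completed']
```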

5 · Identity, security, and provenance

Because a remote agent may belong to another org, A2A treats security as part of the architecture, not an add-on. The book lists four mechanisms, and each one maps to a failure it prevents:

Mutual TLS (mTLS): both sides authenticate the connection — prevents an impostor agent from receiving your task or spoofing a result.
Complete audit logs: every inter-agent message — who, what, when — is recorded. This is also your provenance trail: when the data agent returns a chart, the log says which agent produced it from which inputs.
Agent Card declarations: auth requirements ("schemes": ["apiKey"], or OAuth 2.0) are stated in the card, so the client knows how to authenticate before it ever connects.
Credential handling: tokens (OAuth 2.0) or API keys travel in HTTP headers — never in the URL or message body, where they would land in logs and caches.
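The header and mTLS rules map directly onto Python's standard library. A sketch, assuming an OAuth 2.0 bearer token (an API-key scheme would use whatever header the card specifies) and a hypothetical endpoint; `build_a2a_request` and `mtls_context` are illustrative helpers.

```python
import json
import ssl
import urllib.request
from typing import Optional

def build_a2a_request(endpoint: str, payload: dict, token: str) -> urllib.request.Request:
    """Prepare a JSON-RPC POST with the credential in a header, never in the URL or body."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",   # assumed OAuth 2.0 bearer scheme
        },
        method="POST",
    )

def mtls_context(cert_file: str, key_file: str,
                 ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side of mTLS: verify the server and present our own certificate."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

req = build_a2a_request("https://data-agent.example.com/a2a",   # hypothetical endpoint
                        {"jsonrpc": "2.0", "id": "1", "method": "sendTask", "params": {}},
                        token="example-token")
```

To send it, pass the context to `urllib.request.urlopen(req, context=mtls_context(...))`; the certificate and key paths are placeholders you would supply.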

Provenance is the piece that ties back to lesson 16. RAG taught us that an unsupported claim is a liability; A2A makes that worse, because one agent's conclusion silently becomes another agent's premise. If the data agent returns "churn is up 12%" with no queries, no caveats, no source, the product assistant will repeat it as fact. The defense is the same contract the book pushes: the delegated task declares an output_schema, and the artifact must carry evidence (the queries it ran, the charts, the caveats) that the caller validates before merging. Validation plus the joined audit log is how a multi-agent answer stays traceable.
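Validate-then-merge can be sketched as follows. Everything here is an assumption for illustration: `ANALYSIS_REPORT_V1` is a hypothetical stand-in for the `analysis_report_v1` schema, treating an empty `queries` list as missing evidence is a policy choice, and the provenance record's field names are invented.

```python
# Hypothetical stand-in for the "analysis_report_v1" schema: field -> required type.
ANALYSIS_REPORT_V1 = {
    "summary": str,
    "queries": list,   # evidence: the SQL the data agent actually ran
    "charts": list,
    "caveats": list,
}

def validate_artifact(artifact: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the artifact may be merged."""
    errors = []
    for name, expected in schema.items():
        if name not in artifact:
            errors.append(f"missing field: {name}")
        elif not isinstance(artifact[name], expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    if isinstance(artifact.get("queries"), list) and not artifact["queries"]:
        errors.append("no evidence: queries is empty")
    return errors

def merge_with_provenance(trace: list, artifact: dict, agent: str,
                          audit_log_id: str, context_id: str) -> None:
    """Append the artifact's claim to the trace only if it validates."""
    errors = validate_artifact(artifact, ANALYSIS_REPORT_V1)
    if errors:
        raise ValueError("; ".join(errors))
    trace.append({
        "claim": artifact["summary"],
        "evidence": artifact["queries"],
        "caveats": artifact["caveats"],
        "provenance": {"agent": agent, "audit_log_id": audit_log_id,
                       "context_id": context_id},
    })
```

A bare `{"summary": "churn is up 12%"}` fails with three missing-field errors, which is the unsupported-claim case the paragraph warns about.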

6 · A2A vs MCP — two protocols, two jobs

This is the single most-tested distinction in the chapter, and it is easy to get backwards. The book states it cleanly: the two are complementary.

	MCP (lesson 12)	A2A (this lesson)
Connects	an agent ⟷ tools, data, resources	an agent ⟷ another agent
Standardizes	structured access to context and tools	coordination, delegation, communication
The other party is	a tool/resource server (passive, you drive it)	an autonomous agent (it reasons, may ask back)
Unit of interaction	a tool call / resource read	a task with a lifecycle and artifacts
One-liner	how an agent reaches into the world	how an agent talks to a peer

In our running system the product assistant uses MCP to read the warehouse schema and call a SQL tool, and uses A2A to delegate the whole cohort-analysis job to a specialized data agent that itself uses MCP internally. Same system, both protocols, no overlap: MCP is the tool plug, A2A is the agent-to-agent phone line.

The ADK worked example — standing up an A2A server

The book's concrete code builds a Google ADK "Calendar Agent" and exposes it over A2A. The shape generalizes to any remote agent you would delegate to:

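import asyncio
import os

import uvicorn
from starlette.applications import Starlette
from google.adk.agents import LlmAgent
from google.adk.artifacts import InMemoryArtifactService
from google.adk.memory import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.google_api_tool import CalendarToolset
from a2a.server.apps import A2AStarletteApplication
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore
from a2a.types import AgentCapabilities, AgentCard, AgentSkill
from adk_agent_executor import ADKAgentExecutor  # defined in the book's sample project, not a library

host, port = "localhost", 8000  # illustrative values

# 1. Build the agent (ADK LlmAgent over a tool)
async def create_agent(client_id, client_secret) -> LlmAgent:
    toolset = CalendarToolset(client_id=client_id, client_secret=client_secret)
    return LlmAgent(model="gemini-2.0-flash-001", name="calendar_agent",
                    description="Helps manage the user's calendar.",
                    instruction="...use the tools to read/modify the calendar...",
                    tools=await toolset.get_tools())

# 2. Declare identity: a skill + an Agent Card
skill = AgentSkill(id="check_availability", name="Check availability",
                   description="Check if the user is free in a time window",
                   tags=["calendar"], examples=["Am I free 10-11am tomorrow?"])
agent_card = AgentCard(name="Calendar Agent",
                       description="Helps manage the user's calendar.",
                       url=f"http://{host}:{port}/",
                       version="1.0.0", defaultInputModes=["text"],
                       defaultOutputModes=["text"],
                       capabilities=AgentCapabilities(streaming=True),
                       skills=[skill])

# 3. Build the agent, wire executor + task store, mount on Starlette, serve over HTTP
adk_agent = asyncio.run(create_agent(
    client_id=os.environ["GOOGLE_CLIENT_ID"],          # env var names are illustrative
    client_secret=os.environ["GOOGLE_CLIENT_SECRET"]))
runner = Runner(app_name=agent_card.name, agent=adk_agent,
                artifact_service=InMemoryArtifactService(),
                session_service=InMemorySessionService(),
                memory_service=InMemoryMemoryService())
agent_executor = ADKAgentExecutor(runner, agent_card)
request_handler = DefaultRequestHandler(agent_executor, task_store=InMemoryTaskStore())
a2a_app = A2AStarletteApplication(agent_card=agent_card, http_handler=request_handler)
uvicorn.run(Starlette(routes=a2a_app.routes()), host=host, port=port)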

Read it as three layers: capability (the LlmAgent + its tools), identity (the AgentCard declaring streaming and the check_availability skill), and service (a task store, an executor, and a Starlette/Uvicorn HTTP surface). That separation is exactly why a CrewAI agent and an ADK agent can talk: they agree on the card and the task lifecycle, and disagree freely about everything inside the box. The book notes the official samples cover LangGraph, CrewAI, Azure AI Foundry, and AG2; tools like Trickle AI visualize and trace A2A traffic for debugging and optimization — the cross-agent analogue of the trace inspection you'll build in lesson 21.

Running example — the product assistant delegates cohort analysis

Threading it all together. The product assistant (client agent) gets "why did Pro churn jump last month?"

01. Discover. Query the registry for a card tagged cohort-analysis; read the data agent's card: it supports streaming and pushNotifications, and auth is OAuth 2.0.

02. Delegate. sendTask with parts = {goal, date range, segment="Pro"}, output_schema="analysis_report_v1", deadline 10 min, idempotency_key=task_id. Estimated 120 s ⇒ register a webhook rather than block.

03. Clarify. Data agent enters input-required: "include trialists?" Client replies under the same contextId; state resumes.

04. Receive + validate. Webhook fires completed; artifact carries the SQL, two charts, and caveats. Validate against the schema; merge into the trace with the data agent's audit-log IDs as provenance.
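Step 02 can be sketched as a request builder that reuses the book's `sendTask` shape. Assumptions, all hypothetical: `build_delegation` is an illustrative helper, the structured inputs travel as a `data` part, and `output_schema`, `deadline_s`, and `idempotency_key` ride in a `metadata` object; the lesson names these contract fields but does not fix where they go on the wire. Webhook registration would be a separate call.

```python
import uuid

def build_delegation(goal: str, date_range: str, segment: str, session_id: str) -> dict:
    """Build a sendTask request for the cohort-analysis delegation (hypothetical layout)."""
    task_id = f"task-{uuid.uuid4()}"
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),          # JSON-RPC request id, distinct from the task id
        "method": "sendTask",
        "params": {
            "id": task_id,
            "sessionId": session_id,
            "message": {
                "role": "user",
                "parts": [{
                    "type": "data",
                    "data": {"goal": goal, "date_range": date_range, "segment": segment},
                }],
            },
            "metadata": {
                "output_schema": "analysis_report_v1",
                "deadline_s": 600,            # 10-minute deadline from step 02
                "idempotency_key": task_id,   # resend this same request on retry
            },
        },
    }

req = build_delegation("Why did Pro churn jump last month?", "last month", "Pro", "session-001")
```

Deriving the idempotency key from the task id means a retry is safe only if the client resends the stored request rather than calling `build_delegation` again, which would mint a fresh id.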

Checkpoint exercise

Try it

Write the Agent Card for your cohort-analysis data agent: name, url, version, capabilities (which of streaming/push do you support, and why?), authentication schemes, and one skill with an id, description, two example prompts, and tags. Then justify, using the §4 latency math, whether a 120 s job should be polled or pushed.

Failure modes

No clear owner on remote failure. The remote task dies and neither side is responsible for retry or cleanup. Fix: the task contract names the owner and the failure semantics up front.
Retrying non-idempotent work. A timed-out "generate report" gets retried and the customer is billed twice. Fix: an idempotency_key the server dedups on.
Trusting remote output blindly. The data agent's "churn +12%" becomes premise with no evidence. Fix: validate against output_schema; require evidence parts; merge provenance.
Holding a 2-minute synchronous connection. Proxies time it out; the result is lost. Fix: async + push/SSE for long jobs (§4).
Leaked credentials. API key in the URL lands in access logs. Fix: tokens in HTTP headers only; mTLS on the channel.
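The idempotency fix can be sketched on the server side. A minimal in-memory version, assuming a hypothetical `IdempotentTaskRunner`: the first call with a key runs the work and stores the result, and any retry with the same key gets the stored result instead of re-running. Holding one lock while `work()` runs serializes all tasks; a real service would keep keys in a durable store with a uniqueness constraint.

```python
import threading
from typing import Callable

class IdempotentTaskRunner:
    """Run each idempotency key's work at most once; retries get the stored result."""

    def __init__(self) -> None:
        self._results: dict[str, object] = {}
        self._lock = threading.Lock()

    def run(self, idempotency_key: str, work: Callable[[], object]) -> object:
        with self._lock:                       # one lock: simple but serializes all work
            if idempotency_key in self._results:
                return self._results[idempotency_key]
            result = work()                    # if this raises, nothing is stored
            self._results[idempotency_key] = result
            return result

calls = []
def generate_report() -> str:
    calls.append("billed")                     # stands in for a non-idempotent side effect
    return "report_v1"

runner = IdempotentTaskRunner()
runner.run("task-001", generate_report)
runner.run("task-001", generate_report)        # timed-out client retries the same key
print(len(calls))  # 1
```

Because a failing `work()` stores nothing, a retry after a genuine failure runs again, while a retry after a lost response does not.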

Implementation checklist

How does discovery work — well-known URI, registry, or direct config? Is the card endpoint secured?
What task states exist (submitted / working / input-required / completed / failed) and who handles each?
Can the caller cancel, and what happens to in-flight work?
Which interaction mechanism, and does the card actually declare it?
Are retries idempotent (idempotency_key)?
Is there a contextId threading multi-turn state?
How are traces joined across agents (audit-log IDs, contextId in provenance)?
Auth scheme stated in the card; credentials in headers; mTLS on the wire?

Where this points next

A2A gave the agent reach — it can now compose specialists across processes and organizations. But reach is expensive: every delegated task is more tokens, more latency, more dollars (recall the polling bill — 1,200 USD/day from one careless interval). Lesson 18, Resource-aware optimization, treats compute, time, money, context window, tools, and human attention as scarce budgets the agent must spend deliberately — which mechanism to choose, which model tier to route to, when delegation is worth its network cost. The latency/cost calculation you just did for poll-vs-push is the first instance of the general discipline that lesson formalizes.

Takeaway

A2A is Google's open, HTTP + JSON-RPC 2.0 standard for one agent to discover and delegate to another, even across frameworks (ADK, LangGraph, CrewAI). The Agent Card is the agent's digital identity — name, endpoint, version, declared capabilities, skills, I/O modes, and auth — found via well-known URI, a managed registry, or direct config. Work is an asynchronous task moving through submitted → working → input-required → completed/failed, threaded by contextId, returning a part-based artifact. Pick an interaction mechanism by the latency budget: sync for fast ops, SSE to stream progress, push (webhook) for long jobs, polling as the fallback — and the poll-vs-push math is real money. Security is built in: mTLS, OAuth/API-key in headers, audit logs that double as provenance. And never confuse the two protocols: MCP connects an agent to tools and resources; A2A connects an agent to another agent.

Interview prompts

What problem does A2A solve that calling a tool (or a plain REST API) does not? (§1 — a remote agent is a long-running, black-box, multi-turn, possibly-third-party peer; A2A standardizes capability discovery, task lifecycle, streaming, multi-turn input-required state, and auth so any compliant agents interoperate without bespoke integration.)
What is an Agent Card and what does it contain? (§2 — the remote agent's JSON identity: name, endpoint URL, version, declared capabilities (streaming/push), skills with examples/tags, default I/O modes, and authentication schemes; clients read it before sending any task.)
How does a client discover a remote agent? (§2 — well-known URI (/.well-known/agent.json) for open discovery, a managed registry for enterprise governance, or direct configuration for fixed/private peers; secure the card endpoint regardless.)
Walk the task lifecycle and explain input-required and contextId. (§3 — submitted → working → input-required → completed/failed; input-required lets the remote agent pause and ask the client for a missing parameter, and contextId threads multi-turn/related tasks so context survives.)
A delegated job takes ~2 minutes. Which interaction mechanism, and why not synchronous? (§4 — push (webhook) or SSE; sync blocks and times out at proxies; vs polling, push is one register + one callback instead of dozens of wasted round-trips — roughly a 30× cost cut at 2 s intervals — and learns of completion immediately.)
How is A2A different from MCP? (§6 — MCP connects an agent to tools/resources/context (passive servers you drive); A2A connects an agent to another autonomous agent for delegation and coordination via tasks; they are complementary and often used together.)
How do you keep a multi-agent answer trustworthy and traceable? (§5 — declare an output_schema and require evidence parts in the artifact, validate before merging, and join audit-log IDs / contextId into your trace so one agent's conclusion isn't silently accepted as another's premise.)