Lesson 17 / 25

Part VI - Knowledge, communication, and optimization

A2A - agent-to-agent communication

For sixteen lessons everything lived inside one process: one control loop, one memory, tools the agent could call directly. Real systems are not one process. The cohort-analysis specialist runs on another team's stack, the report writer is a CrewAI agent, the calendar checker is a Google ADK service on its own port. To compose them, an agent must be able to find a peer it has never met, hand it a task, and trust the result. That is what the Agent-to-Agent (A2A) protocol standardizes, and it is the moment our agent stops being a program and becomes a node in a network.

Book source
Chapter 15: Agent-to-Agent Communication (A2A); PDF outline pages 164–174. Google's open A2A standard, the Agent Card, discovery, task lifecycle, the four interaction mechanisms, security, the A2A-vs-MCP contrast, and the ADK calendar-agent worked example.
The plan
Five moves. (1) Pin down why a remote agent is different from a local tool, and why "just call its REST API" is not enough. (2) Meet the four actors and the Agent Card, the JSON identity that makes a stranger discoverable. (3) Walk the task lifecycle: submitted → working → input-required → completed/failed, with contextId threading multi-turn state. (4) Choose among the four interaction mechanisms (sync, async polling, SSE streaming, push webhook) with a worked latency/cost calculation and an interactive widget. (5) Lock down identity, security, and provenance, then contrast A2A with MCP so you never confuse the two protocols again.
Linear position
Prerequisites: lesson 09 (multi-agent collaboration: roles, handoffs, synthesis inside one runtime), lesson 12 (MCP, the integration contract for tools and resources), and lesson 16 (RAG, agents grounded in shared evidence).
New capability: networked task delegation across process, framework, and organizational boundaries — discovering a remote agent you did not write, sending it a typed task, and merging its result into your trace with provenance intact.

1 · Why a remote agent is not just another tool

In lesson 07 a tool was a function the agent called and got an answer from in one shot: get_weather("Paris") → "18°C". In lesson 09 multiple agents collaborated, but they shared one runtime, one memory store, one trace. The book's key observation in Chapter 15 is that a single agent, however capable, hits a ceiling on complex, multi-layered problems, and the way past that ceiling is to let agents built on different frameworks collaborate: a LangGraph planner, a CrewAI writer, a Google ADK calendar service. The instant a collaborator lives in another process, you inherit the hard problems of distributed systems all at once: finding the peer, describing the work, tracking its state over time, and deciding whether to trust the result.

"Just call its REST API" fails because every pair of agents would then invent its own ad-hoc shape for capability description, task IDs, streaming, multi-turn state, and auth — the integration cost the book calls out as the core problem (high cost, long cycles, siloed agents). A2A is the open, HTTP-based standard that fixes the shape once so any compliant agent can talk to any other. It is backed by a broad set of vendors — Atlassian, Box, LangChain, MongoDB, Salesforce, SAP, ServiceNow, with Microsoft integrating it into Azure AI Foundry and Copilot Studio — which is precisely what makes "an agent I have never met" a tractable thing to call.

Mental model
A2A is to agents what HTTP + a business card is to companies. The Agent Card is the business card (who I am, what I can do, how to reach me, how to authenticate). The task is the contract you sign (here is the job, the inputs, the deadline, the artifact I expect back). The protocol is the postal system that carries signed messages reliably and logs every one of them.

2 · The four actors and the Agent Card

A2A names four entities. Hold them straight and the rest of the protocol falls out:

User
The human (or upstream system) that initiates the need for help.
A2A client (client agent)
The app or agent acting on the user's behalf, requesting an action or information.
A2A server (remote agent)
An agent exposing an HTTP endpoint that handles requests and returns results. A black box — the client never sees its internals.
Agent Card
The remote agent's digital identity: a JSON document advertising name, endpoint, version, capabilities, skills, I/O modes, and auth.

The Agent Card is the keystone. It is what makes a stranger usable: before the client sends a single task it reads the card to learn what the agent can do and how to talk to it. Here is the book's WeatherBot card, trimmed to its load-bearing fields:

agent_card = {
  "name": "WeatherBot",
  "description": "Accurate weather forecasts and historical data.",
  "url": "http://weather-service.example.com/a2a",   # the endpoint
  "version": "1.0.0",
  "capabilities": {                # what interaction styles it supports
    "streaming": True,             # can push SSE incremental results
    "pushNotifications": False,    # cannot call back a webhook
    "stateTransitionHistory": True
  },
  "authentication": { "schemes": ["apiKey"] },
  "defaultInputModes":  ["text"],
  "defaultOutputModes": ["text"],
  "skills": [
    { "id": "get_current_weather", "name": "Get current weather",
      "description": "Real-time weather for any location.",
      "examples": ["What's the weather in Paris now?"],
      "tags": ["weather", "current", "real-time"] },
    { "id": "get_forecast", "name": "Get forecast",
      "description": "5-day weather prediction.",
      "examples": ["Will it rain in London this weekend?"],
      "tags": ["weather", "forecast", "prediction"] }
  ]
}

Two design choices are worth dwelling on. First, capabilities are declared: the card tells the client up front whether it may subscribe to a stream or register a webhook, so the client picks an interaction style it knows will work. Second, skills carry examples and tags — these are not decoration. A planner agent uses them to match a sub-goal ("I need 5-day weather") to a capability (get_forecast) the same way a developer reads a function's docstring before calling it.
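The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the A2A specification: the helper names match_skill and supports are hypothetical, and scoring by tag overlap is just one simple way a planner might rank skills.

```python
from typing import Optional

# Trimmed copy of the WeatherBot card: only the fields this sketch reads.
AGENT_CARD = {
    "name": "WeatherBot",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "skills": [
        {"id": "get_current_weather", "tags": ["weather", "current", "real-time"]},
        {"id": "get_forecast", "tags": ["weather", "forecast", "prediction"]},
    ],
}


def match_skill(card: dict, wanted_tags: set) -> Optional[str]:
    """Return the id of the skill sharing the most tags with the sub-goal.

    Hypothetical helper: returns None when no skill shares any tag.
    """
    best_id, best_overlap = None, 0
    for skill in card.get("skills", []):
        overlap = len(wanted_tags & set(skill.get("tags", [])))
        if overlap > best_overlap:
            best_id, best_overlap = skill["id"], overlap
    return best_id


def supports(card: dict, capability: str) -> bool:
    """True only if the card explicitly declares the capability."""
    return bool(card.get("capabilities", {}).get(capability, False))


if __name__ == "__main__":
    print(match_skill(AGENT_CARD, {"weather", "forecast"}))
    print(supports(AGENT_CARD, "pushNotifications"))
```

A real planner would more likely hand the skill descriptions and examples to the model and let it choose; the tag overlap here only makes the "read the docstring before calling" idea concrete.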

How a client finds the card (discovery)

The book lists three discovery mechanisms, trading openness against control:

A. Well-known URI. The agent hosts its card at a standard path such as /.well-known/agent.json. This gives public, zero-coordination discovery: point a client at a domain and it can self-configure. Best for open ecosystems.
B. Managed registry. A central catalog where agents publish cards and clients query by capability/tag, with access control. Best for an enterprise with many internal agents and a governance requirement.
C. Direct configuration. The card is embedded or shared privately, with no dynamic lookup. Best for tightly-coupled or private systems where the peers are fixed.

Whichever you use, the card endpoint itself must be secured (access control, mTLS, or network restriction), because even without secrets a card leaks your capability surface.
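Well-known-URI discovery (mechanism A) can be sketched as follows. This is a hedged illustration under stated assumptions: the REQUIRED_FIELDS list is an assumed minimum rather than the normative schema, and fetch is an injected callable standing in for whatever authenticated HTTPS GET your client actually uses, so the sketch runs without a network.

```python
import json
from typing import Callable
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/agent.json"  # path named in the text above
# Assumed minimum for this sketch, not the normative A2A schema.
REQUIRED_FIELDS = ("name", "url", "version", "capabilities", "skills")


def card_url(base_url: str) -> str:
    """Build the well-known card URL for an agent's domain."""
    return urljoin(base_url, WELL_KNOWN_PATH)


def parse_card(raw_json: str) -> dict:
    """Parse a fetched card and fail loudly if a load-bearing field is missing."""
    card = json.loads(raw_json)
    missing = [name for name in REQUIRED_FIELDS if name not in card]
    if missing:
        raise ValueError(f"Agent Card missing fields: {missing}")
    return card


def discover(base_url: str, fetch: Callable[[str], str]) -> dict:
    """Fetch and validate a card. `fetch` maps a URL to the response body."""
    return parse_card(fetch(card_url(base_url)))


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTPS request to the card endpoint.
        return json.dumps({"name": "WeatherBot", "url": url, "version": "1.0.0",
                           "capabilities": {}, "skills": []})

    print(discover("https://weather-service.example.com", fake_fetch)["name"])
```

Injecting fetch keeps transport and access control (mTLS, tokens, network restriction) out of the parsing logic, which is also what makes the registry and direct-configuration paths easy to swap in: they only change where the raw card comes from.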

3 · The task lifecycle — submit, work, ask, finish

A2A communication is organized around the task: the fundamental unit of work for a long-running process. Crucially, tasks are asynchronous — designed for operations that may take real time — and each one carries a unique ID and moves through a small set of states. Agents exchange messages (metadata like priority/creation-time plus one or more content parts: text, files, or structured JSON), and the actual output the remote agent produces is an artifact (also part-based, streamable). All A2A traffic is HTTP(S) carrying JSON-RPC 2.0, and to keep context across multiple related tasks the server issues a contextId that ties them together.

CLIENT AGENT                                   REMOTE AGENT (black box)
─────────────                                  ────────────────────────
read Agent Card ───────discovery───────▶       /.well-known/agent.json
       │
sendTask(task-001) ──────JSON-RPC───────▶      state: submitted
       │                                              │
       │ ◀──── "working" + taskId ────────────────────┤  (long job starts)
       │                                              │
       │ ◀──── "input-required: date range?" ─────────┤  (needs more info)
       │                                              │
reply(parts=[...]) ───same contextId────▶             │  (multi-turn, context kept)
       │                                              │
       │ ◀──── artifact: report_v1 ───────────────────┤  state: completed
       │                                              │  (or: failed + reason)
validate(artifact, output_schema)
merge into trace with provenance

The input-required state is what separates A2A from a dumb RPC. The remote agent can pause, ask the client for a missing parameter, and resume — all under one contextId, so neither side loses the thread. This is the networked analogue of the human-in-the-loop pause you built in lesson 15, except the "human" asking the clarifying question is another agent.
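The lifecycle can be modelled as a small state machine. The sketch below is an illustration only: the TRANSITIONS table is inferred from the states named in this section and is an assumption, not the normative A2A state table, and the Task class is hypothetical.

```python
from dataclasses import dataclass, field

# Allowed transitions, inferred from the lifecycle described above.
# Assumption: this is not the normative A2A state table.
TRANSITIONS = {
    "submitted": {"working", "failed"},
    "working": {"input-required", "completed", "failed"},
    "input-required": {"working", "failed"},
    "completed": set(),
    "failed": set(),
}


@dataclass
class Task:
    task_id: str
    context_id: str            # threads related turns together
    state: str = "submitted"
    history: list = field(default_factory=list)

    def advance(self, new_state: str) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append(self.state)
        self.state = new_state

    @property
    def is_terminal(self) -> bool:
        """A state with no outgoing transitions ends the task."""
        return not TRANSITIONS[self.state]


if __name__ == "__main__":
    task = Task(task_id="task-001", context_id="ctx-001")
    for state in ("working", "input-required", "working", "completed"):
        task.advance(state)
    print(task.state, task.history)
```

Note that input-required returns to working rather than opening a new task: the clarifying exchange stays inside the same task and contextId, which is the property the paragraph above relies on.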

4 · Choosing an interaction mechanism — a latency budget

The card declares which styles an agent supports; the client picks the cheapest one that fits the job. A2A offers four, and the choice is a real engineering decision driven by how long the task runs and how fresh the client needs intermediate results:

Mechanism | Method | Connection | Best when
Sync request/response | sendTask / tasks/send | one request, blocks for the full answer | fast ops (sub-second), client wants the whole answer at once
Async polling | sendTask → returns working + taskId; client polls | many short requests | long jobs, client checks status on an interval
Streaming (SSE) | sendTaskSubscribe / tasks/sendSubscribe | one persistent server→client stream | real-time incremental results (tokens, progress)
Push (webhook) | client registers a webhook URL | server calls client back on change | very long / resource-heavy jobs; client shouldn't hold a connection

The two long-job options — polling and push — are not interchangeable. Polling is simple but wasteful: every poll is a round trip that usually returns "still working." Let us make the cost concrete.

Worked example — poll vs push for the cohort-analysis task
Our product assistant delegates a cohort analysis that takes the remote data agent T = 120 s. Each status poll is an HTTP round trip costing ~40 ms of wall time and, say, 0.002 USD in egress + handler invocation on a busy fleet.

Poll every 2 s: 120 / 2 = 60 polls. 59 of them return "working" — pure waste. Cost ≈ 60 × 0.002 = 0.12 USD per delegated task, and the client learns of completion up to 2 s late (average ~1 s).

Poll every 0.5 s for fresher status: 240 polls ≈ 0.48 USD and ≤0.5 s staleness — 4× the cost to shave 1.5 s.

Push (webhook): 1 register call + 1 callback = 2 messages ≈ 0.004 USD, completion known within network latency (~tens of ms), and the client holds no connection for 2 minutes. At 10,000 delegated analyses/day that is 0.12 × 10,000 = 1,200 USD/day on 2 s polling versus 40 USD/day on push — a 30× reduction. Streaming (SSE) sits between: one held connection, immediate increments, ideal when you actually want to show progress rather than just learn of completion.

Rule of thumb that drops out of the math: sub-second job → sync; you need to render progressive output → SSE; minutes-long fire-and-forget → push; push unavailable (card says pushNotifications: false) → fall back to polling at the slowest interval your staleness budget tolerates. The widget below lets you feel that trade-off directly.

Interaction-mechanism cost — push vs poll for a long task
A remote task runs for T seconds. Polling sends one status round-trip every interval seconds; most return "working." Push registers once and gets one callback. The bars compare wasted round-trips and cost. Watch how completion staleness (how late you learn it finished) trades against polling cost — and how push collapses both.
Readout for T = 120 s at a 2 s interval and 0.002 USD per message: polls sent 60; wasted polls 59; poll cost $0.120; push cost $0.004; average staleness 1.0 s; push saves 30×.

Core JS behind the widget:
const polls   = Math.ceil(T / interval);   // one round-trip per interval
const wasted  = Math.max(0, polls - 1);    // all but the final poll say "working"
const pollCost = polls * costPerMsg;
const pushCost = 2 * costPerMsg;           // 1 register + 1 callback
const avgStale = interval / 2;             // expected delay learning of completion
const factor   = pollCost / pushCost;      // how many × cheaper push is
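For readers following along in Python, here is a sketch of the same cost arithmetic together with the rule of thumb from the worked example. The function names are hypothetical, and the 1-second threshold for "sub-second" is an assumption you would tune to your own latency budget.

```python
import math


def poll_vs_push(task_seconds: float, interval: float, cost_per_msg: float) -> dict:
    """Reproduce the poll-vs-push arithmetic from the worked example."""
    polls = math.ceil(task_seconds / interval)   # one round trip per interval
    poll_cost = polls * cost_per_msg
    push_cost = 2 * cost_per_msg                 # 1 register + 1 callback
    return {
        "polls": polls,
        "wasted": max(0, polls - 1),             # all but the final poll say "working"
        "poll_cost": poll_cost,
        "push_cost": push_cost,
        "avg_staleness": interval / 2,           # expected delay learning of completion
        "factor": poll_cost / push_cost,         # how many times cheaper push is
    }


def choose_mechanism(capabilities: dict, est_seconds: float, wants_progress: bool) -> str:
    """Apply the rule of thumb, honouring what the Agent Card declares."""
    if est_seconds < 1.0:                        # assumed cut-off for "sub-second"
        return "sync"
    if wants_progress and capabilities.get("streaming"):
        return "sse"
    if capabilities.get("pushNotifications"):
        return "push"
    return "poll"                                # fallback when push is unavailable


if __name__ == "__main__":
    print(poll_vs_push(120, 2, 0.002))
    print(choose_mechanism({"streaming": True, "pushNotifications": False}, 120, False))
```

With the WeatherBot card from section 2 (streaming but no push), a 120 s fire-and-forget job falls through to polling, which is exactly the fallback case the rule of thumb describes.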

The book gives the matching JSON-RPC shapes. A sync request uses sendTask and expects one complete answer; a streaming request uses sendTaskSubscribe to open a persistent connection over which the agent returns increments:

sync_request = {
  "jsonrpc": "2.0", "id": "1",
  "method": "sendTask",
  "params": {
    "id": "task-001",
    "sessionId": "session-001",
    "message": { "role": "user",
      "parts": [{ "type": "text",
        "text": "USD to EUR rate?" }] },
    "acceptedOutputModes": ["text/plain"],
    "historyLength": 5 } }

streaming_request = {
  "jsonrpc": "2.0", "id": "2",
  "method": "sendTaskSubscribe",   # SSE
  "params": {
    "id": "task-002",
    "sessionId": "session-001",
    "message": { "role": "user",
      "parts": [{ "type": "text",
        "text": "JPY to GBP today?" }] },
    "acceptedOutputModes": ["text/plain"],
    "historyLength": 5 } }

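The two request shapes above differ only in the method name and the payload text, so a client can build both from one helper. The sketch below is illustrative: build_task_request is a hypothetical name, and the field names simply mirror the book's examples rather than any particular SDK.

```python
import json


def build_task_request(request_id: str, task_id: str, session_id: str,
                       text: str, stream: bool = False,
                       history_length: int = 5) -> dict:
    """Build a JSON-RPC 2.0 envelope shaped like the book's examples."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        # Streaming requests open an SSE connection; sync ones expect one answer.
        "method": "sendTaskSubscribe" if stream else "sendTask",
        "params": {
            "id": task_id,
            "sessionId": session_id,
            "message": {"role": "user",
                        "parts": [{"type": "text", "text": text}]},
            "acceptedOutputModes": ["text/plain"],
            "historyLength": history_length,
        },
    }


if __name__ == "__main__":
    request = build_task_request("1", "task-001", "session-001", "USD to EUR rate?")
    print(json.dumps(request, indent=2))
```

Keeping the envelope in one place means the choice of interaction mechanism becomes a single boolean at the call site instead of two hand-maintained dictionaries.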
5 · Identity, security, and provenance

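Because a remote agent may belong to another org, A2A treats security as part of the architecture, not an add-on. The book lists four mechanisms, each aimed at a specific failure: mutual TLS (mTLS) on the channel, so traffic cannot be read or tampered with in transit; authentication requirements declared in the Agent Card, so a client knows which scheme to use before it sends a task; credentials (OAuth 2.0 tokens or API keys) carried in HTTP headers rather than URLs, so they do not land in access logs; and audit logs of every exchange, so a result can be traced back to the agent that produced it.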

Provenance is the piece that ties back to lesson 16. RAG taught us that an unsupported claim is a liability; A2A makes that worse, because one agent's conclusion silently becomes another agent's premise. If the data agent returns "churn is up 12%" with no queries, no caveats, no source, the product assistant will repeat it as fact. The defense is the same contract the book pushes: the delegated task declares an output_schema, and the artifact must carry evidence (the queries it ran, the charts, the caveats) that the caller validates before merging. Validation plus the joined audit log is how a multi-agent answer stays traceable.
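The validate-then-merge contract can be sketched as below. Everything here is an assumption made for illustration: the artifact layout (a claim plus an evidence dictionary), the REQUIRED_EVIDENCE keys, and the trace entry format are hypothetical, since the lesson does not define the contents of an output schema.

```python
# Assumed contents of a hypothetical report schema; the lesson does not define them.
REQUIRED_EVIDENCE = ("queries", "caveats")


def validate_artifact(artifact: dict, required=REQUIRED_EVIDENCE) -> list:
    """Return a list of problems; an empty list means the artifact may be merged."""
    problems = []
    if not artifact.get("claim"):
        problems.append("missing claim")
    evidence = artifact.get("evidence", {})
    for key in required:
        if not evidence.get(key):
            problems.append(f"missing evidence: {key}")
    return problems


def merge_into_trace(trace: list, artifact: dict, source_agent: str, audit_id: str) -> bool:
    """Append the finding with provenance, or record why it was rejected."""
    problems = validate_artifact(artifact)
    if problems:
        trace.append({"type": "rejected", "source": source_agent, "problems": problems})
        return False
    trace.append({
        "type": "finding",
        "claim": artifact["claim"],
        "evidence": artifact["evidence"],
        "provenance": {"agent": source_agent, "audit_id": audit_id},
    })
    return True


if __name__ == "__main__":
    trace = []
    bare = {"claim": "churn is up 12%"}
    print(merge_into_trace(trace, bare, "data-agent", "audit-42"), trace[-1])
```

The unsupported "churn is up 12%" claim is rejected and the rejection itself is logged, so the caller's trace shows both what was accepted and what was refused.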

6 · A2A vs MCP — two protocols, two jobs

This is the single most-tested distinction in the chapter, and it is easy to get backwards. The book states it cleanly: the two are complementary.

Aspect | MCP (lesson 12) | A2A (this lesson)
Connects | an agent ⟷ tools, data, resources | an agent ⟷ another agent
Standardizes | structured access to context and tools | coordination, delegation, communication
The other party is | a tool/resource server (passive, you drive it) | an autonomous agent (it reasons, may ask back)
Unit of interaction | a tool call / resource read | a task with a lifecycle and artifacts
One-liner | how an agent reaches into the world | how an agent talks to a peer

In our running system the product assistant uses MCP to read the warehouse schema and call a SQL tool, and uses A2A to delegate the whole cohort-analysis job to a specialized data agent that itself uses MCP internally. Same system, both protocols, no overlap: MCP is the tool plug, A2A is the agent-to-agent phone line.

The ADK worked example — standing up an A2A server

The book's concrete code builds a Google ADK "Calendar Agent" and exposes it over A2A. The shape generalizes to any remote agent you would delegate to:

# 1. Build the agent (ADK LlmAgent over a tool)
async def create_agent(client_id, client_secret) -> LlmAgent:
    toolset = CalendarToolset(client_id=client_id, client_secret=client_secret)
    return LlmAgent(model="gemini-2.0-flash-001", name="calendar_agent",
                    description="Helps manage the user's calendar.",
                    instruction="...use the tools to read/modify the calendar...",
                    tools=await toolset.get_tools())

# 2. Declare identity: a skill + an Agent Card
skill = AgentSkill(id="check_availability", name="Check availability",
                   description="Check if the user is free in a time window",
                   tags=["calendar"], examples=["Am I free 10-11am tomorrow?"])
agent_card = AgentCard(name="Calendar Agent", url=f"http://{host}:{port}/",
                       version="1.0.0", defaultInputModes=["text"],
                       defaultOutputModes=["text"],
                       capabilities=AgentCapabilities(streaming=True),
                       skills=[skill])

# 3. Wire executor + task store, mount on Starlette, serve over HTTP
runner = Runner(app_name=agent_card.name, agent=adk_agent,
                artifact_service=InMemoryArtifactService(),
                session_service=InMemorySessionService(),
                memory_service=InMemoryMemoryService())
agent_executor = ADKAgentExecutor(runner, agent_card)
request_handler = DefaultRequestHandler(agent_executor, task_store=InMemoryTaskStore())
a2a_app = A2AStarletteApplication(agent_card=agent_card, http_handler=request_handler)
uvicorn.run(Starlette(routes=a2a_app.routes()), host=host, port=port)

Read it as three layers: capability (the LlmAgent + its tools), identity (the AgentCard declaring streaming and the check_availability skill), and service (a task store, an executor, and a Starlette/Uvicorn HTTP surface). That separation is exactly why a CrewAI agent and an ADK agent can talk: they agree on the card and the task lifecycle, and disagree freely about everything inside the box. The book notes the official samples cover LangGraph, CrewAI, Azure AI Foundry, and AG2; tools like Trickle AI visualize and trace A2A traffic for debugging and optimization — the cross-agent analogue of the trace inspection you'll build in lesson 21.

Running example — the product assistant delegates cohort analysis

Threading it all together. The product assistant (client agent) gets "why did Pro churn jump last month?"

01. Discover. Query the registry for a card tagged cohort-analysis; read the data agent's card: it supports streaming and pushNotifications, and auth is OAuth 2.0.
02. Delegate. sendTask with parts = {goal, date range, segment="Pro"}, output_schema="analysis_report_v1", deadline 10 min, idempotency_key=task_id. Estimated 120 s ⇒ register a webhook rather than block.
03. Clarify. Data agent enters input-required: "include trialists?" Client replies under the same contextId; state resumes.
04. Receive + validate. Webhook fires completed; artifact carries the SQL, two charts, and caveats. Validate against the schema; merge into the trace with the data agent's audit-log IDs as provenance.

Checkpoint exercise

Try it
Write the Agent Card for your cohort-analysis data agent: name, url, version, capabilities (which of streaming/push do you support, and why?), authentication schemes, and one skill with an id, description, two example prompts, and tags. Then justify, using the §4 latency math, whether a 120 s job should be polled or pushed.

Failure modes

  • No clear owner on remote failure. The remote task dies and neither side is responsible for retry or cleanup. Fix: the task contract names the owner and the failure semantics up front.
  • Retrying non-idempotent work. A timed-out "generate report" gets retried and the customer is billed twice. Fix: an idempotency_key the server dedups on.
  • Trusting remote output blindly. The data agent's "churn +12%" becomes premise with no evidence. Fix: validate against output_schema; require evidence parts; merge provenance.
  • Holding a 2-minute synchronous connection. Proxies time it out; the result is lost. Fix: async + push/SSE for long jobs (§4).
  • Leaked credentials. API key in the URL lands in access logs. Fix: tokens in HTTP headers only; mTLS on the channel.
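The idempotency fix from the second failure mode can be sketched on the server side as follows. ReportService is a hypothetical class, and the in-memory dictionary is an assumption made to keep the example short; a real service would need a durable store with expiry.

```python
class ReportService:
    """Hypothetical remote-side handler that dedups retries on idempotency_key."""

    def __init__(self) -> None:
        self._results = {}   # idempotency_key -> first result (in-memory for the sketch)
        self.billed = 0      # number of times the customer was actually charged

    def generate_report(self, idempotency_key: str, goal: str) -> dict:
        # A retry with a known key returns the stored result and bills nothing.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.billed += 1
        result = {"report": f"report for {goal}", "charge_no": self.billed}
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    service = ReportService()
    first = service.generate_report("task-001", "Pro churn")
    retry = service.generate_report("task-001", "Pro churn")  # client timed out, retried
    print(first == retry, service.billed)
```

Using the task ID as the key, as the running example does, means a client that times out can resend the identical request without having to know whether the first attempt reached the server.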

Implementation checklist

  • How does discovery work — well-known URI, registry, or direct config? Is the card endpoint secured?
  • What task states exist (submitted / working / input-required / completed / failed) and who handles each?
  • Can the caller cancel, and what happens to in-flight work?
  • Which interaction mechanism, and does the card actually declare it?
  • Are retries idempotent (idempotency_key)?
  • Is there a contextId threading multi-turn state?
  • How are traces joined across agents (audit-log IDs, contextId in provenance)?
  • Auth scheme stated in the card; credentials in headers; mTLS on the wire?

Where this points next

A2A gave the agent reach — it can now compose specialists across processes and organizations. But reach is expensive: every delegated task is more tokens, more latency, more dollars (recall the polling bill — 1,200 USD/day from one careless interval). Lesson 18, Resource-aware optimization, treats compute, time, money, context window, tools, and human attention as scarce budgets the agent must spend deliberately — which mechanism to choose, which model tier to route to, when delegation is worth its network cost. The latency/cost calculation you just did for poll-vs-push is the first instance of the general discipline that lesson formalizes.

Takeaway
A2A is Google's open, HTTP + JSON-RPC 2.0 standard for one agent to discover and delegate to another, even across frameworks (ADK, LangGraph, CrewAI). The Agent Card is the agent's digital identity — name, endpoint, version, declared capabilities, skills, I/O modes, and auth — found via well-known URI, a managed registry, or direct config. Work is an asynchronous task moving through submitted → working → input-required → completed/failed, threaded by contextId, returning a part-based artifact. Pick an interaction mechanism by the latency budget: sync for fast ops, SSE to stream progress, push (webhook) for long jobs, polling as the fallback — and the poll-vs-push math is real money. Security is built in: mTLS, OAuth/API-key in headers, audit logs that double as provenance. And never confuse the two protocols: MCP connects an agent to tools and resources; A2A connects an agent to another agent.

Interview prompts