Part V - Reliability and control

Human-in-the-loop - approval, correction, escalation

Lesson 14 gave the agent a way to fail safely; sometimes the safe failure is to stop and ask a person. Human-in-the-loop (HITL) is the pattern that wires deliberate human judgment into the agent's control loop — not as a fallback for broken code, but as a first-class state the agent can enter whenever the cost of being wrong outweighs the cost of waiting. This lesson makes "ask a human" concrete: which actions trigger it, what packet the human sees, what state the loop holds while it waits, and the brutal arithmetic of how much human attention you can actually afford.

The plan

Five moves. (1) Place HITL precisely in the loop you have been building — it is a checkpoint state, not an exception. (2) Name the book's six kinds of human involvement (oversight, intervention/correction, feedback-for-learning, decision augmentation, collaboration, escalation) so you stop conflating them. (3) Design the decision packet — the single most important artifact, because a bad packet makes the human slower than the agent. (4) Do the scalability arithmetic: a worked example showing why you cannot put a human on every action, and how an autonomy threshold trades accuracy for throughput. (5) Build the Google ADK technical-support agent from the book, where an escalate_to_human tool is the HITL primitive. We close on the "human-on-the-loop" variant — humans set strategy, the agent executes — which hands directly into RAG and resource budgets.

Linear position

Prerequisites: Lesson 14 (Exception handling and recovery), so the agent can already pause, retry, and hold state across a recovery path; and Lesson 13 (Goal setting and monitoring), so the agent has a measurable goal and a monitor that can flag low confidence.
New capability: A waiting_for_human checkpoint state with a typed decision packet, an escalation policy that decides which actions need human sign-off, a timeout/owner contract, and a place to store the human's decision as both control signal and future training/eval data.

1 · Where HITL sits in the loop

Every lesson in this track has added something to the same control loop: observe → decide → act → observe. Recovery (lesson 14) taught the loop to catch a thrown error and route to a repair path. HITL adds a different kind of branch: a deliberate pause where the loop suspends itself and emits a request for a human decision, then resumes, revises, or stops based on the reply.

The key reframing from the book: human involvement is not a single "approve / deny" button bolted onto the end. The agent should know which kind of human help it needs. The chapter enumerates six distinct modes, and they correspond to very different code:

Oversight

Human watches logs / a live dashboard; the agent runs autonomously but is observable. No pause — monitoring only.

Intervention & correction

On an error or ambiguous case the agent pauses; the operator fixes the error, supplies missing data, or redirects. The loop resumes from the corrected state.

Feedback for learning

Human preferences are collected and fed back to improve the model — the book's example is RLHF. The decision is stored as training data, not just a control signal.

Decision augmentation

The agent produces analysis and a recommendation; the human makes the final call. AI sharpens the decision rather than replacing it.

Human-AI collaboration

Each side plays to its strength — the agent handles routine data work, the human handles creative or high-stakes negotiation.

Escalation

When a task exceeds the agent's competence, a defined protocol hands it to a human operator before a mistake is made.

The first lesson of HITL is therefore to name the mode at design time. "Add a human in the loop" is not an architecture; "on a refund over $500, the agent produces a recommendation and the support lead approves (decision augmentation), with a 4-hour timeout that escalates to the on-call manager" is.

agent loop:

  observe ──▶ decide ──▶ [risk gate] ──▶ act ──▶ observe ──▶ ...
                              │
                              │ high risk / low conf / policy
                              ▼
               ┌─────────────────────────────┐
               │ state.status =              │
               │   "waiting_for_human"       │  ← loop SUSPENDS, persists state
               │ emit DECISION PACKET        │
               └─────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
     approve           revise / correct        deny / timeout
        │                     │                     │
        ▼                     ▼                     ▼
  resume + act     resume from edited state   abort + log + rollback
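The three exits from the checkpoint can be sketched as a small state machine. This is a minimal illustration, not the book's code: the names Status, LoopState, suspend, and apply_reply are hypothetical, and a real loop would persist the state rather than hold it in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    RUNNING = "running"
    WAITING_FOR_HUMAN = "waiting_for_human"
    ABORTED = "aborted"


@dataclass
class LoopState:
    pending_action: str
    status: Status = Status.RUNNING
    log: list = field(default_factory=list)


def suspend(state: LoopState) -> LoopState:
    """Enter the checkpoint: the loop stops acting until a reply arrives."""
    state.status = Status.WAITING_FOR_HUMAN
    state.log.append(f"awaiting decision on: {state.pending_action}")
    return state


def apply_reply(state: LoopState, reply: str,
                edited_action: Optional[str] = None) -> LoopState:
    """Route the human reply to one of the three exits in the diagram."""
    if state.status is not Status.WAITING_FOR_HUMAN:
        raise ValueError("no decision is pending")
    if reply == "approve":
        state.status = Status.RUNNING          # resume and act as proposed
    elif reply == "revise":
        if edited_action is None:
            raise ValueError("a revise reply must carry the edited action")
        state.pending_action = edited_action   # resume from the edited state
        state.status = Status.RUNNING
    elif reply in ("deny", "timeout"):
        state.status = Status.ABORTED          # caller logs and rolls back
    else:
        raise ValueError(f"unknown reply: {reply}")
    state.log.append(f"human reply: {reply}")
    return state
```

Treating timeout as a reply like any other keeps the waiting state from becoming a dead end: every path out of the checkpoint is explicit.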

2 · The risk gate — which actions need a human

The agent cannot ask about everything (section 4 shows why with numbers), so the design begins with a risk gate: a predicate over the proposed action that decides whether it passes straight through or diverts to the human. The book's rule of thumb is that HITL belongs where errors carry significant safety, ethical, or financial consequences — medicine, finance, autonomous systems — and where the task is inherently ambiguous (content moderation, complex support escalations).

In our running coding/research assistant, the gate is concrete. The agent reads files and runs tests with full autonomy because those actions are read-mostly and cheaply reversible. It must stop and ask before any action that is irreversible or has external side effects:

Action	Reversible?	Side effect	Gate decision
read file, grep, run unit tests	yes	none	autonomous
edit a file in the working tree	yes (git)	local	autonomous
install a package / change a lockfile	hard	environment	approve
delete a directory	no	data loss	approve
git push / open a PR	hard	shared repo	approve
edit production config	no	live system	approve

Three signals drive a good gate, often combined: a static permission policy (this class of action always needs sign-off), a dynamic confidence / monitor signal from lesson 13 (the agent itself is uncertain), and a blast-radius estimate (how many users / how much money / how irreversible). A mature gate is a small scoring function, not a hardcoded list.
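A gate that combines the three signals might look like the sketch below. Everything in it is an assumption made for illustration: the ALWAYS_APPROVE set, the 0.6 / 0.4 weights, and the 0.5 threshold are placeholders you would tune, and the confidence and blast-radius inputs are assumed to be already normalized to the range 0 to 1.

```python
from dataclasses import dataclass

# Hypothetical static policy: these action classes always need sign-off.
ALWAYS_APPROVE = {
    "install_package",
    "delete_directory",
    "git_push",
    "edit_production_config",
}


@dataclass(frozen=True)
class ProposedAction:
    kind: str            # e.g. "read_file", "git_push"
    confidence: float    # monitor signal in [0, 1]; higher means more certain
    blast_radius: float  # 0 = local and reversible, 1 = wide and irreversible


def needs_human(action: ProposedAction, threshold: float = 0.5) -> bool:
    """Return True when the action should divert to waiting_for_human."""
    # Signal 1: static permission policy overrides everything else.
    if action.kind in ALWAYS_APPROVE:
        return True
    # Signals 2 and 3: blend blast radius with the agent's own uncertainty.
    # The weights are illustrative, not calibrated.
    score = 0.6 * action.blast_radius + 0.4 * (1.0 - action.confidence)
    return score >= threshold
```

With these placeholder weights, a confident file read passes straight through, while a low-confidence edit with a moderate blast radius is diverted even though its action class is normally autonomous.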

3 · The decision packet — the artifact that makes or breaks HITL

When the gate diverts, the agent must hand the human a decision packet. This is the single most important design surface in the pattern, and the book is emphatic about why: the human should not have to reconstruct the situation from raw chat logs. A good packet lets the reviewer decide in seconds; a bad one makes the human slower and less accurate than the agent would have been alone — which destroys the entire economic case for HITL.

The packet should carry: the goal, the proposed action, the evidence that led to it, the risk and the exact consequence of approval, viable alternatives, a rollback plan, and a deadline with an owner.

decision_packet = {
  "goal":      "make the failing test suite green",
  "action":    "delete ./.cache and ./node_modules, then reinstall",
  "evidence":  "tests fail with stale-module errors; lockfile changed in last commit",
  "risk":      "irreversible local delete; ~3 min reinstall; no remote impact",
  "consequence": "removes 412 MB; rebuilds from package-lock.json (pinned)",
  "alternatives": ["clear cache only (keep node_modules)", "skip and report flaky tests"],
  "rollback":  "none for delete; reinstall is deterministic from lockfile",
  "question":  "Approve full clean reinstall?",
  "deadline":  "30 min, then auto-deny + notify @owner"
}
state.status = "waiting_for_human"
state.checkpoint = serialize(loop_state)   # so we can resume exactly
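One way to make the packet typed and the checkpoint resumable is sketched below. It assumes the loop state is JSON-serializable; DecisionPacket, checkpoint, and resume are hypothetical names, and the owner field is split out of the deadline string used above so that a missing owner can be rejected before anything is sent to a reviewer.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class DecisionPacket:
    goal: str
    action: str
    evidence: str
    risk: str
    consequence: str
    alternatives: list
    rollback: str
    question: str
    deadline: str
    owner: str

    def validate(self) -> None:
        """Reject packets with any empty field, including owner and deadline."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name)]
        if missing:
            raise ValueError(f"decision packet is missing: {', '.join(missing)}")


def checkpoint(packet: DecisionPacket, loop_state: dict) -> str:
    """Serialize the packet and the loop state into one resumable blob."""
    packet.validate()
    return json.dumps({
        "status": "waiting_for_human",
        "packet": asdict(packet),
        "loop_state": loop_state,
    })


def resume(blob: str):
    """Rebuild the packet and loop state exactly as they were at suspension."""
    data = json.loads(blob)
    return DecisionPacket(**data["packet"]), data["loop_state"]
```

Validating at checkpoint time means a vague or ownerless packet fails fast in the agent, not silently in the reviewer's queue.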

Trap: the privacy and anonymization cost

The book flags an easy-to-miss obligation: a packet shown to a human operator may expose sensitive data. In regulated settings the packet must be anonymized before it reaches the reviewer — which adds real pipeline complexity. Your packet builder is not just a formatter; it is a redaction boundary. Treat PII in the evidence field as a first-class concern, not an afterthought.
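As a rough sketch of the packet builder acting as a redaction boundary, the snippet below masks two kinds of identifier before a packet leaves the agent. The regular expressions are deliberately simple illustrations and would miss many real-world formats; a regulated deployment would rely on a vetted anonymization pipeline. The function names and the default field list are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact(text: str) -> str:
    """Replace email addresses and card-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = CARD.sub("[CARD]", text)
    return text


def redact_packet(packet: dict, text_fields=("evidence", "consequence")) -> dict:
    """Return a copy of the packet with free-text fields redacted."""
    cleaned = dict(packet)
    for name in text_fields:
        if isinstance(cleaned.get(name), str):
            cleaned[name] = redact(cleaned[name])
    return cleaned
```

Returning a copy keeps the unredacted packet available to the agent's own checkpoint while only the cleaned version crosses into the human queue.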

4 · The scalability arithmetic — why you cannot ask a human everything

The book names HITL's chief drawback bluntly: it does not scale. A human operator can guarantee accuracy but cannot personally review millions of tasks, so production systems blend automation with HITL — autonomy for the easy mass, humans for the risky tail. To design that blend you have to do the arithmetic, so let us do it.

Worked example — a moderation-style agent

Suppose the agent handles 1,000,000 items/day. A human reviewer can carefully judge 1 item every 30 seconds and works a 6-hour effective day, so reviews 6 × 3600 / 30 = 720 items/day. Even with every reviewer fully loaded, you would need 1,000,000 / 720 ≈ 1,389 reviewers to put a human on every item, which is obviously impossible.

Now add an autonomy threshold: the agent auto-handles any item whose confidence exceeds a cutoff and escalates the rest, and the threshold is the share of traffic that clears that cutoff. Say 92% of items clear it and are handled autonomously. Then only 0.08 × 1,000,000 = 80,000 items escalate, needing 80,000 / 720 ≈ 111 reviewers. Raise the threshold to 98% (that is, relax the confidence cutoff so that 98% of items clear it) and escalations drop to 20,000 items → 28 reviewers.

But the threshold trades two things against each other, and that is the heart of the pattern. Raise it and you save human attention but let through more agent mistakes on the borderline cases; lower it and accuracy climbs but the human queue, and with it latency, explodes. Suppose the agent is wrong on 15% of the borderline items that the 92% threshold escalates, and right on the items above it. At 92%, all 80,000 borderline items reach a human, so residual error on them is near zero. At 98%, 60,000 of those items move into the auto-handled stream, and 0.15 × 60,000 = 9,000 agent errors per day now escape review. The right threshold is the point where the marginal cost of one more reviewer equals the marginal cost of one more escaped error, exactly the kind of resource trade-off lesson 18 formalizes.
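The arithmetic above is easy to check in a few lines. The sketch below uses the text's own figures (1,000,000 items/day, 720 items per reviewer, a 15% error rate on borderline items); the function names are hypothetical. It rounds headcount up to whole people, so the 8% case comes out at 112 where the text's rounded estimate reads about 111.

```python
import math


def reviewers_needed(volume: int, auto_fraction: float, per_reviewer: int = 720):
    """Return (items escalated per day, whole reviewers needed to clear them)."""
    escalated = round(volume * (1 - auto_fraction))
    return escalated, math.ceil(escalated / per_reviewer)


def extra_escaped_errors(volume: int, old_auto: float, new_auto: float,
                         borderline_error_rate: float) -> int:
    """Errors that escape review when borderline items move into the auto stream."""
    moved = volume * (new_auto - old_auto)
    return round(moved * borderline_error_rate)


for auto in (0.0, 0.92, 0.98):
    escalated, heads = reviewers_needed(1_000_000, auto)
    print(f"auto={auto:.2f} escalated={escalated} reviewers={heads}")

print(extra_escaped_errors(1_000_000, 0.92, 0.98, 0.15))
```

Running both functions side by side makes the trade visible: going from 92% to 98% removes 84 reviewers from the roster and adds 9,000 unreviewed errors per day under the 15% assumption.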

Play with the slider below to feel this. It is the one place in this pattern where a numeric widget genuinely builds intuition: the autonomy threshold is a dial, and you need to see throughput, reviewer headcount, and escaped errors move together as you turn it.

Autonomy threshold — trade reviewer headcount against escaped errors

Volume is fixed at 1,000,000 items/day; each reviewer judges 720 items/day. Raise the autonomy threshold and the agent auto-handles more items (fewer escalations, fewer reviewers) but lets more borderline mistakes through. The curve is illustrative — confidence is modeled so that a higher threshold escalates fewer items but the auto-handled stream carries a higher residual error rate.

Readout at autonomy threshold 0.92: 920,000 auto-handled, 80,000 escalated per day, 111 reviewers needed. The widget also reports escaped errors per day.

The core JS:

const VOL = 1_000_000, PER_REVIEWER = 720;
// t is the slider value in [0, 1], read directly as the fraction of items the agent auto-handles
const autoFrac = Math.min(0.999, t);                 // illustrative mapping, capped so a few items always escalate
const escalated = Math.round(VOL * (1 - autoFrac));
const reviewers = Math.ceil(escalated / PER_REVIEWER);
// residual error in the AUTO stream rises as we trust the agent on more borderline items
const autoErrRate = 0.002 + 0.12 * Math.pow(t, 6);   // small at low t, climbs near 1
const escapedErrors = Math.round(VOL * autoFrac * autoErrRate);
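To find where the two marginal costs balance, you can sweep the threshold and pick the cheapest point. The sketch below is a Python port of the illustrative model above plus a brute-force search; the per-reviewer and per-error costs are hypothetical inputs you would have to supply, and the result is only as meaningful as the illustrative error curve.

```python
import math

VOL, PER_REVIEWER = 1_000_000, 720


def model(t: float):
    """Port of the widget's model: return (reviewers, escaped errors) at threshold t."""
    auto_frac = min(0.999, t)
    escalated = round(VOL * (1 - auto_frac))
    reviewers = math.ceil(escalated / PER_REVIEWER)
    auto_err_rate = 0.002 + 0.12 * t ** 6   # illustrative residual error curve
    escaped = round(VOL * auto_frac * auto_err_rate)
    return reviewers, escaped


def best_threshold(reviewer_cost: float, error_cost: float, steps: int = 100) -> float:
    """Brute-force the threshold that minimizes total daily cost."""
    def total(t: float) -> float:
        reviewers, escaped = model(t)
        return reviewers * reviewer_cost + escaped * error_cost

    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=total)
```

The two extremes are a useful sanity check: if escaped errors cost nothing the search pushes the threshold to 1.0, and if reviewers cost nothing it pulls the threshold to 0.0. Real cost ratios land somewhere in between.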

5 · The book's code — escalation as a tool (Google ADK)

The book demonstrates HITL with a Google ADK technical-support agent, and the design choice is worth internalizing: escalation is exposed to the model as just another tool. The agent has three tools — troubleshoot_issue, create_ticket, and escalate_to_human — and its instruction tells it when to reach for each. The HITL boundary lives in the tool list and the prompt, not in special framework machinery.

from google.adk.agents import Agent

def troubleshoot_issue(issue: str) -> dict:
    return {"status": "success", "report": f"troubleshooting steps for: {issue}"}

def create_ticket(issue_type: str, details: str) -> dict:
    return {"status": "success", "ticket_id": "TICKET123"}

def escalate_to_human(issue_type: str) -> dict:
    # in a real system this routes into a human queue
    return {"status": "success", "message": f"{issue_type} escalated to a human specialist."}

technical_support_agent = Agent(
    name="technical_support_specialist",
    model="gemini-2.0-flash-exp",
    instruction="""
You are a technical support specialist for an electronics company.
First, check state["customer_info"]["support_history"]; if present, reference it.
Standard flow:
  1. Use troubleshoot_issue to analyze the problem.
  2. Walk the user through the basic steps.
  3. If unresolved, use create_ticket to log it.
When a problem is COMPLEX and exceeds basic troubleshooting:
  1. Use escalate_to_human to hand off to a human specialist.
Stay professional and empathetic.
""",
    tools=[troubleshoot_issue, create_ticket, escalate_to_human],
)

Two details from the book are easy to skip but central. First, escalate_to_human is described as the core of the HITL design — the explicit, named exit to a person — and the prompt defines the condition ("complex, exceeds basic troubleshooting") that fires it. The competence boundary is encoded, not implicit. Second, the book pairs this with a personalization callback that runs before the model is called and injects the customer's name, tier, and recent purchases as a system message read from state["customer_info"]:

from google.genai import types

def personalization_callback(callback_context, llm_request):
    # Runs before the model is called; reads customer data from session state.
    info = callback_context.state.get("customer_info")
    if info:
        note = (f"\nIMPORTANT personalization:\n"
                f"name: {info.get('name', 'valued customer')}\n"
                f"tier: {info.get('tier', 'standard')}\n")
        if info.get("recent_purchases"):
            note += f"recent purchases: {', '.join(info['recent_purchases'])}\n"
        # insert as a system message before the first content
        llm_request.contents.insert(
            0, types.Content(role="system", parts=[types.Part(text=note)])
        )
    return None  # continue with the modified request

The pairing matters: HITL is not only the escalation exit, it is also feeding the human (and the model that decides when to involve the human) the right context to judge well. A tier-1 enterprise customer with a recent purchase is a different escalation decision than an anonymous one. The book notes other frameworks (e.g. LangChain) offer equivalent escalation tooling — the pattern is framework-agnostic; the primitive is "a named, observable hand-off to a person."
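The comment in escalate_to_human says a real system would route into a human queue. A framework-free sketch of that routing is shown below, using an in-process queue as a stand-in for whatever ticketing or paging system you actually run. The summary parameter, the escalation_id field, and HUMAN_QUEUE are additions for illustration and are not part of the book's example.

```python
import queue
import uuid

# Stand-in for a real ticketing or paging system (assumption for this sketch).
HUMAN_QUEUE: "queue.Queue[dict]" = queue.Queue()


def escalate_to_human(issue_type: str, summary: str = "") -> dict:
    """Record the hand-off where a person will see it, then tell the model it happened."""
    ticket = {
        "id": uuid.uuid4().hex[:8],
        "issue_type": issue_type,
        "summary": summary,   # short context so the specialist need not reread the chat
    }
    HUMAN_QUEUE.put(ticket)
    return {
        "status": "success",
        "escalation_id": ticket["id"],
        "message": f"{issue_type} escalated to a human specialist.",
    }
```

Returning an identifier gives the hand-off the "named, observable" property: the agent, the specialist, and the trace can all refer to the same escalation.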

6 · Human-on-the-loop — the strategic variant

The book closes with a useful variant: Human-on-the-loop. Here the human is not in the per-action path at all; the human expert sets the policy and the agent executes within it in real time. The book's example is an automated trading system: a human sets rules like "hold 70% tech / 30% bonds, never more than 5% in a single company, auto-sell any stock that drops 10% below purchase," and the AI monitors the market and executes trades instantly under those constraints. AI does the fast execution; humans do the slow strategy. A modern call center is the same shape: a manager sets policy ("route any caller who mentions an outage straight to technical support; offer a human agent when sentiment is highly frustrated") and the agent applies it across thousands of live interactions without per-call sign-off.

The distinction is the lever for the scalability problem in section 4: in-the-loop gates individual actions (accurate, slow, expensive); on-the-loop gates the policy (scalable, requires you to encode judgment up front). Most real systems are a blend — on-the-loop for the routine mass, in-the-loop for the risky tail.
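Gating the policy rather than the action can be sketched with two of the trading rules from the book's example. The Policy class and both function names are hypothetical, and the checks are simplified (no fees, no 70/30 allocation rule); the point is only that the human edits the policy object while the agent evaluates it on every trade without asking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Set by the human, on the loop; read by the agent on every action."""
    max_single_company: float = 0.05   # never more than 5% in one company
    stop_loss: float = 0.10            # auto-sell after a 10% drop from purchase


def may_buy(policy: Policy, portfolio_value: float,
            current_position: float, order_value: float) -> bool:
    """Allow the order only if the resulting position stays within the cap."""
    new_share = (current_position + order_value) / portfolio_value
    return new_share <= policy.max_single_company


def must_sell(policy: Policy, purchase_price: float, current_price: float) -> bool:
    """Trigger the stop-loss when the price has fallen far enough below purchase."""
    return current_price <= purchase_price * (1 - policy.stop_loss)
```

Changing strategy becomes a one-line edit to the policy that a human reviews once, which is what lets this variant scale where per-action approval cannot.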

Running example — the coding/research assistant, end to end

Threaded together: the assistant works autonomously through reads, edits, and test runs (oversight via a trace dashboard). When it proposes git push, the risk gate fires, the loop sets status = "waiting_for_human", and it emits a decision packet: the goal, the diff summary as evidence, "pushes to shared main" as the consequence, "open a PR instead" as an alternative, and a 30-minute deadline. On approval it resumes from the serialized checkpoint and pushes. On timeout it auto-denies, leaves the branch unpushed, and notifies the owner; it does not fall back to opening a PR on its own, because the gate in section 2 puts PRs behind approval too. Either way it logs the human's eventual decision as an eval case (lesson 21), for example "should an approved push to main have been a PR?", so the gate's policy can be tuned from real decisions.

Failure modes

Vague packets. Escalating questions that force the human to reconstruct the trace from raw logs — the human becomes the bottleneck and the slowdown, killing HITL's value.
No timeout or owner. A pending approval with no deadline and no assigned reviewer silently stalls the task forever.
Human as a crutch. Routing easy cases to people to paper over weak automation — drives reviewer cost up with no accuracy gain (section 4's arithmetic in reverse).
Reviewer expertise mismatch. The book's caution: AI can write code, but only a skilled developer can spot a subtle bug and guide the fix. An under-trained reviewer rubber-stamps.
PII leakage. Sending un-anonymized sensitive data into the human queue.
Lost decisions. The human's answer is used once and discarded instead of stored as control state + training/eval data.
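The "no timeout or owner" failure can be ruled out structurally. The sketch below refuses to create a pending approval without an owner and sweeps overdue ones to an auto-deny state; PendingApproval and expire_overdue are hypothetical names, and auto-deny is just one of the timeout behaviors you might choose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PendingApproval:
    action: str
    owner: str
    requested_at: datetime
    timeout: timedelta
    status: str = "waiting_for_human"

    def __post_init__(self) -> None:
        # An approval nobody owns is an approval nobody answers.
        if not self.owner:
            raise ValueError("every pending approval needs an owner")


def expire_overdue(pending: list, now: datetime) -> list:
    """Auto-deny approvals past their deadline and return them for notification."""
    expired = []
    for item in pending:
        overdue = now >= item.requested_at + item.timeout
        if item.status == "waiting_for_human" and overdue:
            item.status = "auto_denied"
            expired.append(item)
    return expired
```

A scheduler would call expire_overdue periodically with the current UTC time and notify each expired item's owner.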

Implementation checklist

Which mode of involvement is this: oversight, intervention/correction, feedback for learning, decision augmentation, collaboration, or escalation?
What is the risk gate (policy class + confidence + blast radius)?
What exactly does the human see in the decision packet, and is it anonymized?
What happens on timeout — auto-deny, auto-approve safe default, or re-route?
Who owns the pending decision?
Is loop state serialized so the task resumes exactly on approval?
Is the human decision stored as both control signal and eval/training data?
Have you done the headcount arithmetic for your volume and escalation rate?
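For the "stored as both control signal and eval/training data" item, one lightweight approach is an append-only JSON Lines log. The sketch below writes to any text stream so it can be tried in memory; the field names and both function names are hypothetical, and a production system would add durable storage and access controls.

```python
import io
import json
from datetime import datetime, timezone


def record_decision(stream, packet_id: str, action: str, decision: str,
                    reviewer: str, decided_at=None) -> bool:
    """Append the decision as one JSON line and return the control signal."""
    decided_at = decided_at or datetime.now(timezone.utc)
    row = {
        "packet_id": packet_id,
        "action": action,
        "decision": decision,
        "reviewer": reviewer,
        "decided_at": decided_at.isoformat(),
    }
    stream.write(json.dumps(row) + "\n")
    return decision == "approve"   # control signal: may the loop proceed?


def load_eval_cases(stream) -> list:
    """Read every stored decision back for evaluation or gate tuning."""
    stream.seek(0)
    return [json.loads(line) for line in stream if line.strip()]


log = io.StringIO()   # swap for an open file in append mode
proceed = record_decision(log, "pkt-1", "git push", "approve", "@lead")
```

The same call serves both purposes: the boolean drives the loop now, and the stored row becomes an eval case later.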

Where this points next

HITL solved the judgment gap — when to defer to a person. But many cases the agent escalates are not judgment calls at all; they are knowledge gaps, where the agent simply does not have the facts and a human would just look them up. Lesson 16 (RAG and agentic retrieval) closes that gap: instead of escalating "I don't know," the agent retrieves grounded context and answers on its own — shrinking the escalation tail you just learned to size. And the cost arithmetic from section 4 — human attention as a scarce, priced resource traded against accuracy and latency — is exactly the budgeting discipline that lesson 18 (Resource-aware optimization) generalizes across compute, money, and time.

Takeaway

Human-in-the-loop wires deliberate human judgment into the agent's control loop as a first-class waiting_for_human checkpoint, not an exception. Design it by (1) naming the mode — oversight, intervention/correction, feedback-for-learning (RLHF), decision augmentation, collaboration, or escalation; (2) defining a risk gate from policy class, confidence, and blast radius; (3) handing the human a concise, anonymized decision packet (goal, action, evidence, risk, consequence, alternatives, rollback, deadline) so they decide in seconds; and (4) respecting the scalability ceiling — you cannot review every action, so an autonomy threshold trades reviewer headcount against escaped errors. The book's Google ADK example makes the primitive concrete: escalate_to_human is just a named tool the agent reaches for when a task exceeds its competence, and the human-on-the-loop variant flips the lever — humans set strategy, the agent executes fast within it. Store every human decision as both a control signal and future eval/training data.

Interview prompts

HITL is "more than an approve button." Name the distinct modes and why they differ in code. (§1 — oversight (monitor only), intervention/correction (resume from fixed state), feedback-for-learning (store as RLHF data), decision augmentation (human decides on agent's recommendation), collaboration, and escalation (hand off before a mistake); each implies different state handling and storage.)
What goes in a decision packet, and why does a bad one defeat the purpose of HITL? (§3 — goal, action, evidence, risk, exact consequence, alternatives, rollback, deadline; a vague packet forces the human to reconstruct the trace, making them slower and less accurate than the agent, destroying the economic case.)
You handle 1M items/day; a reviewer does 720/day. Why can't you put a human on every item, and how do you bound headcount? (§4 — full coverage needs ~1,389 reviewers, infeasible; an autonomy threshold auto-handles high-confidence items so only the escalated tail (e.g. 8% → 80k → ~111 reviewers) reaches humans.)
What does raising the autonomy threshold trade off? (§4 — fewer escalations and reviewers, but more borderline agent errors slip into the auto-handled stream; optimal cutoff is where marginal reviewer cost equals marginal escaped-error cost.)
In the book's ADK example, how is escalation implemented and why is that elegant? (§5 — as a tool, escalate_to_human, that the model invokes when the prompt's "complex / exceeds basic troubleshooting" condition holds; the HITL boundary lives in the tool list + instruction, framework-agnostic.)
Distinguish human-in-the-loop from human-on-the-loop. (§6 — in-the-loop gates individual actions (accurate, slow, doesn't scale); on-the-loop gates the policy/strategy and lets the agent execute fast within it (scalable, judgment encoded up front), e.g. the 70/30 portfolio rules.)
The book flags privacy as a HITL cost. Where does it bite? (§3 — packets shown to operators may expose PII; the packet builder must anonymize sensitive evidence before it reaches a human, adding real pipeline complexity.)