Lesson 22 / 25 · Prioritization

Part VIII - Autonomy and discovery

Prioritization - choosing the next best action

An agent that can plan, act, recover, and evaluate now faces a harder question than "can I do this?": which of the many things I could do should I do next? Prioritization is the policy that keeps an autonomous loop pointed at the most valuable action when the to-do list is longer than the budget. This is the pattern that separates a goal-seeking agent from a script that runs its steps in the order they were written.

Book source
Chapter 20 - Prioritization (优先级排序); PDF outline pages 220-227. The chapter's worked example is a LangChain "project manager" agent that creates tasks, assigns P0/P1/P2 priorities, and routes them to workers.
The plan
Five moves. (1) Pin down why an autonomous loop needs a priority policy at all — the difference between a fixed plan and a live queue. (2) Name the book's six scoring criteria — urgency, importance, dependency, resource availability, cost/benefit, user preference — and turn them into one explicit scoring function. (3) Walk a worked numeric example on the coding/research agent: five candidate actions, real scores, a winner, then a single observation that re-ranks the queue. (4) Distinguish the book's three levels of prioritization — goal, plan-step, action — and where each lives in the loop. (5) Ground it in the book's LangChain project-manager agent, then hand off to exploration. Along the way: an interactive scorer where you move the weights and watch the ranking flip.
Linear position
Prerequisites: Goal setting and monitoring (lesson 13) gives you the objective and progress signal; exception handling (lesson 14) gives failure observations; resource-aware optimization (lesson 18) gives the cost/latency/token budgets; evaluation and monitoring (lesson 21) gives the quality signals. Prioritization consumes all of these as inputs.
New capability: A next-action policy — a scoring function over candidate actions, plus a reprioritization rule that re-runs the scorer whenever an observation changes the state.

1 · Why a loop needs a priority policy

Up to this lesson, the agent's control flow has been mostly predetermined. Prompt chaining (lesson 03) runs steps in a fixed line. Routing (lesson 04) picks one branch from a known set. Planning (lesson 08) produces an ordered list and the loop walks it. In all three, the order of work is decided before the agent starts acting, and the agent's job is to execute that order.

That breaks the moment two things are true at once, which is the normal case for a real autonomous agent: there are more useful actions available than the budget allows, and the world changes while the agent works. A coding agent mid-task could read three more files, run the test suite, ask the user a clarifying question, refactor a helper, or open a related issue — all defensible, none free. A plan written ten steps ago does not know that the last test just failed, or that the token budget is two-thirds gone, or that a dependency is now blocking everything downstream. The book's framing: without an explicit next-action decision process, an agent becomes inefficient, stalls, or fails to reach its key goals.

Prioritization is the answer. Instead of trusting a frozen plan, the agent keeps the plan as a candidate set and, at each step, asks: of everything I could do right now, which action has the best expected value per unit of cost, given the current state? The mental model is a hospital triage nurse. Patients (candidate actions) arrive faster than doctors (budget) can see them. The nurse does not treat in arrival order; she scores each by severity and time-sensitivity, and — critically — re-triages when a waiting patient deteriorates. The plan is not a queue you read top to bottom; it is a queue you re-sort after every new observation.

The distinction the book stresses
This dynamic re-sorting is exactly what separates an agentic system from an automation script. A script executes a fixed sequence. An agent understands an ambiguous request, selects its own actions, and re-orders them as conditions change. Prioritization is where that self-management lives.

2 · The six criteria, as one scoring function

The chapter defines prioritization as evaluating each candidate task against a set of criteria, then applying scheduling/selection logic to pick the next action, and finally allowing dynamic adjustment as the environment changes. It lists six criteria. We make them concrete by mapping each to a number the agent can actually compute:

Urgency
Time-sensitivity. A deadline, an expiring lock, a user waiting. The book's PM agent maps words like "ASAP", "urgent", "critical" → P0.
Importance
Impact on the primary goal. Does this action move the objective metric, or is it tangential?
Dependency
Is this a prerequisite for other tasks? Unlocking five blocked actions is worth more than its own face value.
Resource availability
Are the tools / data / context this action needs ready now? An action you can't yet run scores low.
Cost / benefit
Expected payoff weighed against tokens, latency, dollars, and risk of a bad side effect.
User preference
Personalization: explicit user-stated importance, or a flagged "always confirm before deleting" rule.
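The dependency criterion can be made computable with a small graph walk. The sketch below assumes a hypothetical task map in which each task lists its prerequisites; `unlocks` counts the tasks that are blocked now but become runnable once a given task completes, normalized by the number of other pending tasks. The task names and the normalization are illustrative assumptions, not the book's.

```javascript
// Hypothetical pending tasks: each lists the tasks it depends on.
const tasks = {
  reproduceBug:      { deps: [] },
  writeFix:          { deps: ['reproduceBug'] },
  addRegressionTest: { deps: ['reproduceBug'] },
  updateDocs:        { deps: ['writeFix'] },
  openIssue:         { deps: [] },
};

// A task is ready when every dependency is already done.
function isReady(name, done) {
  return tasks[name].deps.every(d => done.has(d));
}

// unlocks(name): fraction of the other pending tasks that are blocked now
// but become ready if `name` completes. The result lies in [0, 1].
function unlocks(name, done = new Set()) {
  const others = Object.keys(tasks).filter(n => n !== name && !done.has(n));
  if (others.length === 0) return 0;
  const after = new Set([...done, name]);
  const newlyReady = others.filter(n => !isReady(n, done) && isReady(n, after));
  return newlyReady.length / others.length;
}

console.log(unlocks('reproduceBug')); // 2 of 4 other tasks become ready: 0.5
console.log(unlocks('openIssue'));    // nothing depends on it: 0
```

Only direct unblocking is counted here; a transitive count (everything downstream) is an equally defensible choice and would rank early prerequisites even higher.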

The book says scoring can range from simple rules (keyword → P0/P1/P2, as in its code example) to a complex scoring system to full LLM reasoning ("rank these five actions and explain"). The middle ground — an explicit weighted score — is the one worth being able to write on a whiteboard, because it is auditable and cheap. A clean form for the coding/research agent:

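$$\mathrm{score}(a) = w_{\mathrm{val}}\cdot\mathrm{value}(a) + w_{\mathrm{dep}}\cdot\mathrm{unlocks}(a) + w_{\mathrm{info}}\cdot\mathrm{infoGain}(a) - w_{\mathrm{risk}}\cdot\mathrm{risk}(a) - w_{\mathrm{cost}}\cdot\mathrm{cost}(a)$$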

where each term is normalized to roughly [0,1] and the weights encode policy. Value folds importance and urgency (how much this advances the goal, scaled up if time-critical). Unlocks is the dependency term — how many blocked actions this clears. infoGain rewards uncertainty reduction (the cheap probe that tells you which of two paths is real). Risk and cost are subtracted: a destructive or expensive action must clear a higher bar. The selection logic is then just argmax over allowed actions — "allowed" because the guardrails from lesson 20 can veto an action regardless of score (you never run a high-value action that violates a permission).
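The selection rule, argmax over allowed actions, can be sketched as a filter that runs before scoring. The `isAllowed` predicate and the `destructive` flag below are hypothetical stand-ins for whatever guardrail check lesson 20 provides, and the candidate values are made up for illustration.

```javascript
// Weighted score; each term is assumed to be normalized to [0, 1].
function score(a, w) {
  return w.val * a.value + w.dep * a.unlocks + w.info * a.infoGain
       - w.risk * a.risk - w.cost * a.cost;
}

// Hypothetical guardrail: veto anything flagged as destructive.
function isAllowed(a) {
  return !a.destructive;
}

// Filter first, then take the argmax. Returns null when nothing is allowed,
// so the caller can stop or escalate instead of running a vetoed action.
function selectNext(candidates, w, allowed = isAllowed) {
  let best = null;
  for (const a of candidates) {
    if (!allowed(a)) continue;              // the veto beats any score
    const s = score(a, w);
    if (best === null || s > best.s) best = { ...a, s };
  }
  return best;
}

const w = { val: 3, dep: 2, info: 2, risk: 2, cost: 1 };
const candidates = [
  { id: 'dropTable', value: 1.0, unlocks: 1.0, infoGain: 1.0, risk: 0.0, cost: 0.0, destructive: true },
  { id: 'runTest',   value: 0.4, unlocks: 0.3, infoGain: 0.9, risk: 0.0, cost: 0.1 },
];
const chosen = selectNext(candidates, w);
console.log(chosen.id); // runTest: the higher-scoring action was vetoed
```

Gating before the argmax, rather than checking the winner afterwards, means a vetoed action can never crowd out the best permitted one.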

3 · Worked example — five candidates, one winner, then a re-rank

Our running agent is working on a bug report with a failing test in a Python repo. Its monitor (lesson 13) says the goal is "make the failing test pass without regressing others," and its budget (lesson 18) is 60k tokens, with ~22k already spent. It proposes five candidate next actions. We score each term on [0,1] and use weights $w_{\mathrm{val}}=3$, $w_{\mathrm{dep}}=2$, $w_{\mathrm{info}}=2$, $w_{\mathrm{risk}}=2$, $w_{\mathrm{cost}}=1$.

| Candidate action | value | unlocks | infoGain | risk | cost | score |
|---|---|---|---|---|---|---|
| A · Run the one failing test in isolation | 0.4 | 0.3 | 0.9 | 0.0 | 0.1 | 3.5 |
| B · Read 6 more files for context | 0.3 | 0.2 | 0.4 | 0.0 | 0.6 | 1.5 |
| C · Refactor the helper module | 0.5 | 0.1 | 0.1 | 0.7 | 0.5 | 0.0 |
| D · Ask the user which behavior is intended | 0.6 | 0.4 | 0.8 | 0.1 | 0.2 | 3.8 |
| E · Open a tracking issue for a side bug | 0.2 | 0.0 | 0.1 | 0.0 | 0.2 | 0.6 |

Take action A as the arithmetic: 3(0.4) + 2(0.3) + 2(0.9) − 2(0.0) − 1(0.1) = 1.2 + 0.6 + 1.8 − 0 − 0.1 = 3.5. The top two are D (3.8) and A (3.5): both cheap, both high-information, both unblock the rest. The refactor C scores lowest, at 1.5 + 0.2 + 0.2 − 1.4 − 0.5 = 0.0, because its risk penalty (2 × 0.7 = 1.4) plus its cost cancels everything it offers: it could regress passing tests, which directly contradicts the goal. Reading six more files (B) feels productive but is the classic trap: high cost, low information.

So the agent picks D, a single clarifying question, because the expected information gain about which behavior is correct is worth more than blindly running code against an assumption. Suppose the user replies, and now the intended behavior is unambiguous. That observation changes the state, so the agent reprioritizes:

| Rank | Before observation | After "user clarified intent" |
|---|---|---|
| 1 | D ask user 3.8 ◀ pick | A run failing test 3.5 ◀ pick |
| 2 | A run failing test 3.5 | D ask user 2.2 (infoGain collapsed → done) |
| 3 | B read files 1.5 | B read files 1.5 |
| 4 | E open issue 0.6 | E open issue 0.6 |
| 5 | C refactor 0.0 | C refactor 0.0 |

D's infoGain term collapses to zero once the question is answered, so its score drops to 3(0.6) + 2(0.4) + 2(0) − 2(0.1) − 1(0.2) = 2.2 and it loses the lead; since the question has been asked and answered, it is also marked done and leaves the candidate set. A, the now-meaningful test run, rises to the top. Nobody re-edited a plan. The same scoring function, re-run against the new state, produced a new order. That is the whole pattern: state change → re-score → new next action.

The reprioritization rule, precisely
Re-score the candidate set when an observation changes the state — a tool returned, a test failed, the budget crossed a threshold, a deadline moved, the user replied. Do not re-score on every token or every loop tick with no new information; that just burns budget and can cause thrash (oscillating between two near-tied actions). The trigger is "new evidence," not "time passed."
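One way to encode this trigger is to re-score only when an observation of a recognized kind arrives. The event names and the 0.75 budget threshold below are hypothetical; the point of the sketch is that a loop tick carrying no new evidence leaves the current ranking untouched.

```javascript
// Hypothetical observation kinds that count as "new evidence".
const RESCORE_TRIGGERS = new Set([
  'tool_result', 'test_failed', 'deadline_moved', 'user_reply',
]);

// Decide whether an observation justifies re-running the scorer.
// budgetThreshold is an assumed fraction of the budget.
function shouldRescore(obs, prevSpentFrac, budgetThreshold = 0.75) {
  if (RESCORE_TRIGGERS.has(obs.kind)) return true;
  // A budget update counts only at the moment it crosses the threshold.
  if (obs.kind === 'budget_update') {
    return prevSpentFrac < budgetThreshold && obs.spentFrac >= budgetThreshold;
  }
  return false; // plain ticks, streamed tokens, heartbeats
}

// Count how many times the scorer would run over a stream of observations.
function countRescores(stream) {
  let spent = 0;
  let n = 0;
  for (const obs of stream) {
    if (shouldRescore(obs, spent)) n++;
    if (obs.kind === 'budget_update') spent = obs.spentFrac;
  }
  return n;
}

const stream = [
  { kind: 'tick' },
  { kind: 'budget_update', spentFrac: 0.5 },
  { kind: 'test_failed' },
  { kind: 'tick' },
  { kind: 'budget_update', spentFrac: 0.8 },  // crosses 0.75
  { kind: 'budget_update', spentFrac: 0.9 },  // already past it
  { kind: 'user_reply' },
];
console.log(countRescores(stream)); // 3 re-scores for 7 observations
```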

4 · Interactive — move the weights, watch the queue re-sort

The five candidates above are loaded below with their raw term values. Drag the policy weights and the bars re-rank live; the leader is highlighted. Then flip the "user clarified intent" toggle to apply the observation from §3 — watch D's information gain drop to zero and the leader change. This is the scorer the agent runs internally; the only thing that ships to production is which bar is longest.

Priority scorer — five candidate actions for the bug-fix agent
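$\mathrm{score}(a) = w_{\mathrm{val}}\cdot\mathrm{value} + w_{\mathrm{dep}}\cdot\mathrm{unlocks} + w_{\mathrm{info}}\cdot\mathrm{infoGain} - w_{\mathrm{risk}}\cdot\mathrm{risk} - w_{\mathrm{cost}}\cdot\mathrm{cost}$. The longest bar is the chosen next action. Raise $w_{\mathrm{risk}}$ and the refactor sinks; raise $w_{\mathrm{cost}}$ and "read more files" dies; toggle the observation to retire the clarifying question.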
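The core JS, with the §3 data inlined so it runs on its own:
// Candidates and weights from the §3 worked example.
const actions = [
  {id:'A', value:0.4, unlocks:0.3, infoGain:0.9, risk:0.0, cost:0.1},
  {id:'B', value:0.3, unlocks:0.2, infoGain:0.4, risk:0.0, cost:0.6},
  {id:'C', value:0.5, unlocks:0.1, infoGain:0.1, risk:0.7, cost:0.5},
  {id:'D', value:0.6, unlocks:0.4, infoGain:0.8, risk:0.1, cost:0.2},
  {id:'E', value:0.2, unlocks:0.0, infoGain:0.1, risk:0.0, cost:0.2},
];
const w = {val:3, dep:2, info:2, risk:2, cost:1};
const clarified = false;                // the "user clarified intent" toggle
function score(a, w){
  return w.val*a.value + w.dep*a.unlocks + w.info*a.infoGain
       - w.risk*a.risk - w.cost*a.cost;
}
// "user clarified intent" observation: the clarifying question's
// info value collapses because the uncertainty it would resolve is gone.
// Copy rather than mutate, so untoggling restores D's original value.
const current = actions.map(a =>
  clarified && a.id==='D' ? {...a, infoGain: 0} : a);
const ranked = current.map(a => ({...a, s: score(a, w)}))
                      .sort((x,y) => y.s - x.s);
const next = ranked[0];                 // argmax; assumes vetoed actions were filtered out upstream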

5 · Three levels: goal, plan-step, action

The book is explicit that prioritization is not one decision but happens at three altitudes, and a serious agent runs it at all three:

L1 · Strategic (goal ordering). Which overall objective to pursue when goals conflict or compete for the same budget. (Ship the bug fix now, or invest in the refactor that prevents the next three bugs?) Slow-changing, often human-set or set once per session.
L2 · Plan-step ordering (sub-task sequencing). Within the chosen goal, which sub-task next. This is where dependency dominates: reproduce-the-bug must precede fix-the-bug. The planner from lesson 08 emits these; prioritization re-sorts them.
L3 · Tactical (action selection). The next single tool call. This is the per-loop-tick decision the §3 worked example models, and the one that reprioritizes most often because observations land here.

The analogy the book reaches for is a human team manager who ranks tasks by weighing the team's input. The manager does not re-pick the company's quarterly goal (L1) every morning, but does re-pick what each engineer touches next (L3) as standups surface new blockers. Mixing the levels is a common design error: re-litigating the strategic goal on every tool return is paralysis; never revisiting it means an agent that grinds on a now-pointless objective.

6 · The book's LangChain project-manager agent

The chapter's running code is a concrete instance of L2/L3 prioritization. A LangChain create_react_agent is given a small toolset over an in-memory task store — create_new_task, assign_priority_to_task, assign_task_to_worker, list_all_tasks — and a system prompt that encodes the priority policy in natural language: when a request says "ASAP", "urgent", or "critical", map it to P0; if priority is unstated, default to P1; if no worker is named, default to Worker A. Each Task is a Pydantic model with an optional priority field constrained to P0 / P1 / P2, and the tools validate their arguments through Pydantic schemas.

In the book's simulation, scenario 1 ("need the new login system ASAP, assign to Worker B") is parsed by the agent into: create the task, recognize "ASAP" → assign P0, assign Worker B, then list tasks. Scenario 2 ("review the marketing site content") has no urgency words, so the agent applies the default → P1, Worker A. The point the book draws out: the agent interpreted an ambiguous request, chose its tools, and ordered its actions itself. That keyword-to-priority mapping is the simplest end of the "simple rule → scoring system → LLM reasoning" spectrum from §2 — and it is exactly the policy our weighted scorer generalizes when "urgent" is no longer a single keyword but a real cost/benefit tradeoff.
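The rule-based end of that spectrum fits in a few lines. The sketch below is not the book's LangChain code; it is a plain JavaScript approximation of the policy described above (urgency keywords map to P0, an unstated priority defaults to P1, an unnamed worker defaults to Worker A). The function name and the regular expression used to detect a named worker are assumptions.

```javascript
const URGENT = /\b(asap|urgent|critical)\b/i;  // keywords named in the lesson
const WORKER = /\bworker\s+([a-z])\b/i;        // assumed phrasing: "Worker B"

// Turn a free-text request into a task record using fixed rules.
function triage(request) {
  const worker = request.match(WORKER);
  return {
    description: request,
    priority: URGENT.test(request) ? 'P0' : 'P1',  // default is P1
    worker: worker ? `Worker ${worker[1].toUpperCase()}` : 'Worker A',
  };
}

const t1 = triage('Need the new login system ASAP, assign to Worker B');
const t2 = triage('Review the marketing site content');
console.log(t1.priority, t1.worker); // P0 Worker B
console.log(t2.priority, t2.worker); // P1 Worker A
```

A keyword table like this is easy to audit, but it cannot weigh one urgent request against another; that is the gap the weighted scorer fills.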

Framework note
LangChain/LangGraph give you the loop and the tool-calling; the prioritization policy is something you supply — as a prompt rule (book's PM agent), as a scoring function in a node, or as an LLM "rank these candidates" call. CrewAI's role/task model and Google's ADK express the same idea at the orchestration layer (which agent or task runs next). None of them prioritize for you by default; the pattern is the policy you write on top.

7 · Failure modes and the checklist

Failure modes

  • Recency bias. Doing the most recently suggested action instead of the highest-scoring one — the agent's own last thought hijacks the queue. (The fix: always re-score the full candidate set, not just react to the latest token.)
  • Skipping cheap probes. Ignoring a low-cost, high-infoGain action (run one test, ask one question) and instead doing expensive work on an unverified assumption — action B/C in §3.
  • No budget-aware priority. Cost weight set to zero, so the agent explores forever and exhausts its tokens before finishing — the failure lesson 18 exists to prevent.
  • Reprioritization thrash. Two actions near-tied; re-scoring every tick flips between them and nothing completes. Needs a hysteresis margin or commit-to-action rule.
  • Stale priorities. The opposite — never re-scoring after a failure, so the agent keeps executing a plan the world already invalidated.
  • Scoring a vetoed action. Picking a high-value action that guardrails forbid; "allowed" must gate the argmax, not be checked after.
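For the thrash failure mode, a hysteresis margin is one possible remedy: keep the current action unless a challenger beats it by more than a fixed margin. The margin value (0.3) and the function name below are assumptions chosen for illustration.

```javascript
// Pick the next action from scored candidates [{id, s}], but only switch
// away from the current action when the best challenger leads by > margin.
function selectWithHysteresis(scored, currentId, margin = 0.3) {
  if (scored.length === 0) return null;
  const best = scored.reduce((a, b) => (b.s > a.s ? b : a));
  const current = scored.find(a => a.id === currentId);
  if (!current) return best.id;               // nothing committed yet
  return best.s - current.s > margin ? best.id : current.id;
}

// Two near-tied actions whose scores wobble from tick to tick.
const ticks = [
  [{ id: 'A', s: 3.5 }, { id: 'D', s: 3.4 }],
  [{ id: 'A', s: 3.4 }, { id: 'D', s: 3.5 }],  // D leads by 0.1: stay on A
  [{ id: 'A', s: 3.5 }, { id: 'D', s: 3.4 }],
  [{ id: 'A', s: 2.9 }, { id: 'D', s: 3.5 }],  // D leads by 0.6: switch
];

let current = null;
const picks = ticks.map(scored => (current = selectWithHysteresis(scored, current)));
console.log(picks.join(' ')); // A A A D
```

A plain argmax over the same four ticks would pick A, D, A, D; the margin trades a little optimality on near-ties for the ability to finish what was started.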

Implementation checklist

  • What is the explicit candidate set right now? (Keep it enumerable and inspectable.)
  • What are the scoring terms, and are they normalized to a common scale?
  • Which action unlocks the most blocked work (dependency term)?
  • Which is the cheapest action that most reduces uncertainty?
  • Is cost/budget actually weighted, with a stop when the budget is near-spent?
  • What observations trigger a re-score — and is the trigger "new evidence," not "time"?
  • Is there a margin / commit rule to prevent thrash on near-ties?
  • Does the guardrail allow-list gate the selection before argmax?
  • At which level (goal / plan-step / action) does each re-score happen?
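The budget item on this checklist can be approximated by letting the cost weight grow as the budget drains and by stopping when no remaining candidate is affordable. The linear scaling rule, the `tokens` field, and all the numbers below are hypothetical.

```javascript
// Scale the cost weight up as the budget drains (assumed linear rule):
// at 0% spent the weight is wcost, at 100% spent it is wcost * (1 + k).
function effectiveCostWeight(wcost, spent, budget, k = 4) {
  const frac = Math.min(1, Math.max(0, spent / budget));
  return wcost * (1 + k * frac);
}

// Return the id of the best affordable action, or null to signal "stop".
// Each candidate carries a benefit (already weighted), a normalized cost,
// and an estimated token price.
function pickWithinBudget(candidates, spent, budget, wcost = 1) {
  const remaining = budget - spent;
  const wc = effectiveCostWeight(wcost, spent, budget);
  let best = null;
  for (const a of candidates) {
    if (a.tokens > remaining) continue;        // cannot afford it at all
    const s = a.benefit - wc * a.cost;
    if (best === null || s > best.s) best = { id: a.id, s };
  }
  return best ? best.id : null;
}

const candidates = [
  { id: 'readFiles', benefit: 2.4, cost: 0.6, tokens: 12000 },
  { id: 'runTest',   benefit: 1.8, cost: 0.1, tokens: 2000 },
];
console.log(pickWithinBudget(candidates, 0, 60000));     // readFiles
console.log(pickWithinBudget(candidates, 45000, 60000)); // runTest
console.log(pickWithinBudget(candidates, 59000, 60000)); // null: stop
```

With a fresh budget the expensive read wins narrowly; at 75% spent the cost weight has quadrupled and the cheap test takes over; with 1k tokens left nothing is affordable and the loop should stop or hand back to the user.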

Where this points next

Prioritization assumes the candidate set is given — it ranks actions the agent already knows about. But the highest-value action is sometimes one no candidate generator proposed, because the relevant information is an unknown unknown: a file nobody read, a hypothesis nobody formed, a failure mode nobody anticipated. A pure prioritizer can only sharpen choices among the options on the table; it cannot widen the table. Lesson 23, Exploration and discovery, is the frontier pattern that does exactly that — the agent proactively searches for information, hypotheses, and possibilities beyond its current candidate set. The two patterns are complementary: exploration generates candidates, prioritization decides which to spend on, and the infoGain term you saw in §3 is precisely the seam where they meet.

Takeaway
Prioritization is the next-action policy that turns a frozen plan into a live, re-sortable queue. Score each candidate action by the book's criteria (urgency, importance, dependency, resource availability, cost/benefit, user preference), folded into an explicit weighted function $\mathrm{score} = w_{\mathrm{val}}\cdot\mathrm{value} + w_{\mathrm{dep}}\cdot\mathrm{unlocks} + w_{\mathrm{info}}\cdot\mathrm{infoGain} - w_{\mathrm{risk}}\cdot\mathrm{risk} - w_{\mathrm{cost}}\cdot\mathrm{cost}$, then argmax over allowed actions. Run it at three levels (strategic goal, plan-step, tactical action) and re-score whenever an observation changes the state, not on every tick. The book's LangChain PM agent shows the simplest rule-based end (keyword → P0/P1/P2, default P1); the weighted scorer generalizes it. This dynamic self-ordering under conflicting goals and limited resources is what makes the system an agent rather than a script.

Interview prompts