Refinement-round budget

Definition

Refinement-round budget is the bounded-iteration discipline of a judge-gated agent loop: every loop has a hard ceiling on the number of plan → implement → verify → refine cycles it may run, and the loop terminates at either judge satisfaction or budget exhaustion — whichever comes first.

The concept is a safety-net primitive for iterative plan refinement. Without a ceiling, non-converging loops (the Verifier keeps rejecting but Router-driven fixes don't address the underlying issue) run without bound; with one, cost is bounded but some inputs may return unfinished work.
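
A minimal sketch of that loop shape, assuming hypothetical planner/coder/verifier/router callables (the names are illustrative stand-ins, not DS-STAR's API):

```python
from dataclasses import dataclass
from typing import Any, Callable

MAX_ROUNDS = 10  # hard ceiling (DS-STAR's published value)

@dataclass
class Result:
    artifact: Any       # best plan/code produced so far
    approved: bool      # did the judge accept it?
    rounds_used: int

def refine_with_budget(task: Any,
                       planner: Callable, coder: Callable,
                       verifier: Callable, router: Callable,
                       max_rounds: int = MAX_ROUNDS) -> Result:
    """Plan -> implement -> verify -> refine until the judge approves
    or the budget is exhausted, whichever comes first."""
    plan = planner(task)
    artifact = None
    for round_no in range(1, max_rounds + 1):
        artifact = coder(task, plan)             # implement
        if verifier(task, artifact):             # judge satisfaction: early exit
            return Result(artifact, True, round_no)
        plan = router(task, plan, artifact)      # judge rejected: refine the plan
    return Result(artifact, False, max_rounds)   # budget exhaustion: best-effort
```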

Canonical numeric anchor

DS-STAR publishes the most detailed round-budget numbers on the wiki:

Parameter                                    Value   Source
Maximum rounds                               10      DS-STAR loop spec
Avg rounds, easy DABStep tasks               3.0     empirical
Avg rounds, hard DABStep tasks               5.6     empirical
Share of easy tasks completing in 1 round    >50%    empirical

"over half of the easy tasks were completed in just a single round" (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).

Shape of the distribution

Round count is difficulty-conditioned:

  • Easy tasks (single file, answer locally extractable): distribution bunched at 1 round, tailing off.
  • Hard tasks (multiple files, cross-file reasoning): distribution centred further out, averaging nearly double the easy case.

The ceiling (10) is well above the hard-task average (5.6), so the loop rarely exhausts its budget on hard DABStep tasks, but the presence of the ceiling still matters for pathological inputs or judge failures.

Why it matters

  • Cost bounding. Each round = Planner + Coder + Verifier (+ Router on reject) inference. A runaway loop is expensive.
  • Latency bounding. For user-facing agents, the budget ceiling translates to a worst-case response time (see the arithmetic below).
  • Failure-mode articulation. Budget exhaustion is a distinct failure mode from judge rejection; end-user UX must distinguish "the agent tried 10 times and couldn't confirm a plan" from "the agent refused the task."
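
Both bounds fall out of one multiplication. The per-round figures below are invented for illustration; only MAX_ROUNDS and the round averages come from the table above:

```python
# Illustrative per-round figures (assumed, not from DS-STAR).
MAX_ROUNDS = 10
PER_ROUND_SECONDS = 8.0   # one Planner + Coder + Verifier (+ Router) pass
PER_ROUND_DOLLARS = 0.04

worst_case_latency = MAX_ROUNDS * PER_ROUND_SECONDS  # 80.0 s, hard upper bound
worst_case_cost    = MAX_ROUNDS * PER_ROUND_DOLLARS  # $0.40, hard upper bound

# Expected cost tracks the empirical round averages instead:
expected_cost_easy = 3.0 * PER_ROUND_DOLLARS  # $0.12
expected_cost_hard = 5.6 * PER_ROUND_DOLLARS  # ~$0.22
```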

Tradeoffs / gotchas

  • Ceiling calibration is empirical. DS-STAR's 10 is framed as a safety ceiling, well above the 5.6 hard-task average; a lower ceiling would cap cost more aggressively at the price of timing out on harder tail inputs.
  • On-budget-exhaustion behaviour is under-specified. The DS-STAR post says "the final code is delivered as the solution" on reaching the max rounds, i.e. deliver best-effort output even without Verifier approval. Other designs might error, retry, or escalate; the choice is a product UX decision (see the sketch after this list).
  • Doesn't catch judge-calibration drift. If the Verifier silently over-approves, rounds used drop and the budget looks healthy, but answers are worse. The budget is a cost control, not a correctness check.
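
A sketch of how a design might separate these outcomes and make the exhaustion policy explicit. The Outcome names and policy strings are hypothetical; only the "best_effort" branch mirrors DS-STAR's stated behaviour:

```python
from enum import Enum, auto

class Outcome(Enum):
    JUDGE_APPROVED   = auto()  # Verifier accepted within budget
    BUDGET_EXHAUSTED = auto()  # ran max rounds without approval
    TASK_REFUSED     = auto()  # agent declined before entering the loop

def on_exhaustion(artifact, policy: str = "best_effort"):
    """Hypothetical exhaustion policies. 'best_effort' mirrors DS-STAR:
    the final code is delivered as the solution, unverified."""
    if policy == "best_effort":
        return Outcome.BUDGET_EXHAUSTED, artifact  # return unapproved work
    if policy == "error":
        raise RuntimeError("round budget exhausted without judge approval")
    if policy == "escalate":
        return Outcome.BUDGET_EXHAUSTED, None      # e.g. hand off to a human queue
    raise ValueError(f"unknown policy: {policy!r}")
```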
