# Refinement-round budget

## Definition
Refinement-round budget is the bounded-iteration discipline of a judge-gated agent loop: every loop has a hard ceiling on the number of plan → implement → verify → refine cycles it may run, and the loop terminates at either judge satisfaction or budget exhaustion — whichever comes first.
The concept is a safety-net primitive for iterative plan refinement. Without a ceiling, a non-converging loop (the Verifier keeps rejecting, but Router-driven fixes never address the underlying issue) runs without bound; with one, cost is bounded but some inputs may return unfinished work.
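The discipline can be sketched as a small driver loop. The callable names (`run_round`, `judge`), status strings, and return shape are illustrative assumptions, not DS-STAR's actual interfaces:

```python
def refine_with_budget(run_round, judge, max_rounds=10):
    """Judge-gated loop with a hard round ceiling.

    run_round(feedback) performs one plan -> implement -> (refine) cycle
    and returns the current artifact; judge(artifact) returns a tuple
    (accepted, feedback). Terminates at judge satisfaction or budget
    exhaustion, whichever comes first.
    """
    artifact, feedback = None, None
    for round_no in range(1, max_rounds + 1):
        artifact = run_round(feedback)        # first round: feedback is None
        accepted, feedback = judge(artifact)  # Verifier accept/reject
        if accepted:
            return artifact, "judge_satisfied", round_no
    # Budget exhausted: deliver the best-effort artifact (DS-STAR's choice).
    return artifact, "budget_exhausted", max_rounds
```

Returning the round count alongside the status keeps the two termination conditions distinguishable to the caller.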
## Canonical numeric anchor
DS-STAR publishes the most detailed round-budget numbers of any system on this wiki:
| Parameter | Value | Source |
|---|---|---|
| Maximum rounds | 10 | DS-STAR loop spec |
| Avg rounds, easy DABStep tasks | 3.0 | empirical |
| Avg rounds, hard DABStep tasks | 5.6 | empirical |
| Share of easy tasks completing in 1 round | >50 % | empirical |
> "over half of the easy tasks were completed in just a single round" (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent)
## Shape of the distribution
Round count is difficulty-conditioned:
- Easy tasks (single file, answer locally extractable): distribution bunched at 1 round, tailing off.
- Hard tasks (multiple files, cross-file reasoning): distribution centred further out, averaging nearly double the easy case.
The ceiling (10) is well above the hard-task average (5.6), so the loop rarely exhausts budget on DABStep hard — but the presence of the ceiling still matters for pathological inputs or judge failures.
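A back-of-envelope check of the headroom, using DS-STAR's published ceiling and averages; the per-round cost and latency figures are invented for illustration:

```python
MAX_ROUNDS = 10            # DS-STAR's published ceiling
AVG_ROUNDS_HARD = 5.6      # empirical average, hard DABStep tasks

# Hypothetical per-round figures, for illustration only.
COST_PER_ROUND_CENTS = 5
LATENCY_PER_ROUND_S = 12

worst_case_cost_cents = MAX_ROUNDS * COST_PER_ROUND_CENTS     # hard cap: 50 cents
worst_case_latency_s = MAX_ROUNDS * LATENCY_PER_ROUND_S       # hard cap: 120 s
expected_cost_cents = AVG_ROUNDS_HARD * COST_PER_ROUND_CENTS  # typical hard task: ~28 cents
headroom = MAX_ROUNDS / AVG_ROUNDS_HARD                       # ~1.8x above the hard-task average
```

The point is that the ceiling buys a hard worst-case bound while the typical task pays only the average.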
## Why it matters
- Cost bounding. Each round = Planner + Coder + Verifier (+ Router on reject) inference. A runaway loop is expensive.
- Latency bounding. For user-facing agents, a budget ceiling translates to a worst-case response time.
- Failure mode articulation. Budget-exhaustion is a distinct failure mode from judge-rejected; end-user UX must distinguish "the agent tried 10 times and couldn't confirm a plan" from "the agent refused the task."
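The failure-mode distinction in the last bullet might surface in UX code along these lines; the status strings and wording are hypothetical, not from DS-STAR:

```python
def user_message(status: str, rounds_used: int, max_rounds: int) -> str:
    """Map a loop outcome to user-facing copy, keeping the three
    terminal states distinguishable."""
    if status == "judge_satisfied":
        return f"Verified answer (converged in {rounds_used} round(s))."
    if status == "budget_exhausted":
        # Best-effort delivery: the loop ran out of rounds without approval.
        return (f"Best-effort answer: the verifier did not approve a plan "
                f"within {max_rounds} rounds.")
    if status == "task_refused":
        return "The agent declined this task."
    raise ValueError(f"unknown status: {status}")
```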
## Tradeoffs / gotchas
- Ceiling calibration is empirical. DS-STAR's 10 is framed as a safety ceiling, far above the 5.6 average; a lower ceiling would cap cost more aggressively at the price of cutting off the hardest tail of inputs.
- On-budget-exhaustion behaviour is under-specified. The DS-STAR post says "the final code is delivered as the solution" on reaching the max rounds — i.e. return best-effort even without Verifier approval. Other designs might error, retry, or escalate; the choice is a product UX decision.
- Doesn't catch judge-calibration drift. If the Verifier silently over-approves, rounds used drops and the budget looks healthy — but answers are worse. The budget is a cost control, not a correctness check.
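A toy, fully deterministic model of the last gotcha (every number here is invented): when the judge's approval bar drops, rounds-used telemetry looks *healthier* while correctness silently collapses:

```python
def simulate(approve_threshold: float, rounds_budget: int = 10):
    """Answer quality improves by 0.2 per round; the judge approves once
    quality clears its threshold (or the budget runs out). Returns
    (rounds_used, correct), where 0.8 is a fixed ground-truth bar."""
    for r in range(1, rounds_budget + 1):
        quality = min(1.0, 0.2 * r)
        if quality >= approve_threshold or r == rounds_budget:
            return r, quality >= 0.8

strict = simulate(approve_threshold=0.8)   # well-calibrated judge
lenient = simulate(approve_threshold=0.4)  # over-approving judge
```

The lenient judge halves the rounds used (the budget looks great on a dashboard) but approves an answer below the ground-truth bar; only a separate correctness spot-check catches the drift.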
## Seen in
- sources/2025-11-06-google-ds-star-versatile-data-science-agent — canonical wiki instance; the round-count chart tied to DABStep's easy/hard split.
## Related
- concepts/iterative-plan-refinement — the loop discipline this budget bounds.
- concepts/llm-as-judge — the Verifier whose accept/reject drives round consumption.
- systems/ds-star — the canonical instance.
- patterns/planner-coder-verifier-router-loop — the pattern whose termination condition this concept specifies.