PATTERN Cited by 1 source
Rollout-budget anytime plan search¶
Shape¶
Bound an iterative plan-search algorithm by a rollout count (not wall-clock, not quality threshold). Each rollout is one end-to-end candidate evaluation: propose → execute → observe. The algorithm maintains a monotonic best-so-far; every rollout either improves the current best or doesn't. The caller gets a clean knob: more rollouts → better plans (up to a diminishing-returns asymptote).
This converts plan search into an anytime algorithm with compute-budget as a first-class parameter.
Canonical instance¶
Databricks' join-order agent (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):
"Our agent progressively improves the workload with each tested plan (sometimes called a rollout), creating a simple anytime algorithm where larger time budgets can be translated into further query performance."
- Rollout = one agent tool call (
execute_plan(candidate)). - Budget = 50 rollouts in the prototype, 15 rollouts in evaluation.
- Best-so-far preserved across the budget; returned at the end.
- Asymptote acknowledged: "eventually query performance will stop improving."
Why rollouts beat wall-clock as the budget metric¶
| Axis | Wall-clock budget | Rollout budget |
|---|---|---|
| Noise | Dominated by model-latency variance | Stable across models |
| Reproducibility | Hard | Easy (same N rollouts) |
| Cost modelling | Depends on hardware + latency | Maps directly to tool-call count → dollars |
| Partial-progress guarantee | None (may run out mid-inference) | Each rollout completes atomically |
Allocation knob per caller¶
The same agent can be invoked with different budgets for different use cases:
| Caller | Budget |
|---|---|
| Interactive slow-query investigation | Small (10–20) |
| Nightly batch optimization sweep | Medium (50–100 per query) |
| Workload regression analysis | Large (hundreds, parallelised across queries) |
| Exhaustive paper-result reproduction | Very large (1000+) |
Best-of-N selection is load-bearing¶
The anytime property depends on keeping the best plan seen so far as a committed return value. The agent can waste many rollouts on bad ideas and still produce a great final answer — because the system only surfaces the best, not the last. This has implications:
- Exploration is cheaper than it looks. A single rollout spent on a wild idea is harmless unless it finds something good.
- Diminishing returns are hidden. The first rollout that improves on the default is a big win; later improvements are marginal.
- Early stopping needs plateau detection. Without it, you always burn the full budget even after the improvement curve flattens.
Pairing with exploration-exploitation¶
The rollout budget structures the exploration-exploitation tradeoff: each rollout is an allocation decision. Production systems may wrap the agent with an outer scheduler enforcing minimum exploration (diversity) or maximum exploitation (stop re-testing the same plan).
When this fits¶
- The underlying problem is expensive (each candidate costs real compute, so enumerating all is infeasible).
- Quality-vs-compute is a meaningful continuous curve.
- Caller has heterogeneous time budgets.
- Each rollout is independent and atomic.
When it doesn't fit¶
- Hot-path constraints. Budgets of any size are too slow.
- Non-monotonic improvement. If a later rollout can corrupt the best-so-far, you need checkpointing, not budget.
- Inherently-sequential search with no notion of rollouts. Gradient-based optimization doesn't cleanly fit this pattern.
Composition¶
| Pattern | Relationship |
|---|---|
| patterns/llm-agent-offline-query-plan-tuner | Containing pattern — this is the budget leg |
| patterns/structured-output-grammar-for-valid-plans | Sibling validity leg |
| patterns/agent-spawn-parallel-exploration | Alternative: spend budget across parallel agents instead of sequential rollouts |
Seen in¶
- sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization — Canonical first wiki instance. 50-rollout prototype budget / 15 rollout evaluation. Explicit "anytime algorithm" framing. Diminishing-returns asymptote named.
Related¶
- concepts/anytime-optimization-algorithm — the algorithmic shape
- concepts/exploration-exploitation-tradeoff-in-agent-search — the decision taken at each rollout
- concepts/llm-agent-as-query-optimizer — the domain application
- patterns/llm-agent-offline-query-plan-tuner — containing pattern
- patterns/structured-output-grammar-for-valid-plans — companion validity pattern