

Rollout-budget anytime plan search

Shape

Bound an iterative plan-search algorithm by a rollout count (not wall-clock time, not a quality threshold). Each rollout is one end-to-end candidate evaluation: propose → execute → observe. The algorithm maintains a monotonic best-so-far: a rollout either improves the current best or leaves it unchanged; it can never degrade it. The caller gets a clean knob: more rollouts → better plans, up to a diminishing-returns asymptote.

This converts plan search into an anytime algorithm with compute-budget as a first-class parameter.
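The shape can be sketched in a few lines of Python. This is a generic sketch, not the Databricks implementation: `propose` and `execute` are hypothetical stand-ins for the agent's candidate generator and plan executor.

```python
def anytime_plan_search(propose, execute, budget):
    """Rollout-budget anytime search: each rollout is propose -> execute -> observe.

    `propose` yields a candidate plan (here given the best seen so far);
    `execute` returns a cost to minimize. The best-so-far is monotonic:
    a bad rollout never degrades the committed answer.
    """
    best_plan, best_cost = None, float("inf")
    for _ in range(budget):              # budget = rollout count, not wall-clock
        candidate = propose(best_plan)   # propose
        cost = execute(candidate)        # execute
        if cost < best_cost:             # observe: keep monotonic best-so-far
            best_plan, best_cost = candidate, cost
    return best_plan, best_cost
```

Because the loop body is the only place compute is spent, the budget maps one-to-one onto candidate evaluations, which is what makes the quality-vs-compute knob clean.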

Canonical instance

Databricks' join-order agent (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):

"Our agent progressively improves the workload with each tested plan (sometimes called a rollout), creating a simple anytime algorithm where larger time budgets can be translated into further query performance."

  • Rollout = one agent tool call (execute_plan(candidate)).
  • Budget = 50 rollouts in the prototype, 15 rollouts in evaluation.
  • Best-so-far preserved across the budget; returned at the end.
  • Asymptote acknowledged: "eventually query performance will stop improving."

Why rollouts beat wall-clock as the budget metric

  • Noise: a wall-clock budget is dominated by model-latency variance; a rollout count is stable across models.
  • Reproducibility: wall-clock runs are hard to reproduce; re-running the same N rollouts is easy.
  • Cost modelling: wall-clock cost depends on hardware and latency; rollouts map directly to tool-call count, and therefore to dollars.
  • Partial-progress guarantee: a wall-clock budget gives none (it may expire mid-inference); each rollout completes atomically.
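The cost-modelling point can be made concrete: a rollout budget turns into a dollar estimate by straight multiplication. The numbers below are illustrative assumptions, not measured figures from the source.

```python
def rollout_budget_cost(rollouts, tokens_per_rollout, usd_per_1k_tokens):
    """Rollouts map directly to dollars: N rollouts x tokens per rollout x price.
    All inputs are illustrative assumptions, not measured figures."""
    return rollouts * tokens_per_rollout * usd_per_1k_tokens / 1000

# e.g. 50 rollouts at ~2,000 tokens each and $0.01 per 1k tokens -> $1.00
```

No equivalent closed form exists for a wall-clock budget, because the number of completed inferences inside a fixed time window depends on hardware and latency variance.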

Allocation knob per caller

The same agent can be invoked with different budgets for different use cases:

  • Interactive slow-query investigation: small budget (10–20).
  • Nightly batch optimization sweep: medium budget (50–100 per query).
  • Workload regression analysis: large budget (hundreds, parallelised across queries).
  • Exhaustive paper-result reproduction: very large budget (1000+).

Best-of-N selection is load-bearing

The anytime property depends on keeping the best plan seen so far as a committed return value. The agent can waste many rollouts on bad ideas and still produce a great final answer — because the system only surfaces the best, not the last. This has implications:

  • Exploration is cheaper than it looks. A rollout spent on a wild idea costs nothing beyond the rollout itself: if it fails, the best-so-far is untouched; if it succeeds, it is pure upside.
  • Returns are front-loaded. The first rollout that improves on the default plan is a big win; later improvements are increasingly marginal.
  • Early stopping needs plateau detection. Without it, you always burn the full budget even after the improvement curve flattens.
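Plateau detection fits naturally into the same loop. The sketch below assumes a `patience` knob (consecutive non-improving rollouts before stopping), which is an added assumption, not part of the source's description.

```python
def search_with_plateau_stop(propose, execute, budget, patience=10):
    """Anytime loop with plateau detection: stop early once `patience`
    consecutive rollouts fail to improve the best-so-far.
    Sketch only; `patience` is an assumed tuning knob."""
    best_plan, best_cost = None, float("inf")
    since_improvement = 0
    used = 0
    for used in range(1, budget + 1):
        candidate = propose(best_plan)
        cost = execute(candidate)
        if cost < best_cost:
            best_plan, best_cost = candidate, cost
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                break                    # improvement curve has flattened
    return best_plan, best_cost, used
```

Returning the number of rollouts actually used lets the caller see how much of the budget the plateau saved.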

Pairing with exploration-exploitation

The rollout budget structures the exploration-exploitation tradeoff: each rollout is an allocation decision. Production systems may wrap the agent with an outer scheduler enforcing minimum exploration (diversity) or maximum exploitation (stop re-testing the same plan).
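One way such an outer scheduler could look, as a sketch: force a minimum fraction of the budget into exploration and refuse to re-execute already-tested plans. The proposers and the `explore_frac` knob are illustrative assumptions, not from the source.

```python
def scheduled_search(propose_explore, propose_exploit, execute, budget,
                     explore_frac=0.3):
    """Outer-scheduler sketch: reserve a fraction of the rollout budget for
    exploration, and never re-execute a plan already tested.
    The two proposers and `explore_frac` are illustrative assumptions."""
    best_plan, best_cost = None, float("inf")
    tested = set()                       # enforce "stop re-testing the same plan"
    for i in range(budget):
        # The first explore_frac of the budget is forced exploration (diversity).
        exploring = i < explore_frac * budget or best_plan is None
        candidate = propose_explore() if exploring else propose_exploit(best_plan)
        if candidate in tested:
            continue                     # duplicate plan: skip the execution
        tested.add(candidate)
        cost = execute(candidate)
        if cost < best_cost:
            best_plan, best_cost = candidate, cost
    return best_plan, best_cost
```

The monotonic best-so-far is unchanged; the scheduler only decides which candidates each rollout is spent on.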

When this fits

  • The underlying problem is expensive (each candidate costs real compute, so enumerating all is infeasible).
  • Quality-vs-compute is a meaningful continuous curve.
  • Caller has heterogeneous time budgets.
  • Each rollout is independent and atomic.

When it doesn't fit

  • Hot-path constraints. If optimization must happen online, per query, even a single rollout is too slow.
  • Non-monotonic improvement. If a later rollout can corrupt the best-so-far, you need checkpointing, not budget.
  • Inherently-sequential search with no natural notion of independent rollouts. Gradient-based optimization, where each step depends on the previous iterate, doesn't cleanly fit this pattern.

Composition

  • patterns/llm-agent-offline-query-plan-tuner: containing pattern; this pattern is its budget leg.
  • patterns/structured-output-grammar-for-valid-plans: sibling pattern; the validity leg.
  • patterns/agent-spawn-parallel-exploration: alternative; spend the budget across parallel agents instead of sequential rollouts.
