

Rollout-budget anytime plan search

Shape

Bound an iterative plan-search algorithm by a rollout count (not wall-clock time, not a quality threshold). Each rollout is one end-to-end candidate evaluation: propose → execute → observe. The algorithm maintains a monotonic best-so-far: a rollout either improves the current best or leaves it unchanged; it can never degrade it. The caller gets a clean knob: more rollouts → better plans, up to a diminishing-returns asymptote.

This converts plan search into an anytime algorithm with compute-budget as a first-class parameter.
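The shape can be sketched in a few lines of Python. This is a generic sketch, not the Databricks implementation: `propose` and `execute` are hypothetical stand-ins for the agent's candidate generator and plan executor.

```python
def anytime_plan_search(propose, execute, budget):
    """Rollout-budget anytime search: each rollout is propose -> execute -> observe.

    `propose` yields a candidate plan (here given the best seen so far);
    `execute` returns a cost to minimize. The best-so-far is monotonic:
    a bad rollout never degrades the committed answer.
    """
    best_plan, best_cost = None, float("inf")
    for _ in range(budget):              # budget = rollout count, not wall-clock
        candidate = propose(best_plan)   # propose
        cost = execute(candidate)        # execute
        if cost < best_cost:             # observe: keep monotonic best-so-far
            best_plan, best_cost = candidate, cost
    return best_plan, best_cost
```

Because the loop body is the only place compute is spent, the budget maps one-to-one onto candidate evaluations, which is what makes the quality-vs-compute knob clean.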

Canonical instance

Databricks' join-order agent (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):

"Our agent progressively improves the workload with each tested plan (sometimes called a rollout), creating a simple anytime algorithm where larger time budgets can be translated into further query performance."

  • Rollout = one agent tool call (execute_plan(candidate)).
  • Budget = 50 rollouts in the prototype, 15 rollouts in evaluation.
  • Best-so-far preserved across the budget; returned at the end.
  • Asymptote acknowledged: "eventually query performance will stop improving."

Why rollouts beat wall-clock as the budget metric

  • Noise: a wall-clock budget is dominated by model-latency variance; a rollout count is stable across models.
  • Reproducibility: wall-clock runs are hard to reproduce; re-running the same N rollouts is easy.
  • Cost modelling: wall-clock cost depends on hardware and latency; rollouts map directly to tool-call count, and therefore to dollars.
  • Partial-progress guarantee: a wall-clock budget gives none (it may expire mid-inference); each rollout completes atomically.
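The cost-modelling point can be made concrete: a rollout budget turns into a dollar estimate by straight multiplication. The numbers below are illustrative assumptions, not measured figures from the source.

```python
def rollout_budget_cost(rollouts, tokens_per_rollout, usd_per_1k_tokens):
    """Rollouts map directly to dollars: N rollouts x tokens per rollout x price.
    All inputs are illustrative assumptions, not measured figures."""
    return rollouts * tokens_per_rollout * usd_per_1k_tokens / 1000

# e.g. 50 rollouts at ~2,000 tokens each and $0.01 per 1k tokens -> $1.00
```

No equivalent closed form exists for a wall-clock budget, because the number of completed inferences inside a fixed time window depends on hardware and latency variance.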

Allocation knob per caller

The same agent can be invoked with different budgets for different use cases:

  • Interactive slow-query investigation: small budget (10–20).
  • Nightly batch optimization sweep: medium budget (50–100 per query).
  • Workload regression analysis: large budget (hundreds, parallelised across queries).
  • Exhaustive paper-result reproduction: very large budget (1000+).

Best-of-N selection is load-bearing

The anytime property depends on keeping the best plan seen so far as a committed return value. The agent can waste many rollouts on bad ideas and still produce a great final answer — because the system only surfaces the best, not the last. This has implications:

  • Exploration is cheaper than it looks. A rollout spent on a wild idea costs nothing beyond the rollout itself: if it fails, the best-so-far is untouched; if it succeeds, it is pure upside.
  • Returns are front-loaded. The first rollout that improves on the default plan is a big win; later improvements are increasingly marginal.
  • Early stopping needs plateau detection. Without it, you always burn the full budget even after the improvement curve flattens.
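Plateau detection fits naturally into the same loop. The sketch below assumes a `patience` knob (consecutive non-improving rollouts before stopping), which is an added assumption, not part of the source's description.

```python
def search_with_plateau_stop(propose, execute, budget, patience=10):
    """Anytime loop with plateau detection: stop early once `patience`
    consecutive rollouts fail to improve the best-so-far.
    Sketch only; `patience` is an assumed tuning knob."""
    best_plan, best_cost = None, float("inf")
    since_improvement = 0
    used = 0
    for used in range(1, budget + 1):
        candidate = propose(best_plan)
        cost = execute(candidate)
        if cost < best_cost:
            best_plan, best_cost = candidate, cost
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                break                    # improvement curve has flattened
    return best_plan, best_cost, used
```

Returning the number of rollouts actually used lets the caller see how much of the budget the plateau saved.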

Pairing with exploration-exploitation

The rollout budget structures the exploration-exploitation tradeoff: each rollout is an allocation decision. Production systems may wrap the agent with an outer scheduler enforcing minimum exploration (diversity) or maximum exploitation (stop re-testing the same plan).
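One way such an outer scheduler could look, as a sketch: force a minimum fraction of the budget into exploration and refuse to re-execute already-tested plans. The proposers and the `explore_frac` knob are illustrative assumptions, not from the source.

```python
def scheduled_search(propose_explore, propose_exploit, execute, budget,
                     explore_frac=0.3):
    """Outer-scheduler sketch: reserve a fraction of the rollout budget for
    exploration, and never re-execute a plan already tested.
    The two proposers and `explore_frac` are illustrative assumptions."""
    best_plan, best_cost = None, float("inf")
    tested = set()                       # enforce "stop re-testing the same plan"
    for i in range(budget):
        # The first explore_frac of the budget is forced exploration (diversity).
        exploring = i < explore_frac * budget or best_plan is None
        candidate = propose_explore() if exploring else propose_exploit(best_plan)
        if candidate in tested:
            continue                     # duplicate plan: skip the execution
        tested.add(candidate)
        cost = execute(candidate)
        if cost < best_cost:
            best_plan, best_cost = candidate, cost
    return best_plan, best_cost
```

The monotonic best-so-far is unchanged; the scheduler only decides which candidates each rollout is spent on.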

When this fits

  • The underlying problem is expensive (each candidate costs real compute, so enumerating all is infeasible).
  • Quality-vs-compute is a meaningful continuous curve.
  • Caller has heterogeneous time budgets.
  • Each rollout is independent and atomic.

When it doesn't fit

  • Hot-path constraints. If optimization must happen online, per query, even a single rollout is too slow.
  • Non-monotonic improvement. If a later rollout can corrupt the best-so-far, you need checkpointing, not budget.
  • Inherently-sequential search with no natural notion of independent rollouts. Gradient-based optimization, where each step depends on the previous iterate, doesn't cleanly fit this pattern.

Composition

  • patterns/llm-agent-offline-query-plan-tuner: containing pattern; this pattern is its budget leg.
  • patterns/structured-output-grammar-for-valid-plans: sibling pattern; the validity leg.
  • patterns/agent-spawn-parallel-exploration: alternative; spend the budget across parallel agents instead of sequential rollouts.
