PATTERN
# LLM agent offline query plan tuner
## Shape
Deploy a frontier-model LLM agent as an offline query-plan improver for a database engine. Give it:
- One tool — `execute_plan(candidate) → (runtime_ms, subplan_sizes)`.
- A rollout budget — fixed N invocations per query.
- Grammar-constrained output admitting only valid plans.
- Best-of-N selection — return the fastest observed plan.
The agent runs out of the hot path, against slow queries flagged for tuning, on representative or replica data. It produces a plan (or plan hint) to be applied to future executions of the query — not a rewritten SQL string, not a natural-language explanation.
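The shape above can be sketched as a minimal best-of-N loop. Here `propose_plan` and `execute_plan` are hypothetical stand-ins for the grammar-constrained agent call and the engine hook, with a toy deterministic runtime model rather than a real engine:

```python
import random

def propose_plan(history):
    # Stand-in for the grammar-constrained LLM call: samples a random
    # permutation of the query's tables as a candidate join order.
    tables = ["t1", "t2", "t3", "t4"]
    return tuple(random.sample(tables, len(tables)))

def execute_plan(candidate):
    # Stand-in for running the candidate against replica data; returns
    # (runtime_ms, subplan_sizes) per the pattern's tool signature.
    # Toy model: plans that join t3 earlier run faster.
    runtime_ms = 50 + 25 * candidate.index("t3")
    return runtime_ms, {}

def tune_query(budget_n):
    best_plan, best_ms, history = None, float("inf"), []
    for _ in range(budget_n):
        candidate = propose_plan(history)
        runtime_ms, subplan_sizes = execute_plan(candidate)
        history.append((candidate, runtime_ms, subplan_sizes))
        if runtime_ms < best_ms:  # best-of-N selection
            best_plan, best_ms = candidate, runtime_ms
    return best_plan, best_ms
```

The history is threaded back into `propose_plan` because the real agent conditions each proposal on previously observed runtimes and subplan sizes; this sketch ignores it.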
## Canonical instance
Databricks + UPenn's join-order agent (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):
- Domain: concepts/join-order-optimization (one of several plan decisions the optimizer makes).
- Tool: Execute a candidate join order on Databricks, return runtime + per-subplan sizes. Timeout = original query's runtime.
- Budget: 50 rollouts in the prototype, 15 in evaluation.
- Grammar: Structured output that admits only valid join reorderings.
- Benchmark: JOB on 10×-scaled IMDb, 113 queries.
- Result: 1.288× geomean workload speedup, 41% P90 drop.
- Beats: Perfect cardinality estimates, smaller LLMs, classical BayesQO (note: designed for Postgres).
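For clarity on the headline metric: a geometric mean of per-query speedups is the conventional way a figure like 1.288× is aggregated, so no single long-running query dominates. The numbers below are illustrative, not from the benchmark:

```python
import math

def geomean_speedup(baseline_ms, tuned_ms):
    # Per-query speedup ratios, aggregated by geometric mean.
    ratios = [b / t for b, t in zip(baseline_ms, tuned_ms)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Three toy queries: one 2x faster, one unchanged, one slightly slower.
speedup = geomean_speedup([200, 100, 50], [100, 100, 55])
```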
## Why this shape works
Three architectural decisions matter:
- Offline, not hot path. Frontier-model latency (seconds-to-tens-of-seconds agentic loops) is 100–1000× the query-optimizer budget. Putting the agent in the hot path is infeasible; moving it to offline tuning changes the latency constraint entirely. See concepts/offline-query-tuning-loop.
- Measurement replaces estimation. The agent's one tool observes actual runtimes, sidestepping the fragile cardinality-estimation layer. This is why the result beats perfect cardinality estimates — even perfect estimates feed a cost model that still approximates execution.
- Grammar-constrained validity. Plans are structured objects with correctness constraints (each table joined exactly once, associativity preserved, etc.). Enforcing validity via the decoding grammar eliminates a whole class of agent failure modes (proposing syntactically broken plans) before generation.
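To make the validity constraints concrete, a hedged sketch checking them post hoc on plans modelled as binary join trees (the tuple encoding is an assumption for illustration, not the source's representation; a real decoding grammar makes invalid strings unreachable during generation rather than rejecting them afterwards):

```python
def leaves(plan):
    # A plan is either a table name (leaf) or a binary join (left, right).
    if isinstance(plan, str):
        return [plan]
    left, right = plan
    return leaves(left) + leaves(right)

def is_valid_plan(plan, query_tables):
    # Each table joined exactly once; any (left, right) nesting —
    # i.e. any associativity — is acceptable.
    return sorted(leaves(plan)) == sorted(query_tables)
```

Bushy trees like `(("a", "b"), ("c", "d"))` and left-deep chains like `((("a", "b"), "c"), "d")` both pass; a plan repeating or omitting a table does not.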
## When this fits
| Condition | Why |
|---|---|
| Query is slow enough that offline tuning is worth minutes of compute | Rollout budgets imply N executions per tune; this amortises only if the query runs many times |
| Plan space is large and measurement-friendly | Agent search needs a meaningfully large (exponential) space to navigate and a cheap way to measure each candidate |
| Grammar-expressible validity | If plan validity is a free-form property you can't encode as a grammar, the structured-output leg doesn't work |
| Historically human-tuned | Perfect fit — automate the DBA loop, don't displace the optimizer |
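The first row's amortisation argument as back-of-envelope arithmetic, under an assumed worst-case cost model (every rollout hits the timeout, so one tune costs about N times the original runtime; the model is an assumption, not from the source):

```python
import math

def break_even_runs(original_ms, tuned_ms, budget_n):
    # Worst case: each of the N rollouts times out at the original runtime.
    tuning_cost_ms = budget_n * original_ms
    saving_per_run_ms = original_ms - tuned_ms
    if saving_per_run_ms <= 0:
        return None  # tuning never pays off
    return math.ceil(tuning_cost_ms / saving_per_run_ms)
```

For example, a query sped up from 10 s to 6 s under a 15-rollout budget repays its tuning cost after `break_even_runs(10_000, 6_000, 15)` = 38 future executions; a query that runs once never does.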
## When it doesn't fit
- Hot-path optimization. Frontier-model latency still disqualifies it.
- Ad-hoc one-off queries. N executions per tune doesn't amortise if the query runs once.
- Fully-automatic online systems. No place for a human-like offline loop.
- Query engines where measurement is expensive. If each rollout is itself a multi-hour job, budgeted search collapses.
## Composition with existing patterns
| Pattern | Relationship |
|---|---|
| patterns/structured-output-grammar-for-valid-plans | Plan-validity leg of this pattern |
| patterns/rollout-budget-anytime-plan-search | Time-budget knob making this anytime |
| patterns/tool-call-loop-minimal-agent | Narrow-tool, budget-bounded agent shape this instantiates |
| patterns/snapshot-replay-agent-evaluation | Evaluation methodology sibling — replay real queries through the agent |
| patterns/specialized-agent-decomposition | Possible next step: per-plan-decision agents (join-order, index-selection, pushdown) |
## Open architectural questions (from the source)
- Tool surface. Should the agent have cardinality-probe or data-assumption tools beyond `execute_plan`?
- Trigger policy. Manual (user flags) vs. automatic (telemetry detects "optimization potential").
- Learning back into the optimizer. Agent finds plan X is better than optimizer's plan Y → proof of estimator error → training signal to fix the estimator itself.
## Seen in
- sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization — Canonical first wiki instance. Databricks + UPenn research prototype. Join ordering. Frontier model. 50 rollouts. Grammar-constrained output. JOB benchmark.
## Related
- concepts/llm-agent-as-query-optimizer — the concept page
- concepts/offline-query-tuning-loop — the workflow shape
- concepts/anytime-optimization-algorithm — the algorithmic family
- concepts/exploration-exploitation-tradeoff-in-agent-search — how the rollout budget is spent
- concepts/join-order-optimization — the specific problem this instance targets
- concepts/cardinality-estimation — the step this pattern sidesteps
- patterns/structured-output-grammar-for-valid-plans — companion validity pattern
- patterns/rollout-budget-anytime-plan-search — companion budget pattern
- systems/databricks-join-order-agent — the prototype
- systems/bayesqo — the classical baseline