PATTERN
# LLM agent offline query plan tuner
## Shape
Deploy a frontier-model LLM agent as an offline query-plan improver for a database engine. Give it:
- One tool — `execute_plan(candidate) → (runtime_ms, subplan_sizes)`.
- A rollout budget — fixed N invocations per query.
- Grammar-constrained output admitting only valid plans.
- Best-of-N selection — return the fastest observed plan.
The agent runs out of the hot path, against slow queries flagged for tuning, on representative or replica data. It produces a plan (or plan hint) to be applied to future executions of the query — not a rewritten SQL string, not a natural-language explanation.
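The shape above can be sketched as a minimal best-of-N loop. Here `propose_plan` and `execute_plan` are hypothetical stand-ins for the grammar-constrained agent call and the engine hook, with a toy deterministic runtime model rather than a real engine:

```python
import random

def propose_plan(history):
    # Stand-in for the grammar-constrained LLM call: samples a random
    # permutation of the query's tables as a candidate join order.
    tables = ["t1", "t2", "t3", "t4"]
    return tuple(random.sample(tables, len(tables)))

def execute_plan(candidate):
    # Stand-in for running the candidate against replica data; returns
    # (runtime_ms, subplan_sizes) per the pattern's tool signature.
    # Toy model: plans that join t3 earlier run faster.
    runtime_ms = 50 + 25 * candidate.index("t3")
    return runtime_ms, {}

def tune_query(budget_n):
    best_plan, best_ms, history = None, float("inf"), []
    for _ in range(budget_n):
        candidate = propose_plan(history)
        runtime_ms, subplan_sizes = execute_plan(candidate)
        history.append((candidate, runtime_ms, subplan_sizes))
        if runtime_ms < best_ms:  # best-of-N selection
            best_plan, best_ms = candidate, runtime_ms
    return best_plan, best_ms
```

The history is threaded back into `propose_plan` because the real agent conditions each proposal on previously observed runtimes and subplan sizes; this sketch ignores it.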
## Canonical instance
Databricks + UPenn's join-order agent (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):
- Domain: concepts/join-order-optimization (one of several plan decisions the optimizer makes).
- Tool: Execute a candidate join order on Databricks, return runtime + per-subplan sizes. Timeout = original query's runtime.
- Budget: 50 rollouts in the prototype, 15 in evaluation.
- Grammar: Structured output that admits only valid join reorderings.
- Benchmark: JOB on 10×-scaled IMDb, 113 queries.
- Result: 1.288× geomean workload speedup, 41% P90 drop.
- Beats: Perfect cardinality estimates, smaller LLMs, classical BayesQO (note: designed for Postgres).
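For clarity on the headline metric: a geometric mean of per-query speedups is the conventional way a figure like 1.288× is aggregated, so no single long-running query dominates. The numbers below are illustrative, not from the benchmark:

```python
import math

def geomean_speedup(baseline_ms, tuned_ms):
    # Per-query speedup ratios, aggregated by geometric mean.
    ratios = [b / t for b, t in zip(baseline_ms, tuned_ms)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Three toy queries: one 2x faster, one unchanged, one slightly slower.
speedup = geomean_speedup([200, 100, 50], [100, 100, 55])
```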
## Why this shape works
Three architectural decisions matter:
- Offline, not hot path. Frontier-model latency (seconds-to-tens-of-seconds agentic loops) is 100–1000× the query-optimizer budget. Putting the agent in the hot path is infeasible; moving it to offline tuning changes the latency constraint entirely. See concepts/offline-query-tuning-loop.
- Measurement replaces estimation. The agent's one tool observes actual runtimes, sidestepping the fragile cardinality-estimation layer. This is why the result beats perfect cardinality estimates — even perfect estimates feed a cost model that still approximates execution.
- Grammar-constrained validity. Plans are structured objects with correctness constraints (each table joined exactly once, associativity preserved, etc.). Enforcing validity via the decoding grammar eliminates a whole class of agent failure modes (proposing syntactically broken plans) before generation.
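To make the validity constraints concrete, a hedged sketch checking them post hoc on plans modelled as binary join trees (the tuple encoding is an assumption for illustration, not the source's representation; a real decoding grammar makes invalid strings unreachable during generation rather than rejecting them afterwards):

```python
def leaves(plan):
    # A plan is either a table name (leaf) or a binary join (left, right).
    if isinstance(plan, str):
        return [plan]
    left, right = plan
    return leaves(left) + leaves(right)

def is_valid_plan(plan, query_tables):
    # Each table joined exactly once; any (left, right) nesting —
    # i.e. any associativity — is acceptable.
    return sorted(leaves(plan)) == sorted(query_tables)
```

Bushy trees like `(("a", "b"), ("c", "d"))` and left-deep chains like `((("a", "b"), "c"), "d")` both pass; a plan repeating or omitting a table does not.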
## When this fits
| Condition | Why |
|---|---|
| Query is slow enough that offline tuning is worth minutes of compute | Rollout budgets imply N executions per tune; this amortises only if the query runs many times |
| Plan space is large and measurement-friendly | Agent search needs a meaningfully large (exponential) space to navigate and a cheap way to measure each candidate |
| Grammar-expressible validity | If plan validity is a free-form property you can't encode as a grammar, the structured-output leg doesn't work |
| Historically human-tuned | Perfect fit — automate the DBA loop, don't displace the optimizer |
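The first row's amortisation argument as back-of-envelope arithmetic, under an assumed worst-case cost model (every rollout hits the timeout, so one tune costs about N times the original runtime; the model is an assumption, not from the source):

```python
import math

def break_even_runs(original_ms, tuned_ms, budget_n):
    # Worst case: each of the N rollouts times out at the original runtime.
    tuning_cost_ms = budget_n * original_ms
    saving_per_run_ms = original_ms - tuned_ms
    if saving_per_run_ms <= 0:
        return None  # tuning never pays off
    return math.ceil(tuning_cost_ms / saving_per_run_ms)
```

For example, a query sped up from 10 s to 6 s under a 15-rollout budget repays its tuning cost after `break_even_runs(10_000, 6_000, 15)` = 38 future executions; a query that runs once never does.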
## When it doesn't fit
- Hot-path optimization. Frontier-model latency still disqualifies it.
- Ad-hoc one-off queries. N executions per tune doesn't amortise if the query runs once.
- Fully-automatic online systems. No place for a human-like offline loop.
- Query engines where measurement is expensive. If each rollout is itself a multi-hour job, budgeted search collapses.
## Composition with existing patterns
| Pattern | Relationship |
|---|---|
| patterns/structured-output-grammar-for-valid-plans | Plan-validity leg of this pattern |
| patterns/rollout-budget-anytime-plan-search | Time-budget knob making this anytime |
| patterns/tool-call-loop-minimal-agent | Narrow-tool, budget-bounded agent shape this instantiates |
| patterns/snapshot-replay-agent-evaluation | Evaluation methodology sibling — replay real queries through the agent |
| patterns/specialized-agent-decomposition | Possible next step: per-plan-decision agents (join-order, index-selection, pushdown) |
## Open architectural questions (from the source)
- Tool surface. Should the agent have cardinality-probe or data-assumption tools beyond `execute_plan`?
- Trigger policy. Manual (user flags) vs. automatic (telemetry detects "optimization potential").
- Learning back into the optimizer. Agent finds plan X is better than optimizer's plan Y → proof of estimator error → training signal to fix the estimator itself.
## Seen in
- sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization — Canonical first wiki instance. Databricks + UPenn research prototype. Join ordering. Frontier model. 50 rollouts. Grammar-constrained output. JOB benchmark.
## Related
- concepts/llm-agent-as-query-optimizer — the concept page
- concepts/offline-query-tuning-loop — the workflow shape
- concepts/anytime-optimization-algorithm — the algorithmic family
- concepts/exploration-exploitation-tradeoff-in-agent-search — how the rollout budget is spent
- concepts/join-order-optimization — the specific problem this instance targets
- concepts/cardinality-estimation — the step this pattern sidesteps
- patterns/structured-output-grammar-for-valid-plans — companion validity pattern
- patterns/rollout-budget-anytime-plan-search — companion budget pattern
- systems/databricks-join-order-agent — the prototype
- systems/bayesqo — the classical baseline