
PATTERN

LLM agent offline query plan tuner

Shape

Deploy a frontier-model LLM agent as an offline query-plan improver for a database engine. Give it:

  • One tool: execute_plan(candidate) → (runtime_ms, subplan_sizes).
  • A rollout budget — fixed N invocations per query.
  • Grammar-constrained output admitting only valid plans.
  • Best-of-N selection — return the fastest observed plan.

The agent runs out-of-hot-path, against slow queries flagged for tuning, on representative or replica data. It produces a plan (or plan hint) to be applied to future executions of the query — not a rewritten SQL string, not a natural-language explanation.
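The four ingredients above compose into a short loop. A minimal sketch, where `propose_plan` (the LLM agent call under a grammar-constrained decoder) and `execute_plan` are stand-ins for the real components, and the `feedback` list is one plausible way to return observations to the agent's context:

```python
def tune_query(propose_plan, execute_plan, budget, baseline_ms):
    """Best-of-N offline tuning: spend a fixed rollout budget, measure
    each candidate, and return the fastest plan actually observed."""
    best_plan, best_ms = None, baseline_ms
    feedback = []  # (plan, runtime_ms, subplan_sizes) observations for the agent
    for _ in range(budget):
        candidate = propose_plan(feedback)            # agent emits a grammar-valid plan
        runtime_ms, subplan_sizes = execute_plan(candidate)  # timeout = baseline_ms
        feedback.append((candidate, runtime_ms, subplan_sizes))
        if runtime_ms < best_ms:
            best_plan, best_ms = candidate, runtime_ms
    return best_plan, best_ms
```

If no candidate beats the baseline, `best_plan` stays `None` and the original plan is kept, so the loop can only improve on the status quo.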

Canonical instance

Databricks + UPenn's join-order agent (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):

  • Domain: concepts/join-order-optimization (one of several plan decisions the optimizer makes).
  • Tool: Execute a candidate join order on Databricks, return runtime + per-subplan sizes. Timeout = original query's runtime.
  • Budget: 50 rollouts in the prototype, 15 in evaluation.
  • Grammar: Structured output that admits only valid join reorderings.
  • Benchmark: JOB on 10×-scaled IMDb, 113 queries.
  • Result: 1.288× geomean workload speedup, 41% P90 drop.
  • Beats: Perfect cardinality estimates, smaller LLMs, classical BayesQO (note: designed for Postgres).
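For reference, the geomean workload speedup reported above is the geometric mean of per-query speedup ratios. A sketch with made-up runtimes (not the paper's numbers):

```python
import math

def geomean_speedup(baseline_ms, tuned_ms):
    """Geometric mean of per-query speedups (baseline / tuned) across a workload."""
    ratios = [b / t for b, t in zip(baseline_ms, tuned_ms)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical runtimes (ms) for three queries: baseline vs. tuned plans.
print(geomean_speedup([1200, 800, 450], [900, 790, 210]))
```

The geometric mean is the standard choice here because it is symmetric in ratios: a 2× win on one query and a 2× regression on another cancel out exactly.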

Why this shape works

Three architectural decisions matter:

  1. Offline, not hot path. Frontier-model latency (seconds-to-tens-of-seconds agentic loops) is 100–1000× query-optimizer budget. Putting the agent in the hot path is infeasible; moving it to offline tuning changes the latency constraint entirely. See concepts/offline-query-tuning-loop.

  2. Measurement replaces estimation. The agent's one tool observes actual runtimes, sidestepping the fragile cardinality estimator layer. This is why the result beats perfect cardinality estimates — even perfect estimates feed a cost model that still approximates execution.

  3. Grammar-constrained validity. Plans are a structured object with correctness constraints (each table joined once, associativity preserved, etc.). Enforcing validity via the decoding grammar eliminates a whole class of agent failure modes (proposing syntactically-broken plans) before generation.
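Point 3 can be illustrated with a toy grammar rule for left-deep join orders: at each decoding step only tables not yet joined are admissible, so every completed sequence joins each table exactly once and invalid plans are unreachable by construction. This is a simplified stand-in for the paper's actual structured-output grammar:

```python
def admissible_next(partial, tables):
    """Toy grammar rule: the next token may be any table not already
    in the partial join order, so duplicates can never be generated."""
    return sorted(set(tables) - set(partial))

def complete(partial, tables):
    """A join order is complete once every table appears exactly once."""
    return len(partial) == len(tables)
```

Constraining the decoder this way is strictly stronger than validating after generation: there is no wasted rollout on a plan the engine would reject.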

When this fits

| Condition | Why |
| --- | --- |
| Query is slow enough that offline tuning is worth minutes of compute | Rollout budgets imply N executions per tune; this amortises only if the query runs many times |
| Plan space is large and measurement-friendly | Agent search needs a meaningful exponential to navigate and a cheap way to measure each candidate |
| Grammar-expressible validity | If plan validity is a free-form property you can't encode as a grammar, the structured-output leg doesn't work |
| Historically human-tuned | Perfect fit — automate the DBA loop, don't displace the optimizer |
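The first condition is simple arithmetic: tuning costs roughly N rollouts, each bounded by the baseline runtime (the timeout), so it pays off only after enough future executions. A sketch with illustrative numbers — the 15-rollout budget is from the evaluation, the runtimes are hypothetical:

```python
import math

def break_even_runs(tune_cost_ms, baseline_ms, tuned_ms):
    """Number of future executions before offline tuning pays for itself."""
    saving_per_run = baseline_ms - tuned_ms
    if saving_per_run <= 0:
        return float("inf")  # tuning found nothing better: never amortises
    return math.ceil(tune_cost_ms / saving_per_run)

# 15 rollouts x 2 s baseline timeout = 30 s tuning cost, 0.5 s saved per run.
print(break_even_runs(15 * 2000, 2000, 1500))  # → 60
```

Sixty future executions is trivial for a recurring dashboard query and hopeless for a one-off, which is exactly the ad-hoc exclusion in the next section.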

When it doesn't fit

  • Hot-path optimization. Frontier-model latency still disqualifies.
  • Ad-hoc one-off queries. N executions per tune doesn't amortise if the query runs once.
  • Fully-automatic online systems. No place for a human-like offline loop.
  • Query engines where measurement is expensive. If each rollout is itself a multi-hour job, budgeted search collapses.

Composition with existing patterns

| Pattern | Relationship |
| --- | --- |
| patterns/structured-output-grammar-for-valid-plans | Plan-validity leg of this pattern |
| patterns/rollout-budget-anytime-plan-search | Time-budget knob making this anytime |
| patterns/tool-call-loop-minimal-agent | Narrow-tool, budget-bounded agent shape this instantiates |
| patterns/snapshot-replay-agent-evaluation | Evaluation methodology sibling — replay real queries through the agent |
| patterns/specialized-agent-decomposition | Possible next step: per-plan-decision agents (join-order, index-selection, pushdown) |

Open architectural questions (from the source)

  • Tool surface. Should the agent have cardinality-probe or data-assumption tools beyond execute_plan?
  • Trigger policy. Manual (user flags) vs. automatic (telemetry detects "optimization potential").
  • Learning back into the optimizer. Agent finds plan X is better than optimizer's plan Y → proof of estimator error → training signal to fix the estimator itself.

Seen in
