CONCEPT Cited by 1 source

LLM agent as query optimizer

Definition

LLM-agent-as-query-optimizer is the architectural pattern of applying a frontier-model LLM agent, given one or more database execution tools and a bounded rollout budget, to the problem of finding better execution plans for a SQL query. The agent's output is a concrete plan (join order, index choice, etc.) for the query optimizer to use — not a rewritten query and not a natural-language answer.

The approach was prototyped by Databricks + UPenn (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization) for join ordering and delivered a 1.288× geomean speedup / 41% P90 drop on the Join Order Benchmark.

Shape

The canonical instance has five distinguishing properties:

| Property | This post's instance |
| --- | --- |
| Location | Offline, out-of-hot-path (not inside the query optimizer) |
| Tools | Single: execute a candidate plan, return runtime + subplan sizes |
| Budget | Fixed rollout count (50 in the prototype, 15 in eval) |
| Output constraint | Structured output grammar admitting only valid join reorderings |
| Selection | Best-of-N: return the fastest rollout observed |
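The "output constraint" row can be made concrete with a minimal validity check. This is a sketch, not the prototype's actual grammar: the function name and relation names are illustrative, and the real constraint is enforced at decoding time via a structured output grammar rather than post-hoc.

```python
def is_valid_reordering(proposed: list[str], query_relations: set[str]) -> bool:
    """A proposed join order is admissible iff it uses each of the query's
    relations exactly once, i.e. it is a permutation of the relation set."""
    return len(proposed) == len(query_relations) and set(proposed) == query_relations

# Illustrative usage with JOB-style relation aliases:
assert is_valid_reordering(["t", "mi", "ci"], {"ci", "mi", "t"})
assert not is_valid_reordering(["t", "t", "ci"], {"ci", "mi", "t"})  # duplicate relation
```

Restricting the agent to permutations of the existing join set is what keeps every rollout executable: the agent can only reorder, never invent operators or drop relations.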

See patterns/llm-agent-offline-query-plan-tuner for the full pattern write-up.

Why offline, not hot-path

"Query optimizers typically need to pick join orders in a few hundred milliseconds, so integrating an LLM into the hot path of the query optimizer, while potentially promising, is not possible today."

Frontier-model inference latency (seconds to tens of seconds for agentic loops) is 100–1000× the optimizer's latency budget. The agent instead replaces the manual tuning loop a human DBA performs when a query is flagged as slow — see concepts/offline-query-tuning-loop. This reframes the LLM from a component of the optimizer to an offline collaborator with the optimizer, which is a much weaker latency constraint.

Why it beats perfect cardinality estimates

From the source:

"When using a frontier model, the agent was able to improve query latency by a factor of 1.288 (geomean). This outperforms using perfect cardinality estimates (intractable in practice)."

This is counter-intuitive but structurally correct: even with perfect concepts/cardinality-estimation, the cost model and search procedure still approximate real execution. The agent measures actual runtimes via its one tool, which short-circuits both approximation layers. You're no longer guessing what a plan will cost — you're observing it.
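The measure-don't-estimate loop can be sketched as a budgeted best-of-N search. This is an assumed shape, not the prototype's code: `propose_plan` stands in for the LLM agent conditioning on prior rollouts, and `execute_plan` stands in for the single execution tool.

```python
from typing import Callable

def best_of_n_tuning(
    propose_plan: Callable[[list[tuple[list[str], float]]], list[str]],  # the LLM agent
    execute_plan: Callable[[list[str]], float],                          # the one tool
    budget: int = 15,
) -> tuple[list[str], float]:
    """Run `budget` rollouts, feeding each (plan, measured runtime) pair back
    to the agent, then return the fastest plan actually observed."""
    history: list[tuple[list[str], float]] = []
    for _ in range(budget):
        plan = propose_plan(history)   # agent conditions on observed outcomes
        runtime = execute_plan(plan)   # measured on the real engine, not estimated
        history.append((plan, runtime))
    return min(history, key=lambda pr: pr[1])  # best-of-N selection
```

Because selection is over measured runtimes, cost-model error cannot cause a regression: the returned plan is, by construction, the fastest one actually executed within the budget.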

Contrast with "LLM in hot path" approaches

Prior work (cited as [B] in the post) put LLMs directly in the optimizer's decision path via RAG-based plan lookup. The Databricks experiment differs: "this differs from prior work that asks the LLM to pick a join order instantly in the hot path of the query optimizer; instead, the LLM gets to act like an offline experimenter that tries many candidate plans and learns from the observed outcomes — just like a human tuning a join order by hand!"

Open questions the authors name

From the post's conclusion:

  1. Tool surface. Beyond "execute a plan", should the agent be able to issue cardinality-probe queries or data-assumption tests (e.g. "are there any pre-1995 DVD releases?") directly? More tools → more leverage but also more context.
  2. Trigger policy. User-flagged queries are easy. Can the system proactively detect queries with "optimization potential" and spend rollout budget on them automatically?
  3. Learning back into the optimizer. If the agent finds a plan the optimizer missed, that's proof of a systematic estimator error. Can the corrections be harvested into a training signal that improves the native optimizer? This is the obvious next move.
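A speculative sketch of the first open question: alongside the execute-a-plan tool, expose a cheap existence probe the agent can call to test a data assumption before spending a full rollout on it. The `releases` schema and function name here are hypothetical stand-ins, not from the post.

```python
import sqlite3

def probe_exists(conn: sqlite3.Connection, predicate_sql: str) -> bool:
    """Return whether any row satisfies the predicate. An EXISTS probe is far
    cheaper than executing a full candidate plan, so it fits a tight budget.
    (Illustrative only: real tool plumbing would parameterize the predicate
    rather than interpolate SQL text.)"""
    row = conn.execute(
        f"SELECT EXISTS (SELECT 1 FROM releases WHERE {predicate_sql})"
    ).fetchone()
    return bool(row[0])

# Illustrative usage, mirroring the post's example question:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (year INTEGER, format TEXT)")
conn.executemany("INSERT INTO releases VALUES (?, ?)",
                 [(1998, "DVD"), (2001, "DVD")])
probe_exists(conn, "format = 'DVD' AND year < 1995")  # "any pre-1995 DVD releases?"
```

The trade-off the authors flag applies directly: every probe result lands in the agent's context, so a richer tool surface buys leverage at the cost of context budget.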
