CONCEPT

Offline query tuning loop

Definition

The offline query tuning loop is the workflow by which a query's execution plan is improved outside the query optimizer hot path, typically:

  1. A query is flagged as slow (user report, monitoring, or scheduled audit).
  2. An expert (historically a human DBA, now potentially an LLM agent) reads the plan, hypothesises a better one, and tests it.
  3. Testing means executing the candidate plan on representative data and measuring real runtime and intermediate result sizes.
  4. Iterate until a satisfactory plan is found.
  5. The better plan is recorded (hint, rewrite, index suggestion) and applied to future executions.
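
The steps above can be sketched as a budgeted search over candidate plans. In this sketch, `execute_plan` and `propose_candidate` are hypothetical stand-ins for the measurement harness and for the expert's (or agent's) hypothesis step; they are not part of any real system named in this note:

```python
def tune_offline(execute_plan, propose_candidate, baseline_plan, budget=50):
    """Offline tuning loop: measure a baseline, then iterate candidate
    plans within a fixed rollout budget, keeping the fastest seen."""
    best_plan = baseline_plan
    best_ms = execute_plan(baseline_plan)       # step 3: real runtime, not an estimate
    history = [(baseline_plan, best_ms)]
    for _ in range(budget):                     # step 4: iterate within a budget
        candidate = propose_candidate(history)  # step 2: expert/agent hypothesis
        ms = execute_plan(candidate)
        history.append((candidate, ms))
        if ms < best_ms:
            best_plan, best_ms = candidate, ms
    return best_plan, best_ms                   # step 5: record the winner
```

The `history` argument lets the proposer condition each new hypothesis on every plan already measured, which is how both a human DBA and an agent rollout actually work.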

This is distinct from online optimization, which happens inside the optimizer under a sub-second budget. Offline tuning trades latency for plan quality.

Why the distinction matters

From the Databricks experiment (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):

"Query optimizers typically need to pick join orders in a few hundred milliseconds, so integrating an LLM into the hot path of the query optimizer, while potentially promising, is not possible today. But, the iterative and manual process of optimizing the join order for a query, which might take a human expert multiple hours, could potentially be automated with an LLM agent!"

The offline niche is exactly where LLM latency stops being disqualifying. Hours-long human DBA work compresses to minutes of agent rollouts.

The human-expert baseline

Historically this loop requires:

  • A senior DBA or engineer with deep domain knowledge.
  • Access to production or near-production data.
  • An EXPLAIN / EXPLAIN ANALYZE / EXPLAIN EXTENDED surface to inspect candidate plans.
  • A way to force a specific plan (hints, rewrites, config flags) to test it.
  • Hours of iteration per query.

The workflow scales poorly — a large org has more slow queries than DBA-hours to tune them. This is the supply-side bottleneck LLM agents are aimed at.

The LLM-agent instance

Databricks' prototype gives the agent exactly the same tools a human DBA would use:

  • A way to execute a candidate plan and time it.
  • EXPLAIN EXTENDED-equivalent subplan-size output for post-hoc inspection.
  • A rollout budget (50 rollouts per query), analogous to a fixed time allocation.

Plus one novel constraint: a structured-output grammar that forces valid join reorderings, the "typo-proofing" equivalent of a DBA who never writes a syntactically invalid plan.
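
As a toy illustration of that constraint (not Databricks' actual grammar, which constrains decoding rather than post-filtering), a validator can accept only outputs that name every table in the query exactly once:

```python
from itertools import permutations

def valid_join_orders(tables):
    """Enumerate the only strings the agent may emit: each is a permutation
    of the query's tables, so no relation can be hallucinated, duplicated,
    or dropped. Feasible only for small table sets; a real grammar-based
    decoder enforces the same property token by token."""
    return {" JOIN ".join(p) for p in permutations(tables)}

def is_valid(output, tables):
    # Accept an agent output only if it is one of the legal reorderings.
    return output in valid_join_orders(tables)
```
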

Candidate extensions

The post names these as open questions:

  • Proactive trigger. Currently a user flags a query. Could telemetry identify queries with "optimization potential" and queue them for agent tuning automatically?
  • Richer tools. Beyond execute-a-plan, should the agent be able to issue targeted cardinality probes or data-assumption tests? (How many VHS releases exist before 1995?)
  • Feedback into the optimizer. When the agent finds a plan the optimizer missed, that's a proof of estimator error. Harvest the corrections as training signal.
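
A cardinality probe from the second bullet might look like the following sketch; `run_sql` and the `releases` table are hypothetical, invented here only to show the shape of such a tool:

```python
def cardinality_probe(run_sql, table, predicate):
    """Issue a targeted COUNT(*) so the agent can test a data assumption
    (e.g. 'few VHS releases exist before 1995') against the actual data
    instead of trusting the optimizer's estimate."""
    return run_sql(f"SELECT COUNT(*) FROM {table} WHERE {predicate}")
```

When the probed count diverges sharply from the optimizer's estimate, that divergence is exactly the estimator-error signal the third bullet proposes harvesting.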
