Databricks Join Order Agent¶
What it is¶
Databricks Join Order Agent is a research prototype (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization) developed by Databricks in collaboration with UPenn that applies a frontier-model LLM agent to the join-order optimization problem on the Databricks query engine.
The agent is the canonical first wiki instance of LLM-agent-as-query-optimizer, and demonstrates the offline query-plan tuner pattern.
Architecture¶
Minimal agent loop:
┌───────────────────────────────────────────────┐
│ Frontier LLM (agent) │
│ │
│ system: grammar-constrained structured │
│ output admitting only valid join orderings │
│ context: query, schema, prior-rollout │
│ results (runtime + subplan sizes) │
└───────────────────────────────────────────────┘
│ tool call
▼
┌───────────────────────────────────────────────┐
│ execute_plan(candidate_join_order) │
│ → Databricks query engine executes the plan │
│ (timeout = original query's runtime) │
│ → returns (runtime_ms, per-subplan sizes) │
└───────────────────────────────────────────────┘
│
▼
[loop N rollouts]
│
▼
return best plan observed
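The loop above can be sketched in a few lines. `propose_order` (the LLM call) and `execute_plan` (the engine tool) are hypothetical stand-ins, not names from the source:

```python
def tune_join_order(query, schema, propose_order, execute_plan, budget=15):
    """Anytime best-of-N search over join orders, per the diagram above.

    propose_order(query, schema, history) -> candidate join order (LLM call)
    execute_plan(order) -> (runtime_ms, subplan_sizes)       (engine tool)
    """
    history = []   # prior-rollout feedback fed back into the LLM context
    best = None    # best (order, runtime_ms) observed so far
    for _ in range(budget):
        # The agent sees the query, schema, and all prior rollout results.
        order = propose_order(query, schema, history)
        runtime_ms, subplan_sizes = execute_plan(order)
        history.append((order, runtime_ms, subplan_sizes))
        if best is None or runtime_ms < best[1]:
            best = (order, runtime_ms)
    return best
```

Because only the best observed plan is returned, the search is anytime: any budget yields a usable answer, and more rollouts can only improve it.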
Tool surface¶
One tool. execute_plan(candidate_order) runs the specified join order through Databricks' execution engine (with the original query's runtime as the timeout cap) and returns:
- Runtime in milliseconds.
- Per-subplan sizes, equivalent to Databricks' EXPLAIN EXTENDED output: the size of each intermediate join's result set.
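A sketch of that tool contract, assuming a hypothetical `run_with_timeout` engine hook; the treat-timeout-as-baseline fallback is an assumption for illustration, not stated in the source:

```python
def execute_plan(candidate_order, run_with_timeout, baseline_runtime_ms):
    """Single tool in the agent's surface: execute the candidate join
    order with the original query's runtime as the timeout cap.

    run_with_timeout is a hypothetical engine hook returning
    (runtime_ms, subplan_sizes), or None if the cap was hit.
    """
    result = run_with_timeout(candidate_order, timeout_ms=baseline_runtime_ms)
    if result is None:
        # Timed out: the candidate is no better than the original plan,
        # so report the cap itself and no subplan feedback.
        return baseline_runtime_ms, {}
    return result
```

The timeout cap bounds the cost of a bad rollout: no candidate can ever run longer than the query it is trying to beat.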
Structured output constraint¶
Each tool call's input (the candidate join order) is generated via grammar-constrained structured output that admits only semantically valid join reorderings. This eliminates the failure mode where the agent proposes a syntactically valid but semantically invalid plan (e.g. omitting a table, or duplicating one). See patterns/structured-output-grammar-for-valid-plans.
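The grammar enforces this at decoding time; as a post-hoc illustration, a minimal check for the two failure modes named above (dropped or duplicated tables) might look like:

```python
def is_valid_reordering(candidate, tables):
    """Reject orderings that omit or duplicate a table — the failure
    mode the grammar constraint rules out at generation time.
    (Post-hoc check sketch; the real mechanism constrains decoding,
    and full semantic validity would also respect the join graph.)
    """
    return sorted(candidate) == sorted(tables)
```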
Budget¶
Rollout count is the budget, not wall-clock:
- 50 rollouts in the prototype (maximum the agent can use).
- 15 rollouts in evaluation (the reported results use this tighter budget to amortise across 113 queries).
This makes the agent a concrete instance of patterns/rollout-budget-anytime-plan-search — larger budgets buy better plans monotonically.
Selection¶
Best-of-N. After exhausting the budget, return the plan with the lowest observed runtime. The agent can waste rollouts on exploratory variants; only the best is surfaced.
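Best-of-N selection reduces to a minimum over the observed rollouts:

```python
def best_of_n(rollouts):
    """Return the rollout with the lowest observed runtime.
    Exploratory rollouts cost budget but never degrade the result."""
    return min(rollouts, key=lambda r: r["runtime_ms"])
```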
Evaluation¶
| Variable | Value |
|---|---|
| Benchmark | systems/join-order-benchmark-job (JOB) |
| Query count | 113 |
| Dataset | IMDb (~2 GB base), scaled 10× via row duplication |
| Rollouts per query | 15 |
| Model | Frontier (unnamed in post) |
| Geomean workload speedup | 1.288× |
| P90 query-latency drop | 41% |
| Baseline 1 | Databricks default optimizer |
| Baseline 2 | Perfect cardinality estimates — beaten |
| Baseline 3 | Smaller LLMs — beaten |
| Baseline 4 | systems/bayesqo (Postgres-tuned) — beaten |
The striking claim is "outperforms perfect cardinality estimates" — direct execute-and-measure beats the theoretical oracle that traditional cost-model-based optimizers merely approximate.
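For reference, the geomean workload speedup is the geometric mean of per-query speedup ratios (the numbers in the test below are illustrative, not the paper's data):

```python
import math

def geomean_speedup(baseline_ms, tuned_ms):
    """Geometric mean of per-query speedups (baseline / tuned).
    Used instead of the arithmetic mean so no single query dominates."""
    ratios = [b / t for b, t in zip(baseline_ms, tuned_ms)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```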
Canonical win: query 5b¶
JOB's query 5b (5-way join looking for American VHS production
companies with post-2010 movies referencing 1994) is the post's
illustrated case study. The Databricks default optimizer first
finds "American VHS production companies" (12 rows, selective by
the estimator's numbers) then joins outward. The agent instead
starts from VHS-releases-referencing-1994, which is even more
selective — and the default estimator can't tell because the
predicates are LIKE patterns (see
concepts/like-predicate-cardinality-estimation-failure).
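To see why LIKE predicates defeat the estimator, contrast a fixed-fraction heuristic guess with selectivity measured by actually running the predicate. The 10% default below is a hypothetical placeholder, not Databricks' actual heuristic, and the rows are toy data:

```python
def heuristic_like_selectivity(pattern, default=0.10):
    """Toy stand-in for a cost model's LIKE estimate: a fixed fraction,
    blind to the data (real estimators use pattern heuristics, but
    still cannot inspect the data without executing)."""
    return default

def measured_selectivity(rows, needle):
    """What execute-and-measure observes: the true matching fraction."""
    return sum(needle in r for r in rows) / len(rows)
```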
Open architectural questions¶
The paper explicitly names three:
- Tool surface expansion. Should the agent have cardinality-probe or data-assumption tools beyond execute_plan?
- Automatic triggering. Move from user-flagged slow queries to proactive detection of "optimization potential" from telemetry.
- Agent-as-training-signal. Every better plan the agent finds is proof of a systematic estimator error. Harvest these corrections to improve the native optimizer.
Production status¶
A research prototype built in collaboration with UPenn, not a shipping Databricks feature. Cost per tuned query (frontier inference × N rollouts × plan context) is not disclosed.
Related¶
- companies/databricks — containing engineering org
- systems/databricks — execution substrate
- systems/join-order-benchmark-job — the benchmark
- systems/bayesqo — classical baseline
- concepts/llm-agent-as-query-optimizer — the concept this canonically instances
- concepts/join-order-optimization — the problem
- concepts/cardinality-estimation — the estimator layer this sidesteps
- concepts/anytime-optimization-algorithm — the algorithmic family
- patterns/llm-agent-offline-query-plan-tuner — the full pattern
- patterns/structured-output-grammar-for-valid-plans — the validity mechanism
- patterns/rollout-budget-anytime-plan-search — the budget mechanism