
Databricks Join Order Agent

What it is

Databricks Join Order Agent is a research prototype (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization) developed by Databricks in collaboration with UPenn that applies a frontier-model LLM agent to the join-order optimization problem on the Databricks query engine.

The agent is the canonical first wiki instance of LLM-agent-as-query-optimizer and demonstrates the offline query-plan tuner pattern.

Architecture

Minimal agent loop:

┌───────────────────────────────────────────────┐
│            Frontier LLM (agent)               │
│                                               │
│  system: grammar-constrained structured       │
│    output admitting only valid join orderings │
│  context: query, schema, prior-rollout        │
│    results (runtime + subplan sizes)          │
└───────────────────────────────────────────────┘
                     │ tool call
┌───────────────────────────────────────────────┐
│     execute_plan(candidate_join_order)        │
│  →  Databricks query engine executes the plan │
│     (timeout = original query's runtime)      │
│  →  returns (runtime_ms, per-subplan sizes)   │
└───────────────────────────────────────────────┘
              [loop N rollouts]
       return best plan observed
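The loop above can be sketched in a few lines of Python. `propose_join_order` stands in for the grammar-constrained LLM call and `execute_plan` for the engine tool; both names and signatures are assumptions for illustration, not the prototype's actual API.

```python
from typing import Callable

def tune_query(
    propose_join_order: Callable[[dict], list[str]],         # LLM agent call (hypothetical)
    execute_plan: Callable[[list[str]], tuple[float, dict]],  # engine tool (hypothetical)
    query_context: dict,
    n_rollouts: int = 15,
) -> tuple[list[str], float]:
    """Minimal rollout loop: propose, execute, feed results back, keep the best."""
    best_order, best_runtime = None, float("inf")
    history: list[dict] = []  # prior-rollout results fed back into the agent's context
    for _ in range(n_rollouts):
        candidate = propose_join_order({**query_context, "history": history})
        runtime_ms, subplan_sizes = execute_plan(candidate)
        history.append({"order": candidate, "runtime_ms": runtime_ms,
                        "subplan_sizes": subplan_sizes})
        if runtime_ms < best_runtime:
            best_order, best_runtime = candidate, runtime_ms
    return best_order, best_runtime
```

The agent's only feedback channel is `history`: each rollout's measured runtime and subplan sizes become context for the next proposal.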

Tool surface

One tool. execute_plan(candidate_order) runs the specified join order through Databricks' execution engine (with the original query's runtime as the timeout cap) and returns:

  • Runtime in milliseconds.
  • Per-subplan sizes, equivalent to Databricks' EXPLAIN EXTENDED output — the size of each intermediate join's result set.
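A sketch of the tool's contract, under assumptions: `run_engine` is a hypothetical stand-in for the Databricks execution engine, and the return shape is inferred from the description above rather than taken from the prototype.

```python
from dataclasses import dataclass

@dataclass
class PlanResult:
    timed_out: bool
    runtime_ms: float               # wall-clock runtime of the candidate plan
    subplan_sizes: dict[str, int]   # intermediate result-set size per join subplan

def execute_plan(candidate_order: list[str], baseline_runtime_ms: float,
                 run_engine) -> PlanResult:
    """Run a candidate join order, capped at the original query's runtime."""
    runtime_ms, sizes = run_engine(candidate_order, timeout_ms=baseline_runtime_ms)
    if runtime_ms >= baseline_runtime_ms:
        # Hit the cap: this candidate cannot beat the original plan, so there
        # is no reason to let it keep running.
        return PlanResult(True, baseline_runtime_ms, sizes)
    return PlanResult(False, runtime_ms, sizes)
```

The timeout cap is what keeps exploration cheap: a bad candidate costs at most one baseline runtime.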

Structured output constraint

Each tool call's input (the candidate join order) is generated via grammar-constrained structured output that admits only semantically-valid join reorderings. This eliminates the failure mode where the agent proposes a syntactically-valid but semantically-invalid plan (e.g. missing a table, duplicating one). See patterns/structured-output-grammar-for-valid-plans.
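The invariant the grammar enforces can be stated as a predicate. Note the difference in mechanism: the real system constrains decoding so invalid candidates are never generated, whereas this sketch merely checks a candidate after the fact.

```python
def is_valid_join_order(candidate: list[str], tables: set[str]) -> bool:
    """A candidate must use every table in the query exactly once —
    no omissions, no duplicates. The grammar bakes this into decoding,
    so it is generate-only-valid rather than generate-then-reject."""
    return len(candidate) == len(tables) and set(candidate) == tables
```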

Budget

Rollout count is the budget, not wall-clock:

  • 50 rollouts in the prototype (maximum the agent can use).
  • 15 rollouts in evaluation (the reported results use this tighter budget to amortise across 113 queries).

This makes the agent a concrete instance of patterns/rollout-budget-anytime-plan-search — larger budgets buy better plans monotonically.

Selection

Best-of-N. After exhausting the budget, return the plan with the lowest observed runtime. The agent can waste rollouts on exploratory variants; only the best is surfaced.
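Selection reduces to a single `min` over the recorded rollouts. A minimal sketch, assuming each rollout is logged as a dict like the loop's `history` entries:

```python
def best_of_n(rollouts: list[dict]) -> dict:
    """Return the rollout with the lowest observed runtime. Exploratory
    rollouts that performed badly cost budget but never hurt the answer,
    which is what makes the search anytime: any prefix of the budget
    yields a usable best-so-far plan."""
    return min(rollouts, key=lambda r: r["runtime_ms"])
```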

Evaluation

Variable                   Value
Benchmark                  systems/join-order-benchmark-job (JOB)
Query count                113
Dataset                    IMDb (~2 GB base), scaled 10× via row duplication
Rollouts per query         15
Model                      Frontier (unnamed in post)
Geomean workload speedup   1.288×
P90 query-latency drop     41%
Baseline 1                 Databricks default optimizer
Baseline 2                 Perfect cardinality estimates — beaten
Baseline 3                 Smaller LLMs — beaten
Baseline 4                 systems/bayesqo (Postgres-tuned) — beaten

The striking claim is "outperforms perfect cardinality estimates" — direct execute-and-measure beats the theoretical oracle that traditional cost-model-based optimizers merely approximate.
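For clarity on the headline number: a geometric mean is the standard aggregate for per-query speedup ratios, because one outsized win cannot dominate the workload-level figure the way an arithmetic mean would. A sketch with hypothetical per-query speedups (not the actual JOB numbers):

```python
import math

def geomean_speedup(per_query_speedups: list[float]) -> float:
    """Geometric mean of per-query speedup ratios."""
    return math.exp(sum(math.log(s) for s in per_query_speedups)
                    / len(per_query_speedups))

# Illustrative only: one 2x win across four queries gives a geomean of
# 2 ** 0.25 ≈ 1.19, versus an arithmetic mean of 1.25.
```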

Canonical win: query 5b

JOB's query 5b (5-way join looking for American VHS production companies with post-2010 movies referencing 1994) is the post's illustrated case study. The Databricks default optimizer first finds "American VHS production companies" (12 rows, selective by the estimator's numbers) then joins outward. The agent instead starts from VHS-releases-referencing-1994, which is even more selective — and the default estimator can't tell because the predicates are LIKE patterns (see concepts/like-predicate-cardinality-estimation-failure).

Open architectural questions

The post explicitly names three:

  1. Tool surface expansion. Should the agent have cardinality-probe or data-assumption tools beyond execute_plan?
  2. Automatic triggering. Move from user-flagged slow queries to proactive detection of "optimization potential" from telemetry.
  3. Agent-as-training-signal. Every better plan the agent finds is proof of a systematic estimator error. Harvest these corrections to improve the native optimizer.

Production status

Research prototype from the collaboration with UPenn; not a shipping Databricks feature. Cost per tuned query (frontier inference × N rollouts × plan context) is not disclosed.
