CONCEPT Cited by 2 sources
Join order optimization¶
Definition¶
Join-order optimization is the query-optimizer subproblem of choosing which order to execute a sequence of logical joins in. Because joins are commutative and associative, the same logical query — e.g. Actor ⋈ Stars ⋈ Movie ⋈ Produces ⋈ Company — admits many physically distinct execution plans. Which one is fastest depends entirely on intermediate-result sizes, which depend on the data.
Why it's hard¶
Three compounding factors (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):
- Exponential search space. "The number of possible plans grows exponentially with the number of tables — and analytics queries regularly join 20-30 different tables." A 10-way linear join has 10! ≈ 3.6M orderings; bushy plans multiply this further.
- Cardinality estimation is nearly as hard as execution itself. "Estimating this quantity [subquery size] is as difficult as executing the query itself (in general)." Get the estimate wrong and the cost model picks a plan that looks cheap but isn't.
- Three fragile components must work together. Optimizers compose (a) a cardinality estimator, (b) a cost model comparing candidate plans, (c) a search procedure navigating the exponential space. Each can fail independently; errors compound.
The three components¶
Per the Databricks post:
"Traditional query optimizers solve this problem with three components: a cardinality estimator designed to quickly guess the size of subqueries (e.g., to guess how many movies Sony has produced), a cost model to compare different potential plans, and a search procedure that navigates the exponentially-large space."
This is the canonical three-leg decomposition. See concepts/query-planner for the broader compiler-analogy framing that contains join ordering as one optimisation sub-phase.
Why it matters¶
Query 5b of the Join Order
Benchmark — a simple 5-way join over IMDb — illustrates the
stakes: the default Databricks optimizer picks one join order;
an LLM agent finds a different one that is significantly
faster because the default estimator underestimates LIKE-
predicate selectivity (see
concepts/like-predicate-cardinality-estimation-failure).
Most production-query slowness on complex joins is an ordering
problem, not an execution-engine problem.
Classical approaches¶
Decades of research: adaptive feedback loops, deep-learning estimators, distribution modelling, database-theory bounds, learning theory, factor decompositions. Alternative architectures replace the whole optimizer with reinforcement learning, multi-armed bandits, Bayesian optimization (see systems/bayesqo), or specialised join algorithms. "All of these solutions add significant complexity to a query optimizer, and require significant engineering effort to integrate, maintain, and productionize."
The LLM-agent alternative¶
The Databricks/UPenn experiment skips the estimator-and-cost- model scaffolding entirely: give a frontier LLM agent a single tool (execute this join order, return runtime + subplan sizes), a rollout budget, and structured-output constraints that admit only valid orderings. Measurement replaces estimation. See concepts/llm-agent-as-query-optimizer and patterns/llm-agent-offline-query-plan-tuner.
Crucially this works only offline — LLM latency is too high for the query-optimizer hot path — but the offline niche is exactly where human DBA experts spend hours tuning bad plans today. See concepts/offline-query-tuning-loop.
Seen in¶
- sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization — Canonical modern framing. Three-component decomposition; exponential search; cardinality estimation as the weakest link; LLM agent as an offline alternative.
- sources/2026-04-21-planetscale-what-is-a-query-planner — Andres Taylor's pedagogical framing within the broader query-planner domain; introduces join-order as pathfinding.
Related¶
- concepts/query-planner — join ordering is one decision the planner makes
- concepts/cardinality-estimation — the weakest link
- concepts/nested-loop-join — joining two tables is the inner operation this problem sequences
- concepts/llm-agent-as-query-optimizer — the LLM-agent alternative
- concepts/anytime-optimization-algorithm — time-budget shape of agent-based tuning
- systems/databricks-join-order-agent — the prototype
- systems/join-order-benchmark-job — the benchmark the problem is canonically measured on
- systems/bayesqo — Bayesian-optimization-based offline baseline