CONCEPT Cited by 2 sources

Join order optimization

Definition

Join-order optimization is the query-optimizer subproblem of choosing the order in which a sequence of logical joins is executed. Because joins are commutative and associative, the same logical query — e.g. Actor ⋈ Stars ⋈ Movie ⋈ Produces ⋈ Company — admits many physically distinct execution plans. Which one is fastest depends entirely on intermediate-result sizes, which in turn depend on the data.

Why it's hard

Three compounding factors (Source: sources/2026-04-22-databricks-are-llm-agents-good-at-join-order-optimization):

  1. Exponential search space. "The number of possible plans grows exponentially with the number of tables — and analytics queries regularly join 20-30 different tables." A 10-way linear join has 10! ≈ 3.6M orderings; bushy plans multiply this further.
  2. Cardinality estimation is nearly as hard as execution itself. "Estimating this quantity [subquery size] is as difficult as executing the query itself (in general)." Get the estimate wrong and the cost model picks a plan that looks cheap but isn't.
  3. Three fragile components must work together. Optimizers compose (a) a cardinality estimator, (b) a cost model comparing candidate plans, (c) a search procedure navigating the exponential space. Each can fail independently; errors compound.

The three components

Per the Databricks post:

"Traditional query optimizers solve this problem with three components: a cardinality estimator designed to quickly guess the size of subqueries (e.g., to guess how many movies Sony has produced), a cost model to compare different potential plans, and a search procedure that navigates the exponentially-large space."

This is the canonical three-leg decomposition. See concepts/query-planner for the broader compiler-analogy framing that contains join ordering as one optimization sub-phase.
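A minimal sketch of how the three components fit together — a Selinger-style subset DP scored by a C_out-style cost model. The cardinalities and selectivities are invented toy numbers, not IMDb statistics:

```python
from functools import cache
from itertools import combinations

# Toy base cardinalities and pairwise join selectivities (illustrative only).
CARD = {"Actor": 1_000_000, "Stars": 5_000_000, "Movie": 2_000_000}
SEL = {frozenset({"Actor", "Stars"}): 1e-6,
       frozenset({"Stars", "Movie"}): 5e-7}   # missing pair => cross product

def est_card(tables: frozenset) -> float:
    # (a) Cardinality estimator: independence assumption — product of
    # base cardinalities times every applicable join selectivity.
    size = 1.0
    for t in tables:
        size *= CARD[t]
    for pair in combinations(tables, 2):
        size *= SEL.get(frozenset(pair), 1.0)
    return size

@cache
def best_plan(tables: frozenset):
    # (c) Search procedure: dynamic programming over subsets, scored by
    # (b) a cost model summing estimated intermediate-result sizes.
    if len(tables) == 1:
        return 0.0, next(iter(tables))
    best_cost, best_tree = float("inf"), None
    for k in range(1, len(tables)):
        for left in map(frozenset, combinations(tables, k)):
            lc, lt = best_plan(left)
            rc, rt = best_plan(tables - left)
            cost = lc + rc + est_card(tables)
            if cost < best_cost:
                best_cost, best_tree = cost, (lt, rt)
    return best_cost, best_tree

cost, tree = best_plan(frozenset(CARD))
```

With these numbers the DP avoids starting from Actor ⋈ Movie (a 2-trillion-row cross product) and joins a selective pair first — but note the whole result is only as good as the estimates fed into `est_card`.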

Why it matters

Query 5b of the Join Order Benchmark — a simple 5-way join over IMDb — illustrates the stakes: the default Databricks optimizer picks one join order; an LLM agent finds a different one that is significantly faster because the default estimator underestimates LIKE-predicate selectivity (see concepts/like-predicate-cardinality-estimation-failure). Most production-query slowness on complex joins is an ordering problem, not an execution-engine problem.
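A hypothetical arithmetic sketch of how a selectivity underestimate propagates — the 0.1% and 40% figures are invented for illustration, not taken from the benchmark:

```python
# Invented numbers: an estimator assumes a LIKE '%...%' predicate keeps
# ~0.1% of Movie rows, when the pattern actually matches 40% of them.
movies = 2_000_000
est_sel, true_sel = 0.001, 0.40

est_rows = movies * est_sel    # plan-time belief: ~2,000 rows
true_rows = movies * true_sel  # runtime reality: ~800,000 rows

# The optimizer joins the "tiny" filtered Movie relation early; every
# downstream join then processes ~400x more rows than the plan assumed.
error_factor = true_rows / est_rows
print(round(error_factor))  # → 400
```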

Classical approaches

Decades of research: adaptive feedback loops, deep-learning estimators, distribution modelling, database-theory bounds, learning theory, factor decompositions. Alternative architectures replace the whole optimizer with reinforcement learning, multi-armed bandits, Bayesian optimization (see systems/bayesqo), or specialised join algorithms. "All of these solutions add significant complexity to a query optimizer, and require significant engineering effort to integrate, maintain, and productionize."

The LLM-agent alternative

The Databricks/UPenn experiment skips the estimator-and-cost-model scaffolding entirely: give a frontier LLM agent a single tool (execute this join order, return runtime + subplan sizes), a rollout budget, and structured-output constraints that admit only valid orderings. Measurement replaces estimation. See concepts/llm-agent-as-query-optimizer and patterns/llm-agent-offline-query-plan-tuner.
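The shape of that loop can be sketched with the LLM's proposal step replaced by a random swap. Everything here — the tool stub, the fabricated runtime function, the swap heuristic — is an illustrative stand-in, not the Databricks agent:

```python
import random

TABLES = ["Actor", "Stars", "Movie", "Produces", "Company"]

def execute(order):
    """Tool stub: 'run' the join order and return a runtime in seconds.
    A real deployment would execute the plan on the engine (and also
    return subplan sizes); here the runtime is a fabricated function."""
    return sum((pos + 1) * TABLES.index(t) for pos, t in enumerate(order)) / 10

def tune(budget: int, seed: int = 0):
    # Measurement replaces estimation: keep the best *measured*
    # ordering found within the rollout budget.
    rng = random.Random(seed)
    best = list(TABLES)
    best_time = execute(best)
    for _ in range(budget):
        cand = best[:]
        i, j = rng.sample(range(len(cand)), 2)   # propose: swap two tables
        cand[i], cand[j] = cand[j], cand[i]
        t = execute(cand)                        # measure, don't estimate
        if t < best_time:
            best, best_time = cand, t
    return best, best_time

order, runtime = tune(budget=50)
```

The rollout budget is the key knob: each iteration costs a real query execution, which is exactly why this only makes sense offline.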

Crucially this works only offline — LLM latency is too high for the query-optimizer hot path — but the offline niche is exactly where human DBA experts spend hours tuning bad plans today. See concepts/offline-query-tuning-loop.
