
Query planner

Definition

A query planner (also query optimizer) is the component of a database system that takes a textual query — SQL or another query language — and emits a query plan: an executable tree of operators describing which indexes to use, which join algorithms to apply, which order to access tables in, and which operations to push down to which storage layer. The plan is the input to the database's execution engine, which interprets it row-by-row (or batch-by-batch) to produce the result set.

The planner's job is to find the cheapest plan among all semantically equivalent alternatives. For small queries against a single table this is trivial. For complex queries — many tables, many indexes, many predicates — the alternative space is vast: "the available options can quickly run into the thousands and even millions of alternatives. Most of these alternatives are really slow, so the planner's job is to find the best possible query plan among all possibilities" (Source: sources/2026-04-21-planetscale-what-is-a-query-planner).
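The size of that alternative space is easy to make concrete. A minimal sketch in Python, counting only join-tree shapes (access-path and join-algorithm choices multiply these numbers further):

```python
import math

def linear_orders(n: int) -> int:
    # n! linear (left-deep) orderings of an n-way join
    return math.factorial(n)

def bushy_trees(n: int) -> int:
    # n! * Catalan(n-1): all binary join trees, bushy shapes included
    return math.factorial(n) * math.comb(2 * (n - 1), n - 1) // n

print(linear_orders(10))  # 3628800
print(bushy_trees(3))     # 12
```

Three tables already admit 12 distinct join trees; ten tables admit 3.6 million linear orderings before any bushy shapes are counted.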

The compiler analogy

Andres Taylor's canonical framing (Source: sources/2026-04-21-planetscale-what-is-a-query-planner): a query planner is structurally a compiler that emits a different artefact.

| Phase | Compiler | Query planner |
| --- | --- | --- |
| Lex + parse | source code → AST | SQL string → AST |
| Semantic analysis | resolve identifiers to classes / methods, infer types | resolve column references to tables, infer expression types |
| Optimisation | constant folding, inlining, loop transformations on IR | predicate rewriting, index selection, join-order search, aggregation decomposition |
| Code generation | emit machine code | emit an operator tree for the execution engine |

See concepts/compiler-planner-analogy for the detailed unpacking.

What the planner decides

  • Which indexes to use. Suppose a table has a primary key plus two secondary indexes, on name and created_at, and the query filters on name: which access path should the planner pick? Single-node planners use table statistics (selectivity, cardinality) to estimate each candidate's cost.
  • Which join algorithm. Nested-loop, sort-merge, or hash join? Depends on sizes, sortedness, and index availability on the join keys.
  • Join order. For an N-way join, there are N! linear orderings plus bushy variants; the planner searches for the order with the smallest cumulative intermediate row count.
  • Predicate pushdown. Can a WHERE clause be evaluated before a join (smaller intermediate state) or must it wait until after?
  • Aggregation placement. Can a GROUP BY be pushed under a join (pre-aggregate) or must it run at the top of the plan?
  • Sort / limit placement. Can LIMIT reach down into a sub-plan that has a sorted index, or must the full result be produced and then truncated?
  • In distributed planners: which operators push down to which tier. Vitess's planner (Source: sources/2026-04-21-planetscale-what-is-a-query-planner) searches for the plan that minimises the number of network calls to MySQL shards — single-node cost metrics don't apply.
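The index-selection decision above can be observed directly in any database that exposes its plans. A minimal sketch using Python's built-in sqlite3 (table and index names are invented for the example; the exact wording of the plan text varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT);
    CREATE INDEX idx_users_name ON users(name);
    CREATE INDEX idx_users_created_at ON users(created_at);
""")

# Ask the planner which access path it chose for an equality filter on name.
rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE name = ?", ("alice",)
).fetchall()
for row in rows:
    print(row[-1])  # e.g. SEARCH users USING INDEX idx_users_name (name=?)
```

The planner picks idx_users_name because the predicate is an equality on that column; filtering on created_at instead would flip the choice.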

Cost models

Planners fall into two broad families by how they compare alternatives:

  • Rule-based. Apply fixed rewrite rules in a fixed order; whichever plan survives the rule sequence wins. Cheaper to implement and deterministic, but it misses opportunities a better-ordered rule set would find. Early MySQL planners and Vitess's Gen4 planner sit closer to this end of the spectrum.
  • Cost-based. Enumerate candidate plans (or a subset via pruning), score each with a cost function estimated from table statistics, and return the cheapest. PostgreSQL, SQL Server, Oracle, and DB2 all plan this way, as do the Calcite, Cascades, and Volcano optimizer frameworks, with varying search strategies. Catastrophic failure mode: bad statistics → bad cost estimate → bad plan chosen; the whole query runs slowly because the planner picked a plan that looked cheap but wasn't.

Most production planners blend both — cost-based enumeration with rule-based rewrites applied as pre-processing and post-processing (see concepts/planner-phase-ordering).
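A toy version of the cost-based choice, under an invented model (all constants here are illustrative, not any real planner's: sequential pages are cheap, index probes pay a random-I/O penalty per fetched row, clustering is ignored):

```python
def seq_scan_cost(n_rows: int, rows_per_page: int = 100) -> float:
    # Read every page once, sequentially.
    return n_rows / rows_per_page

def index_scan_cost(n_rows: int, selectivity: float,
                    random_io_penalty: float = 4.0) -> float:
    # One random page fetch per estimated matching row.
    return n_rows * selectivity * random_io_penalty

def choose_plan(n_rows: int, selectivity: float) -> str:
    costs = {
        "seq_scan": seq_scan_cost(n_rows),
        "index_scan": index_scan_cost(n_rows, selectivity),
    }
    return min(costs, key=costs.get)  # cheapest estimated plan wins

print(choose_plan(1_000_000, 0.0001))  # index_scan: highly selective filter
print(choose_plan(1_000_000, 0.5))     # seq_scan: half the table matches
```

The failure mode from the bullet above falls out directly: feed choose_plan a selectivity estimate of 0.0001 when the true value is 0.5 and it confidently returns the wrong plan.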

Why planning is hard

  • Search space explosion. 10-way joins have 10! ≈ 3.6M orderings; even with pruning, exhaustive enumeration is infeasible beyond ~15 tables. System-R's dynamic programming is O(3^N); heuristic strategies are cheaper but miss plans.
  • Cost-model error compounds. An estimated cardinality of 100 rows that is actually 1M rows propagates through the cost estimate and can pick a plan that's 10,000× worse than optimal.
  • Correlated predicates. WHERE state='CA' AND city='San Francisco' — the naive assumption that the two predicates are independent is wildly wrong. Modern planners use column-group statistics, extended statistics, or sampling to catch correlation.
  • Phase-ordering bugs. Two individually-correct rewrite rules can interact destructively; see the Vitess aggregation-pushdown OOM incident for a canonical production failure.
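The correlated-predicate misestimate from the list above can be reproduced in a few lines on synthetic data, where city fully determines state, so the independence assumption undercounts by construction:

```python
# Synthetic table: city fully determines state (perfect correlation).
rows = [("CA", "San Francisco")] * 5_000 + [("NY", "New York")] * 5_000
n = len(rows)

sel_state = sum(1 for s, _ in rows if s == "CA") / n            # 0.5
sel_city = sum(1 for _, c in rows if c == "San Francisco") / n  # 0.5

# Naive planner estimate under independence: 0.5 * 0.5 * 10,000 rows.
naive_estimate = sel_state * sel_city * n

# Actual result of WHERE state='CA' AND city='San Francisco'.
actual = sum(1 for s, c in rows if s == "CA" and c == "San Francisco")

print(naive_estimate, actual)  # 2500.0 5000
```

Even with perfect correlation the estimate is only 2× off here; skewed real-world data produces far larger errors, which is what column-group and extended statistics exist to catch.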

The distributed-query shift

Single-node planners optimise CPU + I/O. Distributed planners (VTGate, Presto, Trino, Spark SQL, Snowflake, BigQuery) optimise network too — each cross-node hop costs milliseconds versus microseconds for local operations. The optimisation priority inverts: a plan with more total CPU work but fewer network hops usually wins in a cross-shard regime.

Taylor's canonical Vitess framing (Source: sources/2026-04-21-planetscale-what-is-a-query-planner): "If we can perform a join or a filter in MySQL, that is always going to be faster than fetching all the individual rows and performing the same operation on the VTGate side. So, during query planning, we are searching for the query plan that has the least number of network calls."
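A toy rendering of that search criterion, assuming an invented cost model in which one network round trip dominates any amount of per-row gateway work (the constants and shard counts are illustrative, not Vitess's actual cost function):

```python
NETWORK_CALL_MS = 1.0     # cross-node round trip: milliseconds
LOCAL_ROW_MS = 0.0001     # per-row work at the gateway: ~microseconds

def plan_cost(network_calls: int, gateway_rows: int) -> float:
    return network_calls * NETWORK_CALL_MS + gateway_rows * LOCAL_ROW_MS

# Plan A: push the join down to the shard -> one call, few rows back.
pushed_down = plan_cost(network_calls=1, gateway_rows=100)

# Plan B: fetch both sides from 8 shards and join at the gateway ->
# 16 calls and a million rows of local join work.
gateway_join = plan_cost(network_calls=16, gateway_rows=1_000_000)

print(pushed_down < gateway_join)  # True: fewer hops wins
```

Under these weights, Plan A costs about 1 ms against Plan B's ~116 ms, so minimising network calls picks the pushed-down join even though both plans produce the same rows.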
