CONCEPT Cited by 3 sources
Composite-index column order¶
The order in which columns appear in a composite index definition determines which queries can use the index and how efficiently — and the right order is a function of the query's predicates, the column selectivity for those predicates on this data distribution, and the leftmost-prefix rule. Rules of thumb that generalise across tables are wrong as often as they are right; composite-index column ordering is a workload-dependent design decision.
The PlanetScale backup case¶
The canonical illustration is David Graham's backup-table case study:
Using the query statistics report we discovered this query takes an average of 719ms to find expired backups to delete. … We think this query should be faster because we have a composite index on
(expires_at, data_deleted_at, deleted_at). However, the MySQL explain plan shows that the query planner chooses a full table scan over every row rather than using this index.— sources/2021-08-31-planetscale-optimizing-sql-with-query-statistics
The diagnosis: the leading column was the least selective predicate, not the most.
- 97 % of rows matched
expires_at <= CURRENT_TIMESTAMP→ the leading column narrows nothing. - 97.5 % of rows had
deleted_at IS NOT NULL→ filtering fordeleted_at IS NULLwas highly selective.
Reordering to (deleted_at, expires_at, data_deleted_at)
turned a full table scan (type: ALL, hundreds of
thousands of rows visited) into a single-row range scan
(type: range, rows: 1, filtered: 100.00). Average
runtime dropped from 719 ms to under 20 ms — a 98 %
reduction.
Why column order matters¶
A composite index on (A, B, C) is a single
B+tree sorted lexicographically —
first by A, then within each A-value by B, then
within each (A, B)-pair by C. This gives the index:
- Fast equality / range lookup on
A(the whole tree is keyed byAat the top level). - Fast
A = ? AND B = ?(once you've found theA-subtree, it's sorted byB). - Fast
A = ? AND B = ? AND C = ?(same, one more level).
But:
- Filtering on
Balone requires scanning everyA-value's subtree — useless. The leftmost-prefix rule means the optimiser won't even consider the index for this query. - Filtering on
Awith a wide range (e.g. 97 % of rows match) narrows the set to 97 % of the table — not enough to make index-based access cheaper than a full scan, so the optimiser rejects the index.
So the leading column should be (in priority order):
- A column the query filters on with an equality or narrow range (otherwise you can't enter the index at all).
- Among qualifying columns, the most selective for this query's predicates — i.e. the one that eliminates the largest fraction of rows.
The workload-dependent surprise¶
David's own framing makes the lesson precise:
We often include
deleted_atas a trailing key in composite indexes for this reason, but this isn't quite right for the backup table.— sources/2021-08-31-planetscale-optimizing-sql-with-query-statistics
The general convention at PlanetScale — "most rows are
live, so deleted_at IS NULL matches most of the table
and belongs at the back of the index" — is inverted
for the backup table. Daily backups replace their
predecessors, so 97.5 % of rows are soft-deleted and
deleted_at IS NULL is the rare case. The column
ordering that is correct for most tables is exactly wrong
for this one. See also
skewed column
selectivity for why selectivity is always
predicate- and workload-dependent.
Mechanical rule¶
For a query WHERE A = x AND B = y AND C > z ORDER BY D,
a good starting composite index is:
- Equality predicates first —
A,Bin any order that matches their selectivity for the query. Equality predicates can each narrow the lookup by one key-level. - One range predicate last among key-cols —
C(the>predicate). Only the first range predicate uses the B+tree; subsequent range predicates become post-filter work. - Then order-by columns (if including them avoids a
filesort). For PlanetScale's backup case, leavingORDER BY idout of the index leavesExtra: Using filesortin the plan — benign here because the result set is tiny. - Then covered output columns (if you want a covering index that avoids the clustered-index lookup).
This yields the textbook mnemonic: equality, then range, then order, then cover.
Verification¶
The only reliable way to validate a column-ordering decision is EXPLAIN:
type: ALLwithkey: NULL→ the optimiser rejected the index. Column order is probably wrong.type: rangeortype: refwith your index inkey→ the optimiser is using the index as intended.rowsestimate close to the actual result-set size → the index is narrowing the search efficiently.filtered: 100.00→ every row the index returns is in the result set (no post-filter).
See patterns/explain-driven-index-verification for the full discipline.
Not just MySQL¶
The same reasoning applies to composite indexes in PostgreSQL and every other relational database using B-tree indexes. The specific leftmost-prefix consequences and the optimiser's selectivity heuristics differ in detail, but the underlying rule — the order of columns in the index key is the order in which they narrow the lookup — is universal across B-tree indexes.
Trade-offs¶
- More indexes vs. better-ordered indexes. If two different queries want different leading columns, you may need two indexes — but each index has a write cost (index write cost), so at scale you want the fewest that cover your patterns.
- Index column order affects sort. An index on
(A, B)can serveORDER BY A, B ASCfor free, but notORDER BY B, AorORDER BY A DESC, B ASC. - Ordering affects
key_lenof multi-column hits. MySQL'skey_leninEXPLAINtells you how many leading columns of a composite index the query is actually using — a quick audit for whether the column order matches the query.
Seen in¶
- sources/2021-08-31-planetscale-optimizing-sql-with-query-statistics
— The canonical PlanetScale case study: reordering
(expires_at, data_deleted_at, deleted_at)→(deleted_at, expires_at, data_deleted_at)produced a 98 % latency reduction on a backup-deletion query. - sources/2026-04-21-planetscale-why-isnt-mysql-using-my-index — Aaron Francis's general explanation of why MySQL rejects indexes, covering the selectivity and leftmost-prefix consequences of column order.
- sources/2026-04-21-planetscale-how-do-database-indexes-work — Justin Gage's intro to composite-index mental models.
Related¶
- concepts/composite-index — the data structure this concept constrains.
- concepts/leftmost-prefix-rule — why the leading column is privileged.
- concepts/index-selectivity — the numerical measure that drives ordering.
- concepts/skewed-column-selectivity — why the same column is selective for one predicate and not another.
- concepts/mysql-explain — the tool to verify column-ordering choices.
- patterns/explain-driven-index-verification — the workflow for validating ordering.
- patterns/composite-index-for-and-predicate — the use-case composite indexes serve.
- systems/planetscale-insights — surfaces column-ordering opportunities via AI-powered index suggestions.