CONCEPT

Composite index¶

A composite index (a.k.a. multi-column index, compound index) is a single database index whose key is a tuple of two or more columns in a specified order. In InnoDB it's a B+tree keyed on (col_A, col_B, col_C, …) where the ordering inside the tree is lexicographic — first by col_A, then within each col_A value by col_B, and so on.

CREATE INDEX fullnames ON employees(last_name, first_name);

A composite index is what lets the database engine efficiently answer queries that filter on multiple columns jointly — e.g. WHERE last_name = 'Puppo' AND first_name = 'Kendra'.

The phone-book-inside-a-phone-book mental model¶

The PlanetScale post frames this vividly:

You can imagine this as a phone book placed inside another. First, you look up the last name Puppo, leading you to the second catalog for all the people named Kendra, organized alphabetically by first names, which you can use to find Kendra quickly.

—

The index physically sorts rows by last_name first; within each last_name group, rows are sorted by first_name. A point lookup for ('Puppo', 'Kendra') is a single B+tree descent.

Why two separate indexes don't work¶

The naïve alternative to a composite index is two single-column indexes, one on each filter column. The post is explicit this usually underperforms a composite index for AND-predicates:

If you create two separate indexes in this way, MySQL knows how to find all employees named Puppo. It also knows how to find all employees named Kendra. However, it doesn't know how to find people named Kendra Puppo.

Two separate indexes force MySQL into one of three options:

Use one index, filter the rest. Walk the last_name index to find all Puppos, then post-filter each match for first_name = 'Kendra'. Wastes I/O on every non-Kendra Puppo.
Use Index Merge (see type: index_merge) — walk both indexes, intersect the primary-key sets. Works but has coordination overhead and typically loses to a dedicated composite index.
Ignore both indexes — do a full table scan if the planner decides that's cheaper than either above.

The post calls this out: "MySQL may decide not to use multiple indexes; even if it does, in many scenarios, they won't serve the purpose as well as a dedicated index."

The leftmost-prefix rule¶

A composite index on (A, B, C) can serve queries that filter on:

A alone
A AND B
A AND B AND C

But it cannot serve queries that filter on:

B alone (missing leading column)
C alone
B AND C (missing leading column)

This is the leftmost-prefix rule — the index can be used only if the query's equality predicates cover a contiguous prefix of the index columns starting from the first. The phone-book analogy makes the rule obvious: you can't find "Kendra" in a book sorted primarily by last name without scanning every last-name section.

Verifying usage via EXPLAIN¶

The canonical verification is EXPLAIN's key_len column:

key_len ≈ length of all indexed columns → full composite index is in use.
key_len ≈ length of leading column only → only the prefix is in use (the trailing predicates aren't binding the index).

Combined with type: ref and rows: 1 (vs type: ALL and rows: N), this is the diagnostic signature of a composite index fulfilling its purpose. See patterns/composite-index-for-and-predicate for the full before/after pattern.

Canonical worked example¶

From (MySQL Employees sample database):

SELECT * FROM employees
WHERE last_name = 'Puppo' AND first_name = 'Kendra';

Before CREATE INDEX: type: ALL, rows: 299,202 (full scan of the employees table).

After CREATE INDEX fullnames ON employees(last_name, first_name): type: ref, rows: 1, key: fullnames.

The result is the same single row — but MySQL went from touching 299,202 rows to touching 1.

Design considerations¶

Column order matters — put the highest-selectivity / most-common-equality column first if the workload has queries that use just the leading column; otherwise put the column that's always in the WHERE first.
Range predicates break the prefix — a query like WHERE A = 1 AND B > 10 AND C = 5 uses the (A, B) prefix; the C predicate becomes a post-filter because the range on B breaks the equality chain.
Write amplification stacks — see secondary-index write amplification. A composite index is still one index with one B+tree insert path, so it's strictly cheaper than two separate single-column indexes on write — and usually strictly better on read too.
Hashing as a compact alternative — for purely equality-based AND-predicates where you don't need range or prefix capability, a generated-hash-column with a UNIQUE INDEX collapses the composite index to a single 16-byte fixed-width key. See patterns/composite-hash-uniqueness-constraint.

Seen in¶

— Canonical wiki case study on composite-index column order. David Graham (PlanetScale, 2021-08-31) walks through a production backup-deletion query that ran at 719 ms average because its composite index (expires_at, data_deleted_at, deleted_at) had the least-selective column first (97 % of rows matched expires_at <= NOW(); only 2.5 % matched deleted_at IS NULL). The planner rejected the index (type: ALL, full scan) and the query was slow. Reordering to (deleted_at, expires_at, data_deleted_at) — putting the most selective column first — dropped the query to under 20 ms (98 % reduction). Canonical source for composite-index column order and the general lesson that composite-index column ordering is workload-dependent.
— Aaron Francis (PlanetScale, 2023-05-04) canonicalises the leftmost-prefix rule with a worked multi (first_name, state) example: the index answers WHERE first_name = 'Aaron' AND state = 'TX' (two-column equality → ref, 178 rows) and WHERE first_name = 'Aaron' alone (leading column → ref, key_len 202), but does not answer WHERE state = 'TX' alone (skips leading column → possible_keys: NULL, full table scan 493,889 rows). Canonical rule statement verbatim: "MySQL will only be able to use the columns starting on the left and working toward the right. It cannot skip any columns."
— canonical wiki source: Savannah Longoria canonicalises the phone-book mental model, the two-separate-indexes anti-pattern, the CREATE INDEX fullnames ON employees(last_name, first_name) syntax, and the rows: 299,202 → rows: 1 before/after verification via EXPLAIN.
— Justin Gage's 2022 pedagogical piece frames indexes generally as "a way of letting your table be sorted by multiple columns" — the composite index is the literal instantiation of that framing.
— Aaron Francis's post canonicalises the hash variant — a single BINARY(16) MD5 index replacing a composite index as the multi-column uniqueness primitive when only equality (not range/sort) is needed.