Skip to content

CONCEPT Cited by 2 sources

Clustered index

A clustered index is a database index in which the table itself is the index: rows live inside the index structure keyed on the primary key, in primary-key order. The index root is the table's root; the index leaves contain the full rows.

The canonical wiki instance is MySQL's InnoDB storage engine, where every table is a B+tree keyed on the primary key and the leaves contain "the remaining column values for each row … stored only in the leaf nodes." (Source: sources/2024-09-09-planetscale-b-trees-and-database-indexes.)

Contrast: heap-organised tables + separate PK index

Most databases (Postgres, Oracle by default, SQL Server with non-clustered default) use heap-organised tables — rows live in an unordered heap, and the primary-key index is a separate B+tree pointing into the heap. InnoDB's clustered shape differs: primary-key lookup hits the leaf and already has the row, no second fetch required.

Property Clustered (InnoDB) Heap + PK index (Postgres)
Table storage Inside primary-key B+tree Heap + separate index
PK lookup I/O One B+tree walk Index walk + heap fetch
Row order on disk Sorted by PK Insertion order
PK choice affects row layout Yes — sharply No
Secondary index leaf value Primary key Heap tuple ID
Range scan by PK Sequential via leaf linked list Random across heap

Why it makes primary-key choice load-bearing

Because rows sit in primary-key order inside the B+tree, the choice of primary key determines the physical on-disk layout of the whole table. Three consequences the PlanetScale post makes explicit:

  1. Insert path is a function of the key value. A sequential PK hits the right-most path always; a random PK hits unpredictable paths. See patterns/sequential-primary-key and concepts/uuid-primary-key-antipattern.
  2. Row data in leaves means leaves are as wide as the row. "The number of rows stored in each node depends on how 'wide' the table is." A narrow table packs hundreds of rows per 16-KB leaf; a wide table fits single-digit rows.
  3. Range queries on the PK scan contiguous leaves. A time-range query on a sequential PK reads neighbouring leaves via the B+tree leaf linked list — sequential I/O. On a random PK, matching rows scatter across non-adjacent leaves — random I/O even for a semantically sequential query.

Secondary index interaction

A secondary index on a clustered-index table is itself a B+tree keyed on the indexed column with primary-key values in its leaves. Query execution therefore does two B+tree walks:

  1. Walk the secondary index to find the primary key.
  2. Walk the clustered index (the table) with that primary key to fetch the row.

This is why SELECT username FROM user WHERE email = '...' on a table with an email_index does "a lookup on the email_index B+tree. After it has found the associated user_id value it will use that to perform another lookup into the primary key B+tree." (Source: sources/2024-09-09-planetscale-b-trees-and-database-indexes.)

A covering index — a secondary index that includes all projected columns — avoids the second walk by packing the answer into the secondary-index leaves.

Clustered-index design rules

  • Prefer a sequential primary key (AUTO_INCREMENT / monotonic / timestamp-prefixed) to concentrate inserts on the right edge of the tree. See patterns/sequential-primary-key.
  • Prefer a small primary key. Every secondary-index leaf stores the full PK, so a 16-byte UUID vs an 8-byte BIGINT doubles secondary-index size. Smaller PKs → more keys per internal node → shallower tree.
  • Declare a primary key. InnoDB requires one; if you don't declare one, it picks the first non-null unique index or generates a hidden 6-byte rowid — both uncontrolled.
  • Narrow the row width. The row lives in the leaf; wider rows mean fewer rows per leaf and more leaf pages.

Seen in

  • sources/2024-09-09-planetscale-b-trees-and-database-indexes — canonical introduction to InnoDB's clustered B+tree layout. The "on-disk layout of all the data in the table" is determined by primary-key choice.
  • sources/2026-02-16-planetscale-zero-downtime-migrations-at-petabyte-scale — canonical wiki instance of the clustered index being exploited for bulk-read throughput during a migration. Vitess VReplication's row- copy phase reads rows "ordering the results by the PRIMARY KEY (PK) columns in the table … so that we can read from the clustered index immediately as we are then reading the records in order and do not need to formulate the entire result set and order it with a filesort before we can start streaming rows." When the table's physical layout is the primary-key order, streaming billions of rows ordered by PK is sequential I/O rather than a materialise-and-sort — the clustered-index shape directly determines whether petabyte-scale migrations are feasible. First canonical wiki instance of clustered- index architecture as a migration-throughput property, not just a query-performance property.
  • systems/innodb — the canonical engine.
Last updated · 319 distilled / 1,201 read