CONCEPT Cited by 1 source
B+tree¶
A B+tree is a B-tree variant with two structural modifications that make it dominate database indexes over the plain B-tree:
- Key/value pairs are stored only at leaf nodes. Internal nodes store only keys and child pointers.
- Leaf nodes form a doubly-linked list. Each leaf carries next/previous pointers to its sibling leaves, so in-order traversal reads the values sequentially without ascending back into internal nodes.
(Source: .)
Why these changes matter¶
- More keys per internal node → shallower tree. With no value payload in internal nodes, internal-node bytes are spent entirely on keys + child pointers. Given a fixed node size (disk block), the fan-out is higher, so the tree depth for the same row count is lower, so lookups touch fewer pages.
- Range scans become sequential. A query like
WHERE k BETWEEN a AND b ORDER BY kdescends once to the leaf containinga, then walks the linked-list of leaves untilb. No repeated descents. This is why time-range queries on a sequential primary key are effectively sequential I/O.
MySQL / InnoDB specifics¶
InnoDB's B+tree implementation further specialises:
- Non-leaf nodes store N child pointers instead of N+1. The classical B+tree has one more child than keys at each node; InnoDB's variant has one per key.
- All levels are doubly-linked lists. Next/previous pointers exist not just on leaves but on internal nodes too, so traversal can go sideways at any level.
- Node size = 16 KB by default — one InnoDB page.
- Default behaviour is clustered — the table's primary key IS the root of a B+tree whose leaves contain the entire row. See concepts/clustered-index.
B-tree vs B+tree summary¶
| Property | B-tree | B+tree |
|---|---|---|
| Where values live | Any node | Leaves only |
| Internal-node key density | Medium | High |
| Tree depth (same row count) | Deeper | Shallower |
| Range scan | Repeated descents | Linked-list walk on leaves |
| Sibling traversal | No | Yes (linked list) |
| Typical DB usage | Rare | Dominant (MySQL, Postgres, MongoDB, SQLite) |
Why databases prefer B+trees¶
Ben Dicken summarises two operational reasons the B+tree wins:
Since inner nodes do not have to store values, we can fit more keys per inner node! This can help keep the tree shallower.
All of the values can be stored at the same level, and traversed in-order via the bottom-level linked list.
—
Seen in¶
- — Brian Morrison II (PlanetScale, 2024-03-19) canonicalises
InnoDB's page-fill factor asymmetry on B+tree leaves:
"InnoDB assumes that the primary key will increment
predictably… If true, InnoDB will fill the pages to about
94% of the page size before creating a new page. When the
primary key is random, the amount of space utilized from
each page can be as low as 50%." Specific numbers for
the B+tree insert-path property the Dicken 2024-09-09 post
canonicalises structurally. See
concepts/innodb-page-fill-factor. The same post
canonicalises the key-width fan-out impact with concrete
ratios: 4 bytes / 8 bytes / 16 bytes / 36 bytes for
INT/BIGINT/BINARY(16)UUID /CHAR(36)UUID — every halving of key width increases fan-out ~50% and decreases tree depth by one per factor-of-K in row count. - sources/2026-04-21-planetscale-the-mysql-adaptive-hash-index
— Ben Dicken (PlanetScale, 2024-04-24) canonicalises the
Adaptive Hash Index as
a runtime-adaptive O(1) overlay on top of B+tree
lookups. The B+tree is still the authoritative
structure; for index values InnoDB observes being looked
up repeatedly, an in-memory hash entry caches a pointer
directly into the buffer-pool page holding the target row,
short-circuiting the O(log N) tree descent. First wiki
disclosure of the general claim that "B-tree index
lookups are already very fast, but we have layered an
additional optimization on top to make it even faster" —
hash-overlay-on-B+tree, not hash-instead-of-B+tree
(InnoDB doesn't support on-disk
HASHindexes at all;CREATE INDEX ... USING HASHsilently emits aBTREEindex + warning). Benchmark at 390M rows + 4-level deepusernameB+tree: +16% QPS on single-value repeated lookups, +20% on 1000-value hot-set random lookups. Canonical scaling claim: "workloads using deeper B-tree indexes may see even more performance improvement" — overlay benefit scales with underlying tree depth. See patterns/runtime-adaptive-in-memory-index. - — Aaron Francis's hash-column post canonicalises the
index-compactness lever: a compact fixed-width index
key (
BINARY(16)MD5 hash) produces a narrower + shallower B+tree than a wide composite or variable-length index key. Narrow keys → higher fan-out → fewer I/Os per lookup → smaller hot working set in the buffer pool. The same B+tree structural property drives theTEXT/BLOBprefix-index requirement — the fixed per-node byte budget (16-KB InnoDB pages) makes unbounded variable-length values unindexable without a prefix, for which hashing is the canonical workaround. - — B+tree properties, MySQL/InnoDB specialisation, range-scan via leaf linked list. The article's central argument — that primary-key choice changes the tree's insert shape — is about the B+tree specifically.
- systems/innodb — all InnoDB table + index structures are B+trees.
- systems/wiredtiger — MongoDB's default storage engine, also B+tree-based.