Skip to content

CONCEPT Cited by 1 source

B+tree

A B+tree is a B-tree variant with two structural modifications that make it dominate database indexes over the plain B-tree:

  1. Key/value pairs are stored only at leaf nodes. Internal nodes store only keys and child pointers.
  2. Leaf nodes form a doubly-linked list. Each leaf carries next/previous pointers to its sibling leaves, so in-order traversal reads the values sequentially without ascending back into internal nodes.

(Source: .)

Why these changes matter

  • More keys per internal node → shallower tree. With no value payload in internal nodes, internal-node bytes are spent entirely on keys + child pointers. Given a fixed node size (disk block), the fan-out is higher, so the tree depth for the same row count is lower, so lookups touch fewer pages.
  • Range scans become sequential. A query like WHERE k BETWEEN a AND b ORDER BY k descends once to the leaf containing a, then walks the linked-list of leaves until b. No repeated descents. This is why time-range queries on a sequential primary key are effectively sequential I/O.

MySQL / InnoDB specifics

InnoDB's B+tree implementation further specialises:

  • Non-leaf nodes store N child pointers instead of N+1. The classical B+tree has one more child than keys at each node; InnoDB's variant has one per key.
  • All levels are doubly-linked lists. Next/previous pointers exist not just on leaves but on internal nodes too, so traversal can go sideways at any level.
  • Node size = 16 KB by default — one InnoDB page.
  • Default behaviour is clustered — the table's primary key IS the root of a B+tree whose leaves contain the entire row. See concepts/clustered-index.

B-tree vs B+tree summary

Property B-tree B+tree
Where values live Any node Leaves only
Internal-node key density Medium High
Tree depth (same row count) Deeper Shallower
Range scan Repeated descents Linked-list walk on leaves
Sibling traversal No Yes (linked list)
Typical DB usage Rare Dominant (MySQL, Postgres, MongoDB, SQLite)

Why databases prefer B+trees

Ben Dicken summarises two operational reasons the B+tree wins:

Since inner nodes do not have to store values, we can fit more keys per inner node! This can help keep the tree shallower.

All of the values can be stored at the same level, and traversed in-order via the bottom-level linked list.

Seen in

  • — Brian Morrison II (PlanetScale, 2024-03-19) canonicalises InnoDB's page-fill factor asymmetry on B+tree leaves: "InnoDB assumes that the primary key will increment predictably… If true, InnoDB will fill the pages to about 94% of the page size before creating a new page. When the primary key is random, the amount of space utilized from each page can be as low as 50%." Specific numbers for the B+tree insert-path property the Dicken 2024-09-09 post canonicalises structurally. See concepts/innodb-page-fill-factor. The same post canonicalises the key-width fan-out impact with concrete ratios: 4 bytes / 8 bytes / 16 bytes / 36 bytes for INT / BIGINT / BINARY(16) UUID / CHAR(36) UUID — every halving of key width increases fan-out ~50% and decreases tree depth by one per factor-of-K in row count.
  • sources/2026-04-21-planetscale-the-mysql-adaptive-hash-index — Ben Dicken (PlanetScale, 2024-04-24) canonicalises the Adaptive Hash Index as a runtime-adaptive O(1) overlay on top of B+tree lookups. The B+tree is still the authoritative structure; for index values InnoDB observes being looked up repeatedly, an in-memory hash entry caches a pointer directly into the buffer-pool page holding the target row, short-circuiting the O(log N) tree descent. First wiki disclosure of the general claim that "B-tree index lookups are already very fast, but we have layered an additional optimization on top to make it even faster"hash-overlay-on-B+tree, not hash-instead-of-B+tree (InnoDB doesn't support on-disk HASH indexes at all; CREATE INDEX ... USING HASH silently emits a BTREE index + warning). Benchmark at 390M rows + 4-level deep username B+tree: +16% QPS on single-value repeated lookups, +20% on 1000-value hot-set random lookups. Canonical scaling claim: "workloads using deeper B-tree indexes may see even more performance improvement" — overlay benefit scales with underlying tree depth. See patterns/runtime-adaptive-in-memory-index.
  • — Aaron Francis's hash-column post canonicalises the index-compactness lever: a compact fixed-width index key (BINARY(16) MD5 hash) produces a narrower + shallower B+tree than a wide composite or variable-length index key. Narrow keys → higher fan-out → fewer I/Os per lookup → smaller hot working set in the buffer pool. The same B+tree structural property drives the TEXT / BLOB prefix-index requirement — the fixed per-node byte budget (16-KB InnoDB pages) makes unbounded variable-length values unindexable without a prefix, for which hashing is the canonical workaround.
  • — B+tree properties, MySQL/InnoDB specialisation, range-scan via leaf linked list. The article's central argument — that primary-key choice changes the tree's insert shape — is about the B+tree specifically.
  • systems/innodb — all InnoDB table + index structures are B+trees.
  • systems/wiredtiger — MongoDB's default storage engine, also B+tree-based.
Last updated · 542 distilled / 1,571 read