InnoDB

InnoDB is the default storage engine of MySQL — and, by extension, of every MySQL fork, including PlanetScale's Vitess-based MySQL product. It provides MySQL's on-disk format, in-memory page cache (the buffer pool), transactional semantics (ACID with MVCC), row-level locking, and crash recovery.

PlanetScale's Ben Dicken calls it "arguably the world's most popular database management system['s] most commonly used engine" and notes that it "relies heavily" on B+trees — every table, every index. (Source: sources/2024-09-09-planetscale-b-trees-and-database-indexes.)

Architectural shape

  • Every table is a B+tree. The table's primary key organises a B+tree whose leaves contain the full row data. This is a clustered index — the table and its primary-key index are the same structure. Narrow tables pack hundreds of rows per leaf; wide tables may fit only a handful.
  • Every secondary index is a separate B+tree. Keyed on the indexed column(s), with the primary key (not a row pointer) stored in the leaves. A secondary-index query therefore does two B+tree walks: secondary index → primary key, then clustered index → row. See concepts/secondary-index.
  • 16-KB page size default. The B+tree node size = the disk I/O unit = the buffer-pool cache unit. Every operation reads/writes whole 16-KB pages. See concepts/disk-block-size-alignment.
  • Buffer pool is the load-bearing cache. All page access goes through the buffer pool — an in-memory LRU-variant cache of hot pages. "The buffer pool drastically helps query performance. Without it, we'd end up doing significantly more disk I/O operations to handle a query workload."
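The clustered/secondary split above can be made concrete with a minimal schema (table and column names hypothetical):

```sql
-- Hypothetical table: the PRIMARY KEY B+tree *is* the table (clustered index).
CREATE TABLE users (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  email VARCHAR(255) NOT NULL,
  name  VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),       -- clustered B+tree: leaf pages hold the full rows
  KEY idx_email (email)   -- secondary B+tree: leaf entries hold (email, id)
) ENGINE=InnoDB;

-- A secondary-index query does two B+tree walks:
--   1. idx_email: email -> primary key id
--   2. clustered index: id -> full row (to fetch name)
SELECT name FROM users WHERE email = 'a@example.com';
```

EXPLAIN on the SELECT shows only the idx_email lookup; the second walk to the clustered index is implicit in every secondary-index read that needs non-indexed columns.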

Primary-key choice is load-bearing

Because the table is the primary-key B+tree, the PK choice determines the on-disk layout of every row:

  • Sequential PK (BIGINT AUTO_INCREMENT, timestamp-ordered ID, UUIDv7) → inserts follow the right-most path, leaves grow on the right, range scans are sequential. See patterns/sequential-primary-key.
  • Random PK (UUIDv4 / v3 / v5) → inserts hit unpredictable nodes, write amplification via scattered splits, range scans fan out across non-adjacent leaves. See concepts/uuid-primary-key-antipattern.

The PlanetScale post makes this the central architectural argument: "Your choice of PRIMARY KEY will impact the layout on disk of all of the data in the table, and in turn performance. Choose your PRIMARY KEY wisely!"
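A sketch of the two PK styles the post contrasts (table names hypothetical):

```sql
-- Sequential PK: every insert descends the right-most path; leaves fill
-- left to right and a time-range scan reads adjacent pages.
CREATE TABLE events_seq (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload JSON
) ENGINE=InnoDB;

-- Random PK (UUIDv4): each insert lands on an unpredictable leaf, so page
-- splits scatter across the tree and range scans touch non-adjacent pages.
CREATE TABLE events_rand (
  id      BINARY(16) NOT NULL PRIMARY KEY,  -- random UUIDv4 bytes supplied by the application
  payload JSON
) ENGINE=InnoDB;
```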

Transactional / durability mechanisms

Not covered in the PlanetScale B-tree post, but foundational to InnoDB:

  • MVCC — row versions identified by transaction ID; readers don't block writers, writers don't block readers.
  • Row-level locking — record locks taken on index entries, extended with next-key / gap locks to prevent phantom rows; far finer-grained than page- or table-level locking.
  • Redo log (ib_logfile*) — WAL for crash recovery.
  • Undo log — per-transaction rollback + MVCC version storage.
  • Doublewrite buffer — protects against torn-page writes on mid-flush crashes.
  • Change buffer — deferred secondary-index writes to batch I/O.
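A two-session sketch of how these mechanisms interact (hypothetical accounts table; session timelines shown as comments):

```sql
-- Session A                                   -- Session B
START TRANSACTION;
UPDATE accounts SET balance = balance - 10
 WHERE id = 1;   -- X record lock taken; old row version written to the undo log

                                               START TRANSACTION;
                                               SELECT balance FROM accounts WHERE id = 1;
                                               -- MVCC read: served from the undo-log
                                               -- version; does not block on A's lock

COMMIT;          -- redo log flushed for durability; the doublewrite buffer
                 -- later guards the dirty page against a torn write
```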

Known sibling engines

Engine                 | DB                       | Tree   | Clustered?           | Node size
InnoDB                 | MySQL / MariaDB / Vitess | B+tree | Yes                  | 16 KB
Postgres heap + B-tree | Postgres                 | B-tree | No (heap-organised)  | 8 KB
WiredTiger             | MongoDB                  | B+tree | Yes (per collection) | 4 KB
SQLite B-tree          | SQLite                   | B-tree | Yes                  | 4 KB (configurable)
BoltDB                 | various Go apps          | B+tree | Yes                  | 4 KB

Relationships with other wiki systems

Seen in

  • sources/2024-09-09-planetscale-b-trees-and-database-indexes — canonical introduction to InnoDB's B+tree architecture, clustered-primary-key layout, secondary-index structure, buffer pool, and the primary-key-choice → physical-layout consequences.
  • sources/2024-10-22-planetscale-planetscale-vectors-public-beta — canonical wiki instance of a vector ANN index hosted inside InnoDB as a first-class durable structure. PlanetScale extends SPFresh with transactional semantics on all operations and integrates it inside InnoDB: vector mutations ride the SQL commit path, the index pages live in InnoDB's page store, the buffer pool caches them, the redo log covers index changes, and crash recovery covers the vector index alongside row data. Terabyte-scale vector indexes "just like any other MySQL table." First wiki disclosure of InnoDB hosting a non-B+tree ANN structure as a native index type.
  • sources/2025-02-04-planetscale-the-slotted-counter-pattern — canonical wiki instance of InnoDB row-level lock contention as a production failure mode. UPDATE counters SET count = count + 1 WHERE id = 1 under bursty parallel writes serialises every writer on a single row lock (lock_mode X locks rec but not gap), surfaced by SHOW ENGINE INNODB STATUS\G as LOCK WAIT 2 lock struct(s) and TRX HAS BEEN WAITING N SEC FOR THIS LOCK TO BE GRANTED. Sam Lambert's 2022 post (republished Feb 2025) articulates the hot-row problem that InnoDB's row-level locking exposes for counter workloads and presents the slotted counter pattern as the canonical MySQL-specific workaround. GitHub's github.downloads table is the canonical deployment — the diagnostic output in the article explicitly names index PRIMARY of table github.downloads. Complements the 2024-09-09 B-tree post (which framed InnoDB's physical-layout trade-offs) along the concurrency-control trade-offs axis.
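A sketch of the slotted counter pattern as applied to that hot-row UPDATE (the slot count of 16 is an arbitrary illustrative choice):

```sql
-- Spread one logical counter across N slot rows so concurrent
-- increments rarely contend on the same row lock.
CREATE TABLE counters (
  id    INT UNSIGNED     NOT NULL,
  slot  TINYINT UNSIGNED NOT NULL,
  count BIGINT UNSIGNED  NOT NULL DEFAULT 0,
  PRIMARY KEY (id, slot)
) ENGINE=InnoDB;

-- Each writer picks a random slot; ON DUPLICATE KEY turns the
-- insert into an increment of that slot's row.
INSERT INTO counters (id, slot, count)
VALUES (1, FLOOR(RAND() * 16), 1)
ON DUPLICATE KEY UPDATE count = count + 1;

-- Reads aggregate across the slots.
SELECT SUM(count) FROM counters WHERE id = 1;
```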

  • sources/2026-04-21-planetscale-achieving-data-consistency-with-the-consistent-lookup-vindex — canonical wiki instance of InnoDB's row-level X locks composed across three independent MySQL connections to implement Vitess's Consistent Lookup Vindex. Every DML uses SELECT ... FOR UPDATE on both the authoritative user-table row and the dependent lookup-Vindex row, with locks held across connection boundaries until ordered commit (Pre → Main → Post). The identity-update no-op optimisation documented in the post exists precisely because InnoDB row locks are connection-scoped and strict: without the no-op, a DML's Post-connection delete and Pre-connection re-insert would each hold an X lock on the same lookup-table row, and the two would deadlock against each other despite being part of the same logical transaction. First wiki datum on how a sharding layer composes over InnoDB's row-level locking to engineer around the cross-shard transaction cost.
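A generic two-connection illustration of the contention the no-op avoids (table name and values hypothetical; not the actual Vitess code path):

```sql
-- Connection 1 (Post): delete the old lookup row.
START TRANSACTION;
DELETE FROM lookup_rows WHERE k = 'a';   -- takes an X lock on the row for 'a'

-- Connection 2 (Pre): re-insert the same lookup row.
START TRANSACTION;
INSERT INTO lookup_rows (k, keyspace_id) VALUES ('a', 0x01);
-- Blocks on Connection 1's X lock. Because both connections belong to the
-- same logical DML and must commit in a fixed order, neither can proceed.
```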
