Skip to content

SYSTEM Cited by 2 sources

InnoDB

InnoDB is the default storage engine of MySQL — and, by extension, of every MySQL-fork including PlanetScale's Vitess-based MySQL product. It provides the on-disk format, in-memory cache (buffer pool), transactional semantics (ACID + MVCC), row-level locking, and crash recovery for MySQL.

PlanetScale's Ben Dicken calls it "arguably the world's most popular database management system['s] most commonly used engine" and notes that it "relies heavily" on B+trees — every table, every index. (Source: .)

Architectural shape

  • Every table is a B+tree. The table's primary key is the root of a B+tree whose leaves contain the full row data. This is a clustered index — the table and its primary-key index are the same structure. Narrow tables pack hundreds of rows per leaf; wide tables may fit single-digit rows.
  • Every secondary index is a separate B+tree. Keyed on the indexed column(s), with the primary key (not a row pointer) in the leaves. Query execution therefore does two B+tree walks on secondary-index queries: secondary → primary key → clustered index → row. See concepts/secondary-index.
  • 16-KB page size default. The B+tree node size = the disk I/O unit = the buffer-pool cache unit. Every operation reads/writes whole 16-KB pages. See concepts/disk-block-size-alignment.
  • Buffer pool is the load-bearing cache. All page access goes through the buffer pool — an in-memory LRU-variant cache of hot pages. "The buffer pool drastically helps query performance. Without it, we'd end up doing significantly more disk I/O operations to handle a query workload."
  • Adaptive Hash Index — runtime-adaptive hash overlay on top of the buffer pool. For index values observed being looked up repeatedly, InnoDB materialises a hash entry whose value is a direct pointer into the target buffer-pool page, short-circuiting the O(log N) B+tree descent. InnoDB does not support on-disk HASH indexes at all — the AHI is the in-memory-only workaround. "If you try to create an index with USING HASH on an InnoDB-powered table, MySQL will instead create a B-tree index." Benchmarked 16–20% speed-up on repeated-lookup workloads at 390M rows. Controlled via innodb_adaptive_hash_index (default on). See patterns/runtime-adaptive-in-memory-index.

Primary-key choice is load-bearing

Because the table is the primary-key B+tree, the PK choice determines the on-disk layout of every row:

  • Sequential PK (BIGINT AUTO_INCREMENT, timestamp-ordered ID, UUIDv7) → inserts follow the right-most path, leaves grow on the right, range scans are sequential. See patterns/sequential-primary-key.
  • Random PK (UUIDv4 / v3 / v5) → inserts hit unpredictable nodes, write amplification via scattered splits, range scans fan out across non-adjacent leaves. Additionally, InnoDB fills pages to only ~50% of page size on random-key inserts vs ~94% on sequential — doubling total page count and buffer-pool pressure. See concepts/uuid-primary-key-antipattern and concepts/innodb-page-fill-factor. MySQL 8.0+ ships UUID_TO_BIN(@uuid, 1) with a swap flag that reorders UUIDv1 bytes to recover sequential locality — see concepts/uuid-to-bin-swap-flag.

The PlanetScale post makes this the central architectural argument: "Your choice of PRIMARY KEY will impact the layout on disk of all of the data in the table, and in turn performance. Choose your PRIMARY KEY wisely!"

Transactional / durability mechanisms

Not covered in the PlanetScale B-tree post, but foundational to InnoDB:

  • MVCC — row versions identified by transaction ID; readers don't block writers, writers don't block readers.
  • Row-level locking — via hidden next-key / gap locks on index rows; coarser than document-level but finer than page-level.
  • Redo log (ib_logfile*) — WAL for crash recovery.
  • Undo log — per-transaction rollback + MVCC version storage.
  • Doublewrite buffer — protects against torn-page writes on mid-flush crashes.
  • Change buffer — deferred secondary-index writes to batch I/O.
  • Adaptive Hash Index — runtime-adaptive O(1) hash overlay on top of the buffer pool for frequently looked-up index values. Read-side counterpart to the Change buffer. See patterns/runtime-adaptive-in-memory-index.

Known sibling engines

Engine DB Tree Clustered? Node size
InnoDB MySQL / MariaDB / Vitess B+tree Yes 16 KB
Postgres heap + B-tree Postgres B-tree No (heap-organised) 8 KB
WiredTiger MongoDB B+tree Yes (per collection) 4 KB
SQLite B-tree SQLite B-tree Yes 4 KB (configurable)
BoltDB (various Go apps) B+tree Yes 4 KB

Relationships with other wiki systems

Seen in

  • — Brian Morrison II (PlanetScale, 2024-03-19) canonicalises the 94% vs 50% InnoDB page-fill asymmetry — InnoDB's insert path assumes sequential PKs and fills pages to ~94% on that path; random PKs collapse page fill to ~50%, doubling total page count. Also canonicalises the UUID_TO_BIN(@u, 1) swap flag as MySQL 8.0+'s native helper for generating COMB-UUIDs from server-side UUIDv1 values. See concepts/innodb-page-fill-factor and concepts/uuid-to-bin-swap-flag.
  • sources/2026-04-21-planetscale-the-mysql-adaptive-hash-index — Ben Dicken (PlanetScale, 2024-04-24) canonicalises InnoDB's Adaptive Hash Index (AHI) as an in-memory hash overlay on top of the buffer pool. Three InnoDB-specific disclosures canonical to this page: (1) InnoDB does not support on-disk HASH indexesCREATE INDEX ... USING HASH is silently substituted to BTREE + warning; the AHI is the in-memory-only workaround that recovers hash-lookup performance for the hot subset of keys. (2) AHI pointers only point into buffer-pool-resident pages — page eviction invalidates corresponding AHI entries, so AHI is fundamentally a cache on top of a cache. (3) AHI usage is auto-throttled by InnoDB based on observed buffer-pool behaviour, and exposed via SHOW ENGINE INNODB STATUS \G; under the INSERT BUFFER AND ADAPTIVE HASH INDEX section. Benchmarks at 390M rows + 4-level deep username B+tree: +16% QPS on single-value repeated lookups (14,044 → 16,701 QPS), +20% on 1000-value hot-set random lookups (9,232 → 11,562 QPS). Canonical scaling claim: "workloads using deeper B-tree indexes may see even more performance improvement" — overlay benefit scales with underlying tree depth. Complements systems/innodb's existing canonicalisation of B+tree architecture, clustered-PK layout, secondary-index structure, and buffer-pool as the load-bearing cache; this source adds the runtime-adaptive overlay layer as a first-class architectural component. See patterns/runtime-adaptive-in-memory-index for the general pattern.

  • — Brian Morrison II (PlanetScale, 2024-01-08) canonicalises InnoDB as the implementer of MySQL's transaction- isolation-level semantics. All four levels — READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ (MySQL default), SERIALIZABLE — are implemented on top of InnoDB's MVCC row- versioning machinery plus its shared/exclusive row-lock and gap-lock primitives. Canonical MVCC-scope-per-level disclosure: READ COMMITTED creates a fresh snapshot per SELECT ("Each select within that transaction will create and use a fresh snapshot"); REPEATABLE READ pins to a single transaction ID for the lifetime of the transaction ("a row may have multiple versions at any given time, depending on open transactions. In repeatable read level, queries will use a consistent snapshot of the data, pinning the reads to a single transaction ID throughout the transaction."). InnoDB's gap-locking mechanism is what makes MySQL's REPEATABLE READ phantom-read-free on locking reads — a MySQL-specific strengthening of the SQL- standard isolation-level definitions. Critical InnoDB lock-scope subtlety canonicalised: locking reads with no usable index cause InnoDB to lock every row scanned (regardless of whether it matches WHERE) plus gap-lock to prevent phantom-affecting inserts. Also canonicalises SERIALIZABLE as a lock-based (S2PL-variant) strengthening over REPEATABLE READ — implicit S lock on every SELECT with elevated deadlock risk. Canonical substrate for every FOR SHARE / FOR UPDATE locking read in the wiki — the Consistent Lookup Vindex's three-connection DML protocol, the slotted-counter pattern's contention analysis, and the database-as-final-arbiter-of-uniqueness pattern all ride on this same substrate. Pedagogy-voice Tier-3 post.

  • — Aaron Francis canonicalises the generated-hash-column pattern on top of InnoDB's B+tree secondary-index machinery. The pattern exists because InnoDB B+tree index nodes have a fixed per-node byte budget (16-KB page default) and TEXT / BLOB columns can only carry prefix indexes — a BINARY(16) MD5 hash column sidesteps both constraints by being compact and fixed-width. UNIQUE INDEX on a composite hash column (patterns/composite-hash-uniqueness-constraint) is enforced by InnoDB as the commit-time serialisation point for multi-column uniqueness.

  • — canonical introduction to InnoDB's B+tree architecture, clustered-primary-key layout, secondary-index structure, buffer pool, and the primary-key-choice → physical-layout consequences.
  • — canonical wiki instance of a vector ANN index hosted inside InnoDB as a first-class durable structure. PlanetScale extends SPFresh with transactional semantics on all operations and integrates it inside InnoDB: vector mutations ride the SQL commit path, the index pages live in InnoDB's page store, the buffer pool caches them, the redo log covers index changes, and crash recovery covers the vector index alongside row data. Terabyte-scale vector indexes "just like any other MySQL table." First wiki disclosure of InnoDB hosting a non-B+tree ANN structure as a native index type.
  • GA announcement (2026-03-25) discloses the integration mechanism. SPANN posting lists are "stored in hidden InnoDB tables" — the vector index rides existing InnoDB table machinery (transactional semantics, buffer- pool caching, redo log, crash recovery) rather than a bespoke page type. The in-memory centroid tree is ~20% of total index size; only the tree + a bounded number of hot partitions need buffer-pool residency. This is the how of the beta post's architectural claim.
  • — canonical wiki instance of InnoDB row-level lock contention as a production failure mode. UPDATE counters SET count = count + 1 WHERE id = 1 under bursty parallel writes serialises every writer on the same lock_mode X locks rec but not gap — surfaced by SHOW ENGINE INNODB STATUS\G as LOCK WAIT 2 lock struct(s) + TRX HAS BEEN WAITING N SEC FOR THIS LOCK TO BE GRANTED. Sam Lambert's 2022 post (republished Feb 2025) articulates the hot-row problem that InnoDB's row-level locking exposes for counter workloads and presents the slotted counter pattern as the canonical MySQL-specific workaround. GitHub's github.downloads table is the canonical deployment — the diagnostic output in the article explicitly names index PRIMARY of table github.downloads. Complements the 2024-09-09 B-tree post (which framed InnoDB's physical-layout trade-offs) with the concurrency-control trade-offs axis.

  • — canonical wiki instance of InnoDB's row-level X locks composed across three independent MySQL connections to implement Vitess's Consistent Lookup Vindex. Every DML uses SELECT ... FOR UPDATE on both the authoritative user-table row and the dependent lookup-Vindex row, with locks held across connection boundaries until ordered commit (PreMainPost). The identity-update no-op optimisation documented in the post exists precisely because InnoDB row-locks are connection-scoped and strict: without the no-op, a DML's Post-connection delete and Pre-connection re-insert would each hold an X lock on the same lookup-table row, and the two would deadlock against each other despite being part of the same logical transaction. First wiki datum on how a sharding layer composes over InnoDB's row-level locking to engineer around the cross-shard transaction cost.

  • canonical wiki disclosure of InnoDB's silent-cascade- in-binlog property as the load-bearing architectural limitation that forced Vitess to own foreign-key logic above the storage engine. Shlomi Noach + Manan Gupta (2023-12-05) canonicalise: (i) foreign keys are an InnoDB extension, not a MySQL-server-layer feature — MySQL AB's server-level FK effort was dropped after InnoDB became default; (ii) InnoDB applies cascaded ON DELETE CASCADE / ON UPDATE CASCADE / ON DELETE SET NULL / ON UPDATE SET NULL child-row changes internally, "never logged to the binary log" — CDC + replica-with- FK-stripped topologies miss the events; (iii) the InnoDB codebase is "very much detached from the MySQL codebase" — an InnoDB-level fix for (ii) is high-risk. The load-bearing framing is that FK enforcement is a storage-engine property that doesn't compose cleanly with server-layer consumers (binlog, Vitess's shadow-table Online DDL, cross-shard proxies). Two MySQL-fork patches ship to bound the InnoDB / shadow-table interaction surface: internal operations tables (per-table FK-exempt marker) + rename_table_preserve_foreign_key (by-name FK pinning on RENAME). First wiki citation of planetscale/mysql-server as a first-class PlanetScale asset distinct from the Vitess forks.

  • — Rafer Hazen (2024-08-14) canonicalises InnoDB's storage-handler index_init() callback as the capture point for Insights' per-query index-usage telemetry. Verbatim: "The InnoDB storage handler includes an index-initialization function that MySQL calls (once) prior to using an index in a query. By recording the index name passed to this function in a per query data structure, we're able to find the set of all indexes used by each query." Two InnoDB-specific disclosures canonical to this page: (1) index_init() is a handler-API callback invoked once per (query, index) pair before any row reads on that index — a stable, narrow-waist hook point that every InnoDB query flows through. (2) InnoDB is the exclusive storage engine across every PlanetScale branch ("PlanetScale databases exclusively use the InnoDB storage engine") — which is what lets PlanetScale treat the InnoDB-specific hook as sufficient fleet coverage. Canonicalised as InnoDB index_init hook and composed with handler-hook sidecar telemetry (hook captures → datum rides existing wire packet → VTGate aggregates). The SELECT-only caveat — InnoDB-level coverage of UPDATE / DELETE index walks is absent from this telemetry path — is canonical to concepts/select-only-index-telemetry-caveat. This source extends InnoDB's Seen-in with a new axis: InnoDB's handler API as a per-query telemetry substrate, distinct from its role as the B+tree storage layer, AHI overlay layer, or buffer-pool cache layer canonicalised by the neighbouring Dicken posts.

Last updated · 542 distilled / 1,571 read