SYSTEM Cited by 2 sources
InnoDB¶
InnoDB is the default storage engine of MySQL — and, by extension, of every MySQL-fork including PlanetScale's Vitess-based MySQL product. It provides the on-disk format, in-memory cache (buffer pool), transactional semantics (ACID + MVCC), row-level locking, and crash recovery for MySQL.
PlanetScale's Ben Dicken calls it "arguably the world's most popular database management system['s] most commonly used engine" and notes that it "relies heavily" on B+trees — every table, every index. (Source: .)
Architectural shape¶
- Every table is a B+tree. The table's primary key is the root of a B+tree whose leaves contain the full row data. This is a clustered index — the table and its primary-key index are the same structure. Narrow tables pack hundreds of rows per leaf; wide tables may fit single-digit rows.
- Every secondary index is a separate B+tree. Keyed on the indexed column(s), with the primary key (not a row pointer) in the leaves. Query execution therefore does two B+tree walks on secondary-index queries: secondary → primary key → clustered index → row. See concepts/secondary-index.
- 16-KB page size default. The B+tree node size = the disk I/O unit = the buffer-pool cache unit. Every operation reads/writes whole 16-KB pages. See concepts/disk-block-size-alignment.
- Buffer pool is the load-bearing cache. All page access goes through the buffer pool — an in-memory LRU-variant cache of hot pages. "The buffer pool drastically helps query performance. Without it, we'd end up doing significantly more disk I/O operations to handle a query workload."
- Adaptive Hash Index —
runtime-adaptive hash overlay on top of the buffer pool.
For index values observed being looked up repeatedly,
InnoDB materialises a hash entry whose value is a direct
pointer into the target buffer-pool page, short-circuiting
the O(log N) B+tree descent. InnoDB does not support
on-disk
HASHindexes at all — the AHI is the in-memory-only workaround. "If you try to create an index withUSING HASHon an InnoDB-powered table, MySQL will instead create a B-tree index." Benchmarked 16–20% speed-up on repeated-lookup workloads at 390M rows. Controlled viainnodb_adaptive_hash_index(default on). See patterns/runtime-adaptive-in-memory-index.
Primary-key choice is load-bearing¶
Because the table is the primary-key B+tree, the PK choice determines the on-disk layout of every row:
- Sequential PK (
BIGINT AUTO_INCREMENT, timestamp-ordered ID, UUIDv7) → inserts follow the right-most path, leaves grow on the right, range scans are sequential. See patterns/sequential-primary-key. - Random PK (UUIDv4 / v3 / v5) → inserts hit unpredictable
nodes, write amplification via scattered splits, range
scans fan out across non-adjacent leaves. Additionally,
InnoDB fills pages to only ~50% of page size on
random-key inserts vs ~94% on sequential — doubling
total page count and buffer-pool pressure. See
concepts/uuid-primary-key-antipattern and
concepts/innodb-page-fill-factor. MySQL 8.0+ ships
UUID_TO_BIN(@uuid, 1)with a swap flag that reorders UUIDv1 bytes to recover sequential locality — see concepts/uuid-to-bin-swap-flag.
The PlanetScale post makes this the central architectural
argument: "Your choice of PRIMARY KEY will impact the
layout on disk of all of the data in the table, and in turn
performance. Choose your PRIMARY KEY wisely!"
Transactional / durability mechanisms¶
Not covered in the PlanetScale B-tree post, but foundational to InnoDB:
- MVCC — row versions identified by transaction ID; readers don't block writers, writers don't block readers.
- Row-level locking — via hidden next-key / gap locks on index rows; coarser than document-level but finer than page-level.
- Redo log (
ib_logfile*) — WAL for crash recovery. - Undo log — per-transaction rollback + MVCC version storage.
- Doublewrite buffer — protects against torn-page writes on mid-flush crashes.
- Change buffer — deferred secondary-index writes to batch I/O.
- Adaptive Hash Index — runtime-adaptive O(1) hash overlay on top of the buffer pool for frequently looked-up index values. Read-side counterpart to the Change buffer. See patterns/runtime-adaptive-in-memory-index.
Known sibling engines¶
| Engine | DB | Tree | Clustered? | Node size |
|---|---|---|---|---|
| InnoDB | MySQL / MariaDB / Vitess | B+tree | Yes | 16 KB |
| Postgres heap + B-tree | Postgres | B-tree | No (heap-organised) | 8 KB |
| WiredTiger | MongoDB | B+tree | Yes (per collection) | 4 KB |
| SQLite B-tree | SQLite | B-tree | Yes | 4 KB (configurable) |
| BoltDB | (various Go apps) | B+tree | Yes | 4 KB |
Relationships with other wiki systems¶
- systems/mysql — InnoDB's host database.
- systems/vitess — sharding layer over MySQL / InnoDB; underlies PlanetScale MySQL.
- systems/wiredtiger — MongoDB's storage engine; also B+tree clustered; sibling architecture on a different database.
- concepts/innodb-buffer-pool — the cache layer.
- concepts/clustered-index — the defining layout choice.
Seen in¶
- — Brian Morrison II (PlanetScale, 2024-03-19) canonicalises
the 94% vs 50% InnoDB page-fill asymmetry — InnoDB's
insert path assumes sequential PKs and fills pages to ~94%
on that path; random PKs collapse page fill to ~50%,
doubling total page count. Also canonicalises the
UUID_TO_BIN(@u, 1)swap flag as MySQL 8.0+'s native helper for generating COMB-UUIDs from server-side UUIDv1 values. See concepts/innodb-page-fill-factor and concepts/uuid-to-bin-swap-flag. -
sources/2026-04-21-planetscale-the-mysql-adaptive-hash-index — Ben Dicken (PlanetScale, 2024-04-24) canonicalises InnoDB's Adaptive Hash Index (AHI) as an in-memory hash overlay on top of the buffer pool. Three InnoDB-specific disclosures canonical to this page: (1) InnoDB does not support on-disk
HASHindexes —CREATE INDEX ... USING HASHis silently substituted toBTREE+ warning; the AHI is the in-memory-only workaround that recovers hash-lookup performance for the hot subset of keys. (2) AHI pointers only point into buffer-pool-resident pages — page eviction invalidates corresponding AHI entries, so AHI is fundamentally a cache on top of a cache. (3) AHI usage is auto-throttled by InnoDB based on observed buffer-pool behaviour, and exposed viaSHOW ENGINE INNODB STATUS \G;under theINSERT BUFFER AND ADAPTIVE HASH INDEXsection. Benchmarks at 390M rows + 4-level deepusernameB+tree: +16% QPS on single-value repeated lookups (14,044 → 16,701 QPS), +20% on 1000-value hot-set random lookups (9,232 → 11,562 QPS). Canonical scaling claim: "workloads using deeper B-tree indexes may see even more performance improvement" — overlay benefit scales with underlying tree depth. Complements systems/innodb's existing canonicalisation of B+tree architecture, clustered-PK layout, secondary-index structure, and buffer-pool as the load-bearing cache; this source adds the runtime-adaptive overlay layer as a first-class architectural component. See patterns/runtime-adaptive-in-memory-index for the general pattern. -
— Brian Morrison II (PlanetScale, 2024-01-08) canonicalises InnoDB as the implementer of MySQL's transaction- isolation-level semantics. All four levels —
READ UNCOMMITTED,READ COMMITTED,REPEATABLE READ(MySQL default),SERIALIZABLE— are implemented on top of InnoDB's MVCC row- versioning machinery plus its shared/exclusive row-lock and gap-lock primitives. Canonical MVCC-scope-per-level disclosure:READ COMMITTEDcreates a fresh snapshot perSELECT("Eachselectwithin that transaction will create and use a fresh snapshot");REPEATABLE READpins to a single transaction ID for the lifetime of the transaction ("a row may have multiple versions at any given time, depending on open transactions. In repeatable read level, queries will use a consistent snapshot of the data, pinning the reads to a single transaction ID throughout the transaction."). InnoDB's gap-locking mechanism is what makes MySQL'sREPEATABLE READphantom-read-free on locking reads — a MySQL-specific strengthening of the SQL- standard isolation-level definitions. Critical InnoDB lock-scope subtlety canonicalised: locking reads with no usable index cause InnoDB to lock every row scanned (regardless of whether it matchesWHERE) plus gap-lock to prevent phantom-affecting inserts. Also canonicalisesSERIALIZABLEas a lock-based (S2PL-variant) strengthening overREPEATABLE READ— implicitSlock on everySELECTwith elevated deadlock risk. Canonical substrate for everyFOR SHARE/FOR UPDATElocking read in the wiki — the Consistent Lookup Vindex's three-connection DML protocol, the slotted-counter pattern's contention analysis, and the database-as-final-arbiter-of-uniqueness pattern all ride on this same substrate. Pedagogy-voice Tier-3 post. -
— Aaron Francis canonicalises the generated-hash-column pattern on top of InnoDB's B+tree secondary-index machinery. The pattern exists because InnoDB B+tree index nodes have a fixed per-node byte budget (16-KB page default) and
TEXT/BLOBcolumns can only carry prefix indexes — aBINARY(16)MD5 hash column sidesteps both constraints by being compact and fixed-width.UNIQUE INDEXon a composite hash column (patterns/composite-hash-uniqueness-constraint) is enforced by InnoDB as the commit-time serialisation point for multi-column uniqueness. - — canonical introduction to InnoDB's B+tree architecture, clustered-primary-key layout, secondary-index structure, buffer pool, and the primary-key-choice → physical-layout consequences.
- — canonical wiki instance of a vector ANN index hosted inside InnoDB as a first-class durable structure. PlanetScale extends SPFresh with transactional semantics on all operations and integrates it inside InnoDB: vector mutations ride the SQL commit path, the index pages live in InnoDB's page store, the buffer pool caches them, the redo log covers index changes, and crash recovery covers the vector index alongside row data. Terabyte-scale vector indexes "just like any other MySQL table." First wiki disclosure of InnoDB hosting a non-B+tree ANN structure as a native index type.
- — GA announcement (2026-03-25) discloses the integration mechanism. SPANN posting lists are "stored in hidden InnoDB tables" — the vector index rides existing InnoDB table machinery (transactional semantics, buffer- pool caching, redo log, crash recovery) rather than a bespoke page type. The in-memory centroid tree is ~20% of total index size; only the tree + a bounded number of hot partitions need buffer-pool residency. This is the how of the beta post's architectural claim.
-
— canonical wiki instance of InnoDB row-level lock contention as a production failure mode.
UPDATE counters SET count = count + 1 WHERE id = 1under bursty parallel writes serialises every writer on the samelock_mode X locks rec but not gap— surfaced bySHOW ENGINE INNODB STATUS\GasLOCK WAIT 2 lock struct(s)+TRX HAS BEEN WAITING N SEC FOR THIS LOCK TO BE GRANTED. Sam Lambert's 2022 post (republished Feb 2025) articulates the hot-row problem that InnoDB's row-level locking exposes for counter workloads and presents the slotted counter pattern as the canonical MySQL-specific workaround. GitHub'sgithub.downloadstable is the canonical deployment — the diagnostic output in the article explicitly namesindex PRIMARY of table github.downloads. Complements the 2024-09-09 B-tree post (which framed InnoDB's physical-layout trade-offs) with the concurrency-control trade-offs axis. -
— canonical wiki instance of InnoDB's row-level
Xlocks composed across three independent MySQL connections to implement Vitess's Consistent Lookup Vindex. Every DML usesSELECT ... FOR UPDATEon both the authoritativeuser-table row and the dependent lookup-Vindex row, with locks held across connection boundaries until ordered commit (Pre→Main→Post). The identity-update no-op optimisation documented in the post exists precisely because InnoDB row-locks are connection-scoped and strict: without the no-op, a DML'sPost-connection delete andPre-connection re-insert would each hold anXlock on the same lookup-table row, and the two would deadlock against each other despite being part of the same logical transaction. First wiki datum on how a sharding layer composes over InnoDB's row-level locking to engineer around the cross-shard transaction cost. -
— canonical wiki disclosure of InnoDB's silent-cascade- in-binlog property as the load-bearing architectural limitation that forced Vitess to own foreign-key logic above the storage engine. Shlomi Noach + Manan Gupta (2023-12-05) canonicalise: (i) foreign keys are an InnoDB extension, not a MySQL-server-layer feature — MySQL AB's server-level FK effort was dropped after InnoDB became default; (ii) InnoDB applies cascaded
ON DELETE CASCADE/ON UPDATE CASCADE/ON DELETE SET NULL/ON UPDATE SET NULLchild-row changes internally, "never logged to the binary log" — CDC + replica-with- FK-stripped topologies miss the events; (iii) the InnoDB codebase is "very much detached from the MySQL codebase" — an InnoDB-level fix for (ii) is high-risk. The load-bearing framing is that FK enforcement is a storage-engine property that doesn't compose cleanly with server-layer consumers (binlog, Vitess's shadow-table Online DDL, cross-shard proxies). Two MySQL-fork patches ship to bound the InnoDB / shadow-table interaction surface: internal operations tables (per-table FK-exempt marker) +rename_table_preserve_foreign_key(by-name FK pinning on RENAME). First wiki citation ofplanetscale/mysql-serveras a first-class PlanetScale asset distinct from the Vitess forks. -
— Rafer Hazen (2024-08-14) canonicalises InnoDB's storage-handler
index_init()callback as the capture point for Insights' per-query index-usage telemetry. Verbatim: "The InnoDB storage handler includes an index-initialization function that MySQL calls (once) prior to using an index in a query. By recording the index name passed to this function in a per query data structure, we're able to find the set of all indexes used by each query." Two InnoDB-specific disclosures canonical to this page: (1)index_init()is a handler-API callback invoked once per (query, index) pair before any row reads on that index — a stable, narrow-waist hook point that every InnoDB query flows through. (2) InnoDB is the exclusive storage engine across every PlanetScale branch ("PlanetScale databases exclusively use the InnoDB storage engine") — which is what lets PlanetScale treat the InnoDB-specific hook as sufficient fleet coverage. Canonicalised as InnoDBindex_inithook and composed with handler-hook sidecar telemetry (hook captures → datum rides existing wire packet → VTGate aggregates). TheSELECT-only caveat — InnoDB-level coverage ofUPDATE/DELETEindex walks is absent from this telemetry path — is canonical to concepts/select-only-index-telemetry-caveat. This source extends InnoDB's Seen-in with a new axis: InnoDB's handler API as a per-query telemetry substrate, distinct from its role as the B+tree storage layer, AHI overlay layer, or buffer-pool cache layer canonicalised by the neighbouring Dicken posts.
Related¶
- concepts/b-plus-tree
- concepts/b-tree
- concepts/clustered-index
- concepts/secondary-index
- concepts/innodb-buffer-pool
- concepts/adaptive-hash-index
- concepts/innodb-page-fill-factor
- patterns/sequential-primary-key
- concepts/uuid-primary-key-antipattern
- concepts/uuid-to-bin-swap-flag
- concepts/disk-block-size-alignment
- concepts/row-level-lock-contention
- concepts/hot-row-problem
- concepts/vindex
- concepts/consistent-lookup-vindex
- concepts/keyspace-id
- concepts/orphan-lookup-row
- patterns/slotted-counter-pattern
- patterns/ordered-commit-without-2pc
- systems/mysql
- systems/vitess
- systems/planetscale
- systems/wiredtiger
- companies/planetscale
- concepts/foreign-key-constraint
- concepts/innodb-silent-cascade-in-binlog
- concepts/innodb-internal-operations-table
- concepts/innodb-parent-table-rename-pinning
- concepts/cyclic-foreign-key-prohibition
- systems/planetscale-mysql-server-fork
- patterns/application-level-cascade-orchestration
- patterns/nowait-lock-for-cascade-select