Skip to content

CONCEPT

UUID primary-key antipattern

Using a random UUID (v4, v3, v5) as a clustered-index primary key in InnoDB (or any database where the table is organised as a primary-key B+tree) penalises performance on three orthogonal axes. The antipattern and its mechanism are laid out in detail by PlanetScale's Ben Dicken. (Source: .)

The three failure axes

1. Unpredictable insert path → cache miss on every insert

The nodes visited for an insert are unpredictable ahead of time.

Random keys hit random positions in the tree. Over sustained writes, the set of pages touched approaches the full tree. The InnoDB buffer pool can't hold the entire tree once the table exceeds RAM, so every insert may miss the cache and fault a page from disk. This collapses insert throughput onto disk-seek latency.

2. Unpredictable destination leaf → write amplification via splits

The destination leaf node for an insert is unpredictable.

Node splits happen anywhere in the tree rather than concentrated at the right edge. Each split writes two pages (or more, if the split cascades up through internal nodes). With random inserts, splits are distributed across the tree and write amplification multiplies.

3. Out-of-order leaf values → range scans fan out

The values in the leaves are not in order.

A time-range query — the most common shape in operational workloads ("consider viewing the timeline on X, or a chat history in Slack") — with a random PK fans out across non-adjacent leaves even though the user query looks sequential. The B+tree leaf linked list doesn't help: the matching rows aren't neighbours.

Secondary amplification: larger keys shrink fan-out

A UUID is 16 bytes; a BIGINT is 8 bytes. With a fixed node size:

If our tree node is only 100 bytes, child pointers are 8 bytes, and values are 8 bytes. We could fit 4 UUIDs (plus 4 child pointers) in each node. … If we had used a BIGINT instead, we could fit 6 keys (and corresponding child pointers) in each node instead. This would lead to a shallower tree, better for performance.

Halving key size increases fan-out ~50%, which decreases tree depth by one for every factor of K in row count. For the same number of rows, a UUID-keyed tree is taller than a BIGINT-keyed tree — every lookup costs an extra page read.

Additionally, every secondary index leaf stores the primary key. Five secondary indexes × 8 extra bytes × N rows adds meaningful storage overhead on top of the main table.

Storage width: 4× to 9× a 32-bit integer

Canonicalised by

as the storage-axis amplifier on top of the three tree- shape failure axes:

By default, an auto-incrementing integer will consume 32 bits of storage per value. Compare this with UUIDs. If stored in a compact binary form, a single UUID would consume 128 bits on disk. Already, that is 4× the consumption of a 32-bit integer. If instead you choose to use a more human readable string-based representation, each UUID could be stored as a CHAR(36), consuming a whopping 288 bits per UUID.

Representation Bytes Bits Ratio to INT
INT 4 32
BIGINT 8 64
BINARY(16) (UUID compact) 16 128
CHAR(36) (UUID string) 36 288

The CHAR(36) path is the default in most ORMs and is ~2.25× wider than the BINARY(16) path. Always prefer BINARY(16) when using UUIDs on MySQL — see concepts/uuid-string-representation-mysql.

Page-fill collapse from 94% to ~50%

Morrison 2024-03-19 additionally canonicalises the InnoDB page fill-factor asymmetry:

InnoDB assumes that the primary key will increment predictably, either numerically or lexicographically. If true, InnoDB will fill the pages to about 94% of the page size before creating a new page. When the primary key is random, the amount of space utilized from each page can be as low as 50%.

Near-doubling of total page count on top of the 4–9× key- width amplification. See concepts/innodb-page-fill-factor for the mechanism and implications.

UUIDv7 sidesteps most of it

UUIDv7 is time-ordered — the first 48 bits are a millisecond timestamp, followed by random bits for uniqueness within a millisecond. Inserts land in ascending order like a sequential integer; leaves grow on the right side of the tree; range scans on creation time are sequential. The article names this as the UUID escape hatch but defers details to a separate PlanetScale post (the-problem-with-using-a-uuid-primary-key-in-mysql).

UUIDv1 is also time-ordered in theory, but its field layout puts the low-order timestamp bits before the high-order ones, so the natural byte order isn't sort order — it needs a COMB-UUID / reordered-UUID transform to recover sequentiality. MySQL 8.0+ ships this transform as a built-in helper — UUID_TO_BIN(@uuid, 1) with the swap flag — see concepts/uuid-to-bin-swap-flag.

Legitimate reasons to pick UUID anyway

The antipattern is real but not universal. UUIDs have properties that can outweigh the B+tree penalty in specific cases:

  • Client-side generation — no round-trip to the database to get a new ID; safe to create aggregates offline or on different nodes without coordination.
  • Global uniqueness across shards — a sequential BIGINT collides across shards unless coordinated (e.g. Snowflake IDs carry shard ID in the high bits). UUIDs don't.
  • Security by obscurity for IDs in URLs/user/12345 leaks your user count; /user/550e8400-e29b-... doesn't.
  • Merge-friendly — no ID collisions when merging data from two separate trees.

For those cases, UUIDv7 is the modern compromise: global uniqueness + client-side generation + sequential-like B+tree locality. See concepts/uuid-version-taxonomy for the full v1–v8 layout. Non-UUID alternatives — Snowflake IDs (64-bit BIGINT-fitting), ULIDs (128-bit base32 string), NanoIDs (URL-safe random; PlanetScale's choice for their API) — cover most real-world motivations for UUIDs with different trade-offs.

Mitigations when stuck with random UUID

  • Don't use UUID as primary key; use it as a unique secondary index. Use an internal BIGINT AUTO_INCREMENT as the clustered PK, expose the UUID externally through a unique index. Keeps the table's physical layout sequential.
  • Store UUID as BINARY(16) not CHAR(36). Character encoding balloons the 128-bit value to 36 bytes — >2× storage, >2× B+tree penalty.
  • Batch inserts by UUID prefix if you must use UUID as PK — group by high bits so each batch hits one subtree, at least amortising the cache churn.

Seen in

  • — Mike Coutermarsh (PlanetScale, 2022-03-29) shows the concrete realisation of this concept's "don't use UUID as primary key; use it as a unique secondary index" mitigation: BIGINT AUTO_INCREMENT PRIMARY KEY + public_id VARCHAR(12) UNIQUE KEY where public_id is a 12-char base-36 NanoID. The pattern is the public-id-alongside-BIGINT-PK pattern canonicalised on PlanetScale's own API. The 2024-03-19 Morrison UUID-PK post forward-references this 2022 Coutermarsh post when naming NanoIDs as an alternative.
  • — Brian Morrison II (PlanetScale, 2024-03-19) canonicalises the UUID v1–v8 taxonomy + the two storage-axis amplifiers the 2024-09 Dicken post defers to: (1) the 9× / 4× / 2× / 1× key-width ratios between CHAR(36), BINARY(16), BIGINT, INT; (2) the 94% vs 50% InnoDB page fill-factor asymmetry on sequential vs random keys, doubling total page count on top of the key-width amplification. Also canonicalises MySQL's UUID_TO_BIN(@u, 1) swap flag as the server-side COMB-UUID helper for v1 values, and names Snowflake IDs, ULIDs, and NanoIDs as non-UUID alternatives (with PlanetScale's own API using NanoIDs).
  • — canonical introduction to the antipattern with the three failure axes + worked fan-out comparison + time-range example.
Last updated · 542 distilled / 1,571 read