CONCEPT
UUID primary-key antipattern¶
Using a random UUID (v4, v3, v5) as a clustered-index primary key in InnoDB (or any database where the table is organised as a primary-key B+tree) penalises performance on three orthogonal axes. The antipattern and its mechanism are laid out in detail by PlanetScale's Ben Dicken. (Source: .)
The three failure axes¶
1. Unpredictable insert path → cache miss on every insert¶
The nodes visited for an insert are unpredictable ahead of time.
—
Random keys hit random positions in the tree. Over sustained writes, the set of pages touched approaches the full tree. The InnoDB buffer pool can't hold the entire tree once the table exceeds RAM, so every insert may miss the cache and fault a page from disk. This collapses insert throughput onto disk-seek latency.
2. Unpredictable destination leaf → write amplification via splits¶
The destination leaf node for an insert is unpredictable.
Node splits happen anywhere in the tree rather than concentrated at the right edge. Each split writes two pages (or more, if the split cascades up through internal nodes). With random inserts, splits are distributed across the tree and write amplification multiplies.
3. Out-of-order leaf values → range scans fan out¶
The values in the leaves are not in order.
A time-range query — the most common shape in operational workloads ("consider viewing the timeline on X, or a chat history in Slack") — with a random PK fans out across non-adjacent leaves even though the user query looks sequential. The B+tree leaf linked list doesn't help: the matching rows aren't neighbours.
Secondary amplification: larger keys shrink fan-out¶
A UUID is 16 bytes; a BIGINT is 8 bytes. With a fixed node
size:
If our tree node is only 100 bytes, child pointers are 8 bytes, and values are 8 bytes. We could fit 4 UUIDs (plus 4 child pointers) in each node. … If we had used a
BIGINTinstead, we could fit 6 keys (and corresponding child pointers) in each node instead. This would lead to a shallower tree, better for performance.
Halving key size increases fan-out ~50%, which decreases tree depth by one for every factor of K in row count. For the same number of rows, a UUID-keyed tree is taller than a BIGINT-keyed tree — every lookup costs an extra page read.
Additionally, every secondary index leaf stores the primary key. Five secondary indexes × 8 extra bytes × N rows adds meaningful storage overhead on top of the main table.
Storage width: 4× to 9× a 32-bit integer¶
Canonicalised by
as the storage-axis amplifier on top of the three tree- shape failure axes:
By default, an auto-incrementing integer will consume 32 bits of storage per value. Compare this with UUIDs. If stored in a compact binary form, a single UUID would consume 128 bits on disk. Already, that is 4× the consumption of a 32-bit integer. If instead you choose to use a more human readable string-based representation, each UUID could be stored as a
CHAR(36), consuming a whopping 288 bits per UUID.
| Representation | Bytes | Bits | Ratio to INT |
|---|---|---|---|
INT |
4 | 32 | 1× |
BIGINT |
8 | 64 | 2× |
BINARY(16) (UUID compact) |
16 | 128 | 4× |
CHAR(36) (UUID string) |
36 | 288 | 9× |
The CHAR(36) path is the default in most ORMs and is
~2.25× wider than the BINARY(16) path. Always prefer
BINARY(16) when using UUIDs on MySQL — see
concepts/uuid-string-representation-mysql.
Page-fill collapse from 94% to ~50%¶
Morrison 2024-03-19 additionally canonicalises the InnoDB page fill-factor asymmetry:
InnoDB assumes that the primary key will increment predictably, either numerically or lexicographically. If true, InnoDB will fill the pages to about 94% of the page size before creating a new page. When the primary key is random, the amount of space utilized from each page can be as low as 50%.
—
Near-doubling of total page count on top of the 4–9× key- width amplification. See concepts/innodb-page-fill-factor for the mechanism and implications.
UUIDv7 sidesteps most of it¶
UUIDv7 is time-ordered — the first 48 bits are a millisecond timestamp, followed by random bits for uniqueness within a millisecond. Inserts land in ascending order like a sequential integer; leaves grow on the right side of the tree; range scans on creation time are sequential. The article names this as the UUID escape hatch but defers details to a separate PlanetScale post (the-problem-with-using-a-uuid-primary-key-in-mysql).
UUIDv1 is also time-ordered in theory, but its field layout
puts the low-order timestamp bits before the high-order
ones, so the natural byte order isn't sort order — it needs
a COMB-UUID / reordered-UUID transform to recover
sequentiality. MySQL 8.0+ ships this transform as a built-in
helper — UUID_TO_BIN(@uuid, 1) with the swap flag — see
concepts/uuid-to-bin-swap-flag.
Legitimate reasons to pick UUID anyway¶
The antipattern is real but not universal. UUIDs have properties that can outweigh the B+tree penalty in specific cases:
- Client-side generation — no round-trip to the database to get a new ID; safe to create aggregates offline or on different nodes without coordination.
- Global uniqueness across shards — a sequential
BIGINTcollides across shards unless coordinated (e.g. Snowflake IDs carry shard ID in the high bits). UUIDs don't. - Security by obscurity for IDs in URLs —
/user/12345leaks your user count;/user/550e8400-e29b-...doesn't. - Merge-friendly — no ID collisions when merging data from two separate trees.
For those cases, UUIDv7 is the modern compromise: global
uniqueness + client-side generation + sequential-like
B+tree locality. See concepts/uuid-version-taxonomy for
the full v1–v8 layout. Non-UUID alternatives —
Snowflake IDs (64-bit
BIGINT-fitting), ULIDs
(128-bit base32 string),
NanoIDs (URL-safe random; PlanetScale's choice for their
API) — cover most real-world motivations for UUIDs with
different trade-offs.
Mitigations when stuck with random UUID¶
- Don't use UUID as primary key; use it as a unique
secondary index. Use an internal
BIGINT AUTO_INCREMENTas the clustered PK, expose the UUID externally through a unique index. Keeps the table's physical layout sequential. - Store UUID as
BINARY(16)notCHAR(36). Character encoding balloons the 128-bit value to 36 bytes — >2× storage, >2× B+tree penalty. - Batch inserts by UUID prefix if you must use UUID as PK — group by high bits so each batch hits one subtree, at least amortising the cache churn.
Seen in¶
- — Mike Coutermarsh (PlanetScale, 2022-03-29) shows the
concrete realisation of this concept's "don't use UUID
as primary key; use it as a unique secondary index"
mitigation:
BIGINT AUTO_INCREMENT PRIMARY KEY+public_id VARCHAR(12) UNIQUE KEYwherepublic_idis a 12-char base-36 NanoID. The pattern is the public-id-alongside-BIGINT-PK pattern canonicalised on PlanetScale's own API. The 2024-03-19 Morrison UUID-PK post forward-references this 2022 Coutermarsh post when naming NanoIDs as an alternative. - — Brian Morrison II (PlanetScale, 2024-03-19) canonicalises
the UUID v1–v8 taxonomy + the two storage-axis
amplifiers the 2024-09 Dicken post defers to: (1) the
9× / 4× / 2× / 1× key-width ratios between
CHAR(36),BINARY(16),BIGINT,INT; (2) the 94% vs 50% InnoDB page fill-factor asymmetry on sequential vs random keys, doubling total page count on top of the key-width amplification. Also canonicalises MySQL'sUUID_TO_BIN(@u, 1)swap flag as the server-side COMB-UUID helper for v1 values, and names Snowflake IDs, ULIDs, and NanoIDs as non-UUID alternatives (with PlanetScale's own API using NanoIDs). - — canonical introduction to the antipattern with the three failure axes + worked fan-out comparison + time-range example.
Related¶
- concepts/clustered-index
- concepts/b-plus-tree
- concepts/innodb-buffer-pool
- concepts/innodb-page-fill-factor
- concepts/uuid-version-taxonomy
- concepts/uuid-to-bin-swap-flag
- concepts/uuid-string-representation-mysql
- concepts/snowflake-id
- concepts/ulid-identifier
- concepts/nanoid-identifier
- concepts/public-id-column
- concepts/secondary-index
- patterns/sequential-primary-key
- patterns/public-id-alongside-bigint-pk
- systems/innodb
- systems/mysql