Skip to content

CONCEPT Cited by 1 source

UUID primary-key antipattern

Using a random UUID (v4, v3, v5) as a clustered-index primary key in InnoDB (or any database where the table is organised as a primary-key B+tree) penalises performance on three orthogonal axes. The antipattern and its mechanism are laid out in detail by PlanetScale's Ben Dicken. (Source: sources/2024-09-09-planetscale-b-trees-and-database-indexes.)

The three failure axes

1. Unpredictable insert path → cache miss on every insert

The nodes visited for an insert are unpredictable ahead of time.

sources/2024-09-09-planetscale-b-trees-and-database-indexes

Random keys hit random positions in the tree. Over sustained writes, the set of pages touched approaches the full tree. The InnoDB buffer pool can't hold the entire tree once the table exceeds RAM, so every insert may miss the cache and fault a page from disk. This collapses insert throughput onto disk-seek latency.

2. Unpredictable destination leaf → write amplification via splits

The destination leaf node for an insert is unpredictable.

Node splits happen anywhere in the tree rather than concentrated at the right edge. Each split writes two pages (or more, if the split cascades up through internal nodes). With random inserts, splits are distributed across the tree and write amplification multiplies.

3. Out-of-order leaf values → range scans fan out

The values in the leaves are not in order.

A time-range query — the most common shape in operational workloads ("consider viewing the timeline on X, or a chat history in Slack") — with a random PK fans out across non-adjacent leaves even though the user query looks sequential. The B+tree leaf linked list doesn't help: the matching rows aren't neighbours.

Secondary amplification: larger keys shrink fan-out

A UUID is 16 bytes; a BIGINT is 8 bytes. With a fixed node size:

If our tree node is only 100 bytes, child pointers are 8 bytes, and values are 8 bytes. We could fit 4 UUIDs (plus 4 child pointers) in each node. … If we had used a BIGINT instead, we could fit 6 keys (and corresponding child pointers) in each node instead. This would lead to a shallower tree, better for performance.

Halving key size increases fan-out ~50%, which decreases tree depth by one for every factor of K in row count. For the same number of rows, a UUID-keyed tree is taller than a BIGINT-keyed tree — every lookup costs an extra page read.

Additionally, every secondary index leaf stores the primary key. Five secondary indexes × 8 extra bytes × N rows adds meaningful storage overhead on top of the main table.

UUIDv7 sidesteps most of it

UUIDv7 is time-ordered — the first 48 bits are a millisecond timestamp, followed by random bits for uniqueness within a millisecond. Inserts land in ascending order like a sequential integer; leaves grow on the right side of the tree; range scans on creation time are sequential. The article names this as the UUID escape hatch but defers details to a separate PlanetScale post (the-problem-with-using-a-uuid-primary-key-in-mysql).

UUIDv1 is also time-ordered in theory, but its field layout puts the low-order timestamp bits before the high-order ones, so the natural byte order isn't sort order — it needs a COMB-UUID / reordered-UUID transform to recover sequentiality.

Legitimate reasons to pick UUID anyway

The antipattern is real but not universal. UUIDs have properties that can outweigh the B+tree penalty in specific cases:

  • Client-side generation — no round-trip to the database to get a new ID; safe to create aggregates offline or on different nodes without coordination.
  • Global uniqueness across shards — a sequential BIGINT collides across shards unless coordinated (e.g. Snowflake IDs carry shard ID in the high bits). UUIDs don't.
  • Security by obscurity for IDs in URLs/user/12345 leaks your user count; /user/550e8400-e29b-... doesn't.
  • Merge-friendly — no ID collisions when merging data from two separate trees.

For those cases, UUIDv7 is the modern compromise: global uniqueness + client-side generation + sequential-like B+tree locality.

Mitigations when stuck with random UUID

  • Don't use UUID as primary key; use it as a unique secondary index. Use an internal BIGINT AUTO_INCREMENT as the clustered PK, expose the UUID externally through a unique index. Keeps the table's physical layout sequential.
  • Store UUID as BINARY(16) not CHAR(36). Character encoding balloons the 128-bit value to 36 bytes — >2× storage, >2× B+tree penalty.
  • Batch inserts by UUID prefix if you must use UUID as PK — group by high bits so each batch hits one subtree, at least amortising the cache churn.

Seen in

Last updated · 319 distilled / 1,201 read