Skip to content

PATTERN Cited by 1 source

Sequential primary key

Problem

A clustered-index database (MySQL InnoDB, InnoDB-alike) stores the whole table in a B+tree keyed on the primary key. The primary-key choice therefore determines the physical on-disk layout of every row. Random primary keys (UUIDv4, v3, v5 — see concepts/uuid-primary-key-antipattern) cause:

  • Unpredictable insert paths (cache miss per insert).
  • Write amplification via node splits across the tree.
  • Range scans fanning out across non-adjacent leaves.

Solution

Use a monotonically-increasing primary key. Typical candidates:

  • BIGINT UNSIGNED AUTO_INCREMENT — MySQL's native ever-increasing integer.
  • An application-level monotonic ID generator (Snowflake, ULID, Twitter snowflake, Instagram ID, etc.).
  • UUIDv7 — time-ordered UUID where the first 48 bits are a millisecond timestamp, giving sequential insert locality while retaining UUID-like uniqueness properties.
  • Timestamp-prefixed composite keys like (created_at_sec, user_id, random_bits).

The key property: successive inserts produce monotonically increasing key values, so every insert lands on the right-most path of the B+tree.

Why it works

With a sequential PK, the article states three properties:

We always follow the right-most path when inserting new values. Leaves only get added on the right side of the tree. At the leaf level, data is in sorted order based on when it was inserted.

sources/2024-09-09-planetscale-b-trees-and-database-indexes

Mechanical consequences:

  1. Hot working set is small and stable. Only the right-most path pages need to be cache-resident — roughly one page per tree level. The InnoDB buffer pool hit rate for inserts approaches 100% regardless of total table size.
  2. Node splits concentrate at the right edge. Splits happen almost exclusively when the right-most leaf fills. Neighbouring internal-node splits are predictable and localised.
  3. Range scans on the PK are sequential I/O. The B+tree leaf linked list traverses neighbouring leaves. A WHERE created > $START AND created < $END query on a sequential PK reads contiguous pages — same shape as a sequential scan.
  4. Time-ordered reads match natural query shape. Most operational workloads (timelines, chat history, event streams, audit logs) are naturally time-sequenced; a sequential PK makes the physical layout match the read pattern by construction.

Worked comparison

Same insert workload, two schemas. From the article's bar-chart visualisation of "unique nodes visited for the previous 5 inserts":

Metric Sequential PK Random PK (UUIDv4)
Unique nodes visited per 5 inserts Low, bounded High, near-tree-depth × insert-count
Buffer-pool hit rate under memory pressure Near 100% Collapses as table exceeds RAM
Leaves modified per 5 inserts ~1 Up to 5
Range scan on PK Sequential Random fan-out

Caveats

  • Hot-spot at the right edge under high concurrency. All inserts target the same right-most leaf + ancestors. On write-heavy workloads, lock contention on that path can become the bottleneck — InnoDB's row-level locking + latch-free B+tree operations mitigate this in the common case but not at extreme write rates.
  • Sharding reverses the argument. When a sequential key is used as the shard key on a sharded system, all new writes go to one shard — see concepts/shard-key. Fix: hash the shard key so routing is uniform while still using a sequential PK within each shard.
  • Doesn't help multi-region writes. A single monotonic counter across regions requires coordination; region-local monotonic + region-prefix composite keys avoid this but cost some of the sequential-locality benefit.
  • AUTO_INCREMENT gaps exist. Failed transactions + restarts leave holes in the sequence — not a correctness issue but a surprise for users who expect dense IDs.
  • Still recommended to prefer smaller keys. BIGINT (8 bytes) beats UUIDv7 (16 bytes) on fan-out and secondary-index overhead; use UUIDv7 only if the global-uniqueness or client-side-generation properties justify the extra bytes.
  • Range scan anti-benefit on email / alphabetical columns. A secondary index on email behaves like a random-PK insert — users don't sign up in alphabetical order. The sequential-PK pattern is about primary key choice; secondary indexes on naturally-unsorted columns pay some of the same cost regardless.

When to break the rule

  • Global uniqueness across uncoordinated writers — UUID (including v7) or sharded monotonic IDs.
  • Hide row count in URLs — opaque IDs prevent enumeration attacks.
  • Merge-friendly datasets — two copies of the same table can be combined without ID conflicts.

For these, prefer UUIDv7 over UUIDv4 to keep B+tree locality.

Seen in

  • sources/2024-09-09-planetscale-b-trees-and-database-indexes — canonical articulation of the pattern with the three consequences of sequential-PK inserts + worked visual comparison vs random PK. Extends to non-PK indexes (a created_at timestamp column behaves the same way on a secondary index).
Last updated · 319 distilled / 1,201 read