CONCEPT Cited by 2 sources

Shard key cardinality¶

Shard key cardinality is the number of distinct values a candidate shard-key column takes across the dataset. A high-cardinality column (e.g. user_id, unique per row) distributes evenly across shards under hash sharding; a low-cardinality column (e.g. country_code, subscription_tier) concentrates rows onto a small number of shards regardless of the hash. It is the first of the three shard-key selection criteria named by Ben Dicken (Source: sources/2026-04-21-planetscale-database-sharding).

Why it matters¶

A hash function distributes its outputs uniformly, but only if the inputs are diverse enough to fill the output space. With only 5 distinct subscription_tier values, hash(subscription_tier) produces at most 5 distinct hash outputs, and every row with a given tier routes to whichever shard owns that tier's hash value. No amount of scaling the shard count helps: at most 5 shards ever receive rows.

With user_id as the shard key, every row has a distinct input; the hash uniformly populates the output range; rows spread evenly across however many shards exist.
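The contrast can be checked with a minimal sketch. The `shard_for` helper, the SHA-256-modulo routing scheme, and the tier names are all assumptions made for illustration; they are not taken from the cited source or from any particular sharding system.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_used(keys, num_shards: int) -> int:
    """Count how many distinct shards receive at least one row."""
    return len({shard_for(k, num_shards) for k in keys})


NUM_SHARDS = 128
TIERS = ["free", "basic", "pro", "team", "enterprise"]  # made-up tier names

# 100,000 rows keyed by a 5-value column vs. a unique-per-row column.
tier_keys = [TIERS[i % len(TIERS)] for i in range(100_000)]
user_keys = [str(user_id) for user_id in range(100_000)]

# At most 5 shards can be hit, whatever NUM_SHARDS is.
print("shards used by subscription_tier:", shards_used(tier_keys, NUM_SHARDS))
# With 100,000 distinct inputs, every shard is expected to receive rows.
print("shards used by user_id:", shards_used(user_keys, NUM_SHARDS))
```

Raising `NUM_SHARDS` changes the second number but cannot push the first above 5, because identical inputs always hash to the same shard.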

Dicken's framing:

"Ideally, we want something with high cardinality. The name column is not the ideal choice. There may be very popular names, and even with hashing we might end up with hotter servers than others. Often a column like user_id is a good choice because each value is unique." (Source: sources/2026-04-21-planetscale-database-sharding)

Cardinality threshold¶

A useful heuristic: the shard-key column should have substantially more distinct values than the target shard count. If you plan to run 128 shards, a column with 10,000+ distinct values gives the hash room to distribute evenly. A column with 50 distinct values can't: even with a perfect hash, at most 50 of the 128 shards receive any rows, so at least 78 sit empty while the rest carry the entire load.

For stable load, a reasonable target is 10× to 100× more distinct shard-key values than shards, ideally with rows and traffic spread uniformly across those values.
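To see how the ratio of distinct values to shards affects balance, the sketch below hashes a varying number of distinct values onto 128 shards and reports the empty shards and the hottest shard's load relative to the mean. It assumes every distinct value carries the same number of rows, and `shard_for` and `load_report` are hypothetical helpers, not part of any real system.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_report(num_values: int, num_shards: int):
    """Return (empty_shards, hottest_load / mean_load).

    Assumption: each distinct value contributes one equal unit of load.
    """
    counts = Counter(shard_for(f"value-{i}", num_shards) for i in range(num_values))
    empty_shards = num_shards - len(counts)
    mean_load = num_values / num_shards
    return empty_shards, max(counts.values()) / mean_load


# 50 values (fewer than shards), then 10x and 100x the shard count.
for num_values in (50, 1_280, 12_800):
    empty, hottest = load_report(num_values, 128)
    print(f"{num_values:>6} values: {empty:>3} empty shards, "
          f"hottest shard at {hottest:.2f}x the mean")
```

With 50 values, at least 78 shards must stay empty; as the value count grows, the hottest shard's load should move toward the mean.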

Two cardinality failure modes¶

Low-cardinality keys: the obvious failure. With fewer distinct values than shards, some shards stay empty and load concentrates on the rest.
Skewed high-cardinality keys: cardinality is high, but one or two values dominate traffic. name has thousands of distinct values, but "Joseph" and "Mary" are common, so the shards that own those names' hashes get disproportionate load. This is the canonical instance of concepts/hot-key: cardinality alone is insufficient; value-level distribution uniformity also matters.
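The second failure mode can be illustrated with a small sketch. The row counts are invented for the example (10,000 rare names with one row each, plus two popular names with 5,000 rows each), and `shard_for` and `rows_per_shard` are hypothetical helpers.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def rows_per_shard(rows_by_value: dict, num_shards: int) -> Counter:
    """Sum the rows landing on each shard; all rows for one value share a shard."""
    load = Counter()
    for value, rows in rows_by_value.items():
        load[shard_for(value, num_shards)] += rows
    return load


NUM_SHARDS = 16

# Invented data: high cardinality, but two values hold half of all rows.
rows_by_name = {f"name-{i}": 1 for i in range(10_000)}
rows_by_name["Joseph"] = 5_000
rows_by_name["Mary"] = 5_000

load = rows_per_shard(rows_by_name, NUM_SHARDS)
total_rows = sum(load.values())
hottest_share = max(load.values()) / total_rows
fair_share = 1 / NUM_SHARDS

print(f"distinct values: {len(rows_by_name)}")
print(f"fair share per shard: {fair_share:.1%}")
print(f"hottest shard holds:  {hottest_share:.1%}")
```

The column has over 10,000 distinct values, yet the shard that owns "Joseph" holds at least a quarter of all rows, four times its fair share of 6.25%.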

Why `user_id` is the canonical high-cardinality shard key¶

One row per user, so cardinality in the users table equals row count (the maximum possible).
Created with an explicit uniqueness guarantee, so within the users table no single user_id can be skewed: each value maps to exactly one row.
Fixed-width integer, so hashing is cheap (~ns-scale on modern CPUs) relative to variable-length string columns.
Covers almost every table under a user-centric data model: orders, events, preferences, sessions all naturally have a user_id column.

Per-tenant apps often choose org_id for the same reasons; per-file apps often choose file_id (Figma's actual production choice is UserID + FileID + OrgID as a three-key set — Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).

Pairs with the other two shard-key criteria¶

Cardinality is the first of three criteria; the other two are volatility (the shard-key column should be immutable — mutating it forces row migration across shards) and query-pattern alignment (the shard key should appear in the predicate of the dominant query so that query routes to a single shard). A well-chosen shard key optimises for all three.
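Query-pattern alignment can be sketched as a routing decision. The `shards_to_query` function below is a hypothetical router that handles only equality predicates, and the hash-modulo scheme is an assumption for illustration, not a description of any specific product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_to_query(predicate: dict, shard_key: str, num_shards: int) -> list:
    """Return the shards a query must visit (equality predicates only)."""
    if shard_key in predicate:
        # Shard key present: the query routes to exactly one shard.
        return [shard_for(str(predicate[shard_key]), num_shards)]
    # Shard key absent: the query fans out to every shard.
    return list(range(num_shards))


NUM_SHARDS = 8

aligned = shards_to_query({"user_id": 42, "status": "paid"}, "user_id", NUM_SHARDS)
unaligned = shards_to_query({"status": "paid"}, "user_id", NUM_SHARDS)

print("shards visited with user_id in the predicate:", len(aligned))
print("shards visited without it:", len(unaligned))
```

If the dominant query filters on something other than the shard key, most traffic takes the fan-out path, however high the key's cardinality is.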

Seen in¶

sources/2026-04-21-planetscale-database-sharding — canonical naming of cardinality as the first shard-key selection criterion; user_id > name recommendation with the two reasons (uniqueness + hash-speed on fixed-width integers).
sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale — Figma's three-key set (UserID, FileID, OrgID) all selected on cardinality grounds; each covers "almost every table."