CONCEPT Cited by 2 sources

Shard key¶

A shard key is the column (or composite) whose value selects which physical shard a row lives on under horizontal sharding. Every design constraint of a horizontally-sharded schema revolves around the shard key.

What queries need it for¶

Routing: most queries must include the shard key in their predicate so the router can send the query to a single shard. Queries without the shard key become scatter-gather.
Joins: cross-table joins typically work only when both tables are colocated on the same shard key and the join itself is on the shard key (patterns/colocation-sharding).
Constraints: a foreign-key constraint can be enforced only when the foreign key is the shard key, since only then are the referencing and referenced rows guaranteed to share a shard. Globally unique indexes generally cannot be enforced across shards; many implementations only support unique indexes that include the shard key.
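
As a rough illustration of the routing rule above, here is a minimal sketch of a router that inspects a query's equality predicates. The names `NUM_SHARDS`, `shard_for`, and `route` are hypothetical, and the SHA-256-modulo placement is an assumption made for the example, not a description of any particular system.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(key_value: int) -> int:
    """Map a shard-key value to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(str(key_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def route(predicate: dict, shard_key: str) -> list[int]:
    """Return the shards a query must visit, given its equality predicates."""
    if shard_key in predicate:
        # Shard key present: only one shard can hold matching rows.
        return [shard_for(predicate[shard_key])]
    # Shard key absent: any shard may hold matching rows, so scatter-gather.
    return list(range(NUM_SHARDS))


single = route({"user_id": 42, "status": "open"}, shard_key="user_id")
fanout = route({"status": "open"}, shard_key="user_id")
print(len(single), len(fanout))  # one shard vs. every shard
```

The second query carries no shard-key value, so the router has no way to narrow the target set and must ask every shard.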

Choosing the shard key¶

Single universal key vs a small set¶

A single shard key that works for every table is ideal but rare. Figma's relational data model (file metadata, organization metadata, comments, file versions, …) had no single good candidate; creating a synthetic composite key would have required a schema change + expensive backfill + substantial product-layer refactor across every table (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).

Figma's alternative: a handful of keys — UserID, FileID, OrgID — each of which covers "almost every table." Tables sharing a shard key are grouped into a colo that shares a physical layout and supports cross-table joins + full transactions when scoped to a single shard-key value.
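
A small sketch of the colo idea, under stated assumptions: the table names and their key assignments below are invented for illustration (the source names the keys, not which table uses which), and `SHARD_KEY_OF`, `colos`, and `join_is_single_shard` are hypothetical helpers.

```python
# Hypothetical table-to-shard-key assignment; not taken from the source.
SHARD_KEY_OF = {
    "files": "file_id",
    "comments": "file_id",
    "file_versions": "file_id",
    "users": "user_id",
    "orgs": "org_id",
}


def colos() -> dict[str, set[str]]:
    """Group tables into colos: one colo per shard key."""
    groups: dict[str, set[str]] = {}
    for table, key in SHARD_KEY_OF.items():
        groups.setdefault(key, set()).add(table)
    return groups


def join_is_single_shard(left: str, right: str, join_column: str) -> bool:
    """A join stays on one shard only if both tables share a shard key
    and the join is on that key."""
    return SHARD_KEY_OF[left] == SHARD_KEY_OF[right] == join_column


print(colos()["file_id"])
print(join_is_single_shard("files", "comments", "file_id"))   # same colo, joined on its key
print(join_is_single_shard("comments", "users", "user_id"))   # different colos
```

A check like this could run at query-planning time to reject or flag joins that would cross colos.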

Even distribution: avoiding hotspots¶

Once a shard key is chosen, its values must distribute evenly across shards. Auto-incrementing IDs and Snowflake-style timestamp-prefixed IDs are common pitfalls: when shards are assigned by key range, sequential IDs concentrate recent writes on the shard that owns the newest range, producing a structural hotspot.

Three options to handle this:

Migrate to randomized IDs (e.g. UUIDv4). Clean but requires a data migration.
Hash the shard key (hash(shard_key) → shard_id with a sufficiently-random hash function). No schema change; uniformity is a property of the hash. Downside: range scans on the shard key become inefficient — sequential keys hash to different shards, so WHERE shard_key BETWEEN a AND b fans out. Acceptable when range scans on the shard key are rare (Figma's choice — Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
Accept partial skew + application-layer mitigation (cache, read replicas) for the one or two hot keys.
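
The hotspot and the hashing trade-off can be seen in a small simulation. This is a sketch under assumptions: four shards, a range partition of one million IDs per shard, and SHA-256 as the "sufficiently-random" hash are all choices made for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4             # hypothetical shard count
IDS_PER_RANGE = 1_000_000  # hypothetical width of each range partition


def range_shard(key: int) -> int:
    """Range partitioning: each contiguous block of keys maps to one shard."""
    return min(key // IDS_PER_RANGE, NUM_SHARDS - 1)


def hash_shard(key: int) -> int:
    """Hash partitioning: hash(shard_key) -> shard_id."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# The 10,000 most recently issued sequential IDs.
recent = range(3_500_000, 3_510_000)
by_range = Counter(range_shard(k) for k in recent)
by_hash = Counter(hash_shard(k) for k in recent)

# Shards touched by WHERE shard_key BETWEEN 3500000 AND 3500099 under hashing.
scan_shards = {hash_shard(k) for k in range(3_500_000, 3_500_100)}

print(by_range)            # every recent write lands on shard 3
print(sorted(by_hash.items()))
print(len(scan_shards))    # number of shards the range scan must visit
```

Under range partitioning every recent write lands on the last shard, while hashing spreads the same writes across all shards. The price is that even a 100-key range scan has to visit multiple shards.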

Trade-off axis¶

Property	Sequential shard key	Hashed shard key	Randomized IDs
Range scans on shard key	Efficient	Inefficient (scatter-gather)	Inefficient (no natural order)
Even distribution	No (new keys cluster)	Yes	Yes
Schema migration	None	None	Required
Backfill	None	None	Required

Data model matters more than algorithm¶

The central lesson from the Figma post: the shard-key decision is downstream of the data model, not upstream. A data model with a natural partition axis (per-user, per-org, per-tenant) makes per-domain shard-key selection + colocation cheap. A data model without one forces either an expensive synthetic-key migration or accepting scatter-gather on a large fraction of queries.

Seen in¶

sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale: Figma picks UserID, FileID, OrgID as the shard-key set; hash-of-shard-key routing avoids Snowflake-ID hotspots at the cost of shard-key range-scan efficiency.
sources/2026-04-21-planetscale-dealing-with-large-tables: Ben Dicken's canonical exercise_log teaching example illustrates the shard-key choice dominating everything else: hash(log_id) distributes writes evenly but scatters per-user reads; hash(user_id) collapses per-user reads to a single shard. This source is where the wiki canonically names query-pattern alignment as the load-bearing selection criterion (patterns/shard-key-aligned-with-query-pattern). It also surfaces the hot-shard write-frontier problem implicit in range-sharding any monotonic key.
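
The exercise_log contrast above can be sketched as follows. The rows, the shard count, and the helper names `shard_of` and `shards_for_user_read` are invented for illustration; only the two candidate keys, log_id and user_id, come from the source.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(value: int) -> int:
    """Stable hash placement (assumed scheme for this sketch)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Invented exercise_log rows as (log_id, user_id): 50 entries for each of 3 users.
rows = [(log_id, log_id % 3) for log_id in range(150)]


def shards_for_user_read(user_id: int, shard_column: str) -> set[int]:
    """Shards that hold at least one row for user_id under the given shard key."""
    col = 0 if shard_column == "log_id" else 1
    return {shard_of(row[col]) for row in rows if row[1] == user_id}


print(len(shards_for_user_read(1, "log_id")))   # per-user read is scattered
print(len(shards_for_user_read(1, "user_id")))  # per-user read hits one shard
```

With user_id as the shard key, all of a user's rows hash to the same shard, so the per-user query is a single-shard read.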