CONCEPT Cited by 1 source

Shard key

A shard key is the column (or composite) whose value selects which physical shard a row lives on under horizontal sharding. Every design constraint of a horizontally-sharded schema revolves around the shard key.

What queries need it for

  • Routing: most queries must include the shard key in their predicate so the router can send the query to a single shard. A query without the shard key has to be sent to every shard and its results merged (scatter-gather).
  • Joins: cross-table joins typically work only when both tables are colocated on the same shard key and the join itself is on the shard key (patterns/colocation-sharding).
  • Constraints: foreign keys work only when the foreign-key column is itself the shard key, since the reference can then be checked within a single shard. Globally unique indexes generally cannot be enforced across shards; many implementations only support unique indexes that include the shard key.
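
The routing rule above can be sketched as follows. This is a minimal illustration, not any particular router's implementation: NUM_SHARDS, shard_for, target_shards, and the modulo placement rule are all assumptions made for the example.

```python
NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: int) -> int:
    # Placeholder placement rule (plain modulo); real systems often hash first.
    return key % NUM_SHARDS


def target_shards(predicate: dict, shard_key: str) -> list[int]:
    """Return the shards a query must visit, given its equality predicate."""
    value = predicate.get(shard_key)
    if value is None:
        # No shard key in the predicate: the router cannot narrow the target,
        # so the query goes to every shard (scatter-gather).
        return list(range(NUM_SHARDS))
    # Shard key present: exactly one shard holds the matching rows.
    return [shard_for(value)]


single = target_shards({"user_id": 10}, "user_id")
scatter = target_shards({"email": "a@example.com"}, "user_id")
```

Here `single` names one shard, while `scatter` lists all four, which is why queries that omit the shard key cost more as the shard count grows.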

Choosing the shard key

Single universal key vs a small set

A single shard key that works for every table is ideal but rare. Figma's relational data model (file metadata, organization metadata, comments, file versions, …) had no single good candidate; creating a synthetic composite key would have required a schema change + expensive backfill + substantial product-layer refactor across every table (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).

Figma's alternative: a handful of keys — UserID, FileID, OrgID — each of which covers "almost every table." Tables sharing a shard key are grouped into a colo that shares a physical layout and supports cross-table joins + full transactions when scoped to a single shard-key value.
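
To make the colo idea concrete, here is a small sketch of grouping tables by shard key. The table names and their key assignments are hypothetical, chosen only for illustration; they are not Figma's actual schema, and colos and same_colo are made-up helper names.

```python
# Hypothetical table -> shard key assignment (illustrative only).
TABLE_SHARD_KEY = {
    "files": "FileID",
    "comments": "FileID",
    "file_versions": "FileID",
    "users": "UserID",
    "orgs": "OrgID",
}


def colos(table_shard_key: dict) -> dict:
    """Group tables that share a shard key into one colo."""
    groups: dict = {}
    for table, key in table_shard_key.items():
        groups.setdefault(key, []).append(table)
    return groups


def same_colo(table_a: str, table_b: str) -> bool:
    # Necessary condition for a single-shard join: both tables share a shard
    # key. The query must additionally be scoped to one shard-key value.
    return TABLE_SHARD_KEY[table_a] == TABLE_SHARD_KEY[table_b]


groups = colos(TABLE_SHARD_KEY)
```

Under these assumed assignments, `same_colo("files", "comments")` holds, while `same_colo("files", "users")` does not, so a join between the latter pair would cross shards.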

Even distribution — avoiding hotspots

Once a shard key is chosen, its values must distribute evenly across shards. Auto-incrementing IDs and Snowflake-style timestamp-prefixed IDs are common pitfalls: when each shard owns a contiguous range of key values, sequential IDs concentrate recent writes on one shard, producing a structural hotspot.
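
A small sketch of the hotspot, assuming range-based placement in which each shard owns a contiguous block of IDs. SHARD_RANGE and range_shard are hypothetical names, and the ID values are made up.

```python
from collections import Counter

SHARD_RANGE = 1000  # assumption: each shard owns 1000 consecutive IDs


def range_shard(row_id: int) -> int:
    # Range-based placement: IDs 0-999 -> shard 0, 1000-1999 -> shard 1, ...
    return row_id // SHARD_RANGE


# The 100 most recent inserts under an auto-incrementing ID sequence.
recent_ids = range(3_000, 3_100)
writes_per_shard = Counter(range_shard(i) for i in recent_ids)
```

All 100 recent writes fall on shard 3 while the other shards receive none, which is the structural skew described above.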

Three options to handle this:

  1. Migrate to randomized IDs (e.g. UUIDv4). Clean but requires a data migration.
  2. Hash the shard key (hash(shard_key) → shard_id with a sufficiently-random hash function). No schema change; uniformity is a property of the hash. Downside: range scans on the shard key become inefficient — sequential keys hash to different shards, so WHERE shard_key BETWEEN a AND b fans out. Acceptable when range scans on the shard key are rare (Figma's choice — Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
  3. Accept partial skew + application-layer mitigation (cache, read replicas) for the one or two hot keys.
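
Option 2 can be sketched as below. SHA-256 is used here only as a convenient, well-mixed hash from the Python standard library; the source does not say which hash function Figma uses, and NUM_SHARDS, hashed_shard, shards_for_range, and distribution are hypothetical names for this example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count


def hashed_shard(shard_key: int) -> int:
    # Hash the key, then reduce the first 8 bytes of the digest to a shard id.
    digest = hashlib.sha256(str(shard_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_for_range(lo: int, hi: int) -> set:
    # Shards touched by WHERE shard_key BETWEEN lo AND hi (inclusive).
    return {hashed_shard(k) for k in range(lo, hi + 1)}


def distribution(keys) -> Counter:
    # Number of keys placed on each shard.
    return Counter(hashed_shard(k) for k in keys)


per_shard = distribution(range(8000))   # sequential keys, hashed placement
touched = shards_for_range(1, 1000)     # a range scan over 1000 adjacent keys
```

Sequential keys spread across the shards roughly evenly in `per_shard`, but `touched` shows the cost: a range scan over adjacent keys ends up visiting many shards, so this approach fits best when such scans are rare.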

Trade-off axis

Property                 | Sequential shard key  | Hashed shard key             | Randomized IDs
Range scans on shard key | Efficient             | Inefficient (scatter-gather) | Inefficient (no natural order)
Even distribution        | No (new keys cluster) | Yes                          | Yes
Schema migration         | None                  | None                         | Required
Backfill                 | None                  | None                         | Required

Data model matters more than algorithm

The central lesson from the Figma post: the shard-key decision is downstream of the data model, not upstream. A data model with a natural partition axis (per-user, per-org, per-tenant) makes per-domain shard-key selection + colocation cheap. A data model without one forces either an expensive synthetic-key migration or accepting scatter-gather on a large fraction of queries.
