CONCEPT Cited by 1 source
Shard key¶
A shard key is the column (or composite) whose value selects which physical shard a row lives on under horizontal sharding. Every design constraint of a horizontally-sharded schema revolves around the shard key.
What queries need it for¶
- Routing: most queries must include the shard key in their predicate so the router can send the query to a single shard. Queries without the shard key become scatter-gather.
- Joins: cross-table joins typically work only when both tables are colocated on the same shard key and the join itself is on the shard key (patterns/colocation-sharding).
- Constraints: foreign keys can generally be enforced only when the foreign key is the shard key, because the referencing and referenced rows then live on the same shard. Globally unique indexes generally cannot be enforced across shards; many implementations only support unique indexes that include the shard key.
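The routing rule above can be sketched in a few lines. This is a minimal illustration, not any particular router's implementation: the shard count, the `shard_for` and `route` helpers, and the representation of a query predicate as a dict of equality filters are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(shard_key_value: int) -> int:
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.sha256(str(shard_key_value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def route(predicate: dict, shard_key: str) -> list[int]:
    """Return the shards a query must visit.

    `predicate` is a hypothetical simplification: a dict of
    column -> value equality filters taken from the WHERE clause.
    """
    if shard_key in predicate:
        # Shard key present: the query goes to exactly one shard.
        return [shard_for(predicate[shard_key])]
    # Shard key absent: scatter-gather across every shard.
    return list(range(NUM_SHARDS))


single = route({"user_id": 42}, shard_key="user_id")
fanout = route({"email": "someone@example.com"}, shard_key="user_id")
print(len(single), len(fanout))
```

A query filtered on the shard key touches one shard; the same query filtered on any other column touches all of them, which is why most queries are expected to carry the shard key.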
Choosing the shard key¶
Single universal key vs a small set¶
A single shard key that works for every table is ideal but rare. Figma's relational data model (file metadata, organization metadata, comments, file versions, …) had no single good candidate; creating a synthetic composite key would have required a schema change + expensive backfill + substantial product-layer refactor across every table (Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
Figma's alternative: a handful of keys — UserID, FileID, OrgID — each of which covers "almost every table." Tables sharing a shard key are grouped into a colo that shares a physical layout and supports cross-table joins + full transactions when scoped to a single shard-key value.
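One way to picture a colo is as a registry from shard key to the tables that share it. The sketch below is only an illustration: the table names and their assignment to `UserID`, `FileID`, and `OrgID` are hypothetical (the source does not list them), and `is_single_shard_join` is an invented helper that encodes the rule that a join stays on one shard only when both tables share a colo and the query is scoped to a single value of that colo's shard key.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical table-to-colo assignment; the source does not say which
# tables belong to which shard key.
COLOS = {
    "UserID": {"users", "user_settings"},
    "FileID": {"files", "comments", "file_versions"},
    "OrgID": {"orgs", "org_members"},
}


def colo_of(table: str) -> str:
    """Return the shard key of the colo that owns `table`."""
    for shard_key, tables in COLOS.items():
        if table in tables:
            return shard_key
    raise KeyError(f"table {table!r} is not assigned to a colo")


def is_single_shard_join(left: str, right: str, scoped_key: Optional[str]) -> bool:
    """True when a join can run on one shard.

    `scoped_key` names the shard key the query pins to a single value,
    or None when the query does not pin any shard key.
    """
    return colo_of(left) == colo_of(right) == scoped_key


print(is_single_shard_join("files", "comments", "FileID"))  # same colo, scoped
print(is_single_shard_join("files", "users", "FileID"))     # different colos
```

Under these assumptions, the first call is a single-shard join and the second is not, because `users` lives in a different colo from `files`.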
Even distribution — avoiding hotspots¶
Once a shard key is chosen, its values must distribute evenly across shards. Auto-incrementing IDs and Snowflake-style timestamp-prefixed IDs are common pitfalls: when shards own contiguous key ranges, sequential IDs concentrate recent writes on whichever shard holds the newest range, producing a structural hotspot.
Three options to handle this:
- Migrate to randomized IDs (e.g. UUIDv4). Clean but requires a data migration.
- Hash the shard key (`hash(shard_key) → shard_id` with a sufficiently random hash function). No schema change; uniformity is a property of the hash. Downside: range scans on the shard key become inefficient, because sequential keys hash to different shards, so `WHERE shard_key BETWEEN a AND b` fans out. Acceptable when range scans on the shard key are rare (Figma's choice; Source: sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).
- Accept partial skew + application-layer mitigation (cache, read replicas) for the one or two hot keys.
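The hotspot and the hashing trade-off can be seen in a small simulation. This is a sketch under stated assumptions: four shards, a range-partitioned layout where each shard owns a contiguous block of 1000 IDs, and SHA-256 as the "sufficiently random" hash. None of these values come from the source, and `range_shard`, `hash_shard`, and `shards_for_range_scan` are names invented for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4      # assumed shard count
RANGE_WIDTH = 1000  # assumed: each shard owns a contiguous block of 1000 IDs


def range_shard(key: int) -> int:
    """Range partitioning: contiguous key blocks map to one shard each."""
    return min(key // RANGE_WIDTH, NUM_SHARDS - 1)


def hash_shard(key: int) -> int:
    """Hash partitioning: hash(shard_key) -> shard_id."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_for_range_scan(lo: int, hi: int, shard_fn) -> set:
    """Shards touched by WHERE shard_key BETWEEN lo AND hi."""
    return {shard_fn(k) for k in range(lo, hi + 1)}


# The 100 most recently issued sequential IDs.
recent_ids = range(3000, 3100)

range_load = Counter(range_shard(k) for k in recent_ids)
hash_load = Counter(hash_shard(k) for k in recent_ids)

print("writes per shard, range-partitioned:", dict(range_load))
print("writes per shard, hash-partitioned: ", dict(hash_load))
print("shards hit by range scan (range):", shards_for_range_scan(3000, 3099, range_shard))
print("shards hit by range scan (hash): ", shards_for_range_scan(3000, 3099, hash_shard))
```

With range partitioning, all 100 recent writes land on a single shard, but a range scan over the same keys also stays on that shard. With hashing, the writes spread across shards and the same range scan fans out, which is the trade-off described in the hashing option above.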
Trade-off axis¶
| Property | Sequential shard key | Hashed shard key | Randomized IDs |
|---|---|---|---|
| Range scans on shard key | Efficient | Inefficient (scatter-gather) | Inefficient (no natural order) |
| Even distribution | No (new keys cluster) | Yes | Yes |
| Schema migration | None | None | Required |
| Backfill | None | None | Required |
Data model matters more than algorithm¶
The central lesson from the Figma post: the shard-key decision is downstream of the data model, not upstream. A data model with a natural partition axis (per-user, per-org, per-tenant) makes per-domain shard-key selection + colocation cheap. A data model without one forces either an expensive synthetic-key migration or accepting scatter-gather on a large fraction of queries.
Seen in¶
- sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale: Figma picks `UserID`, `FileID`, `OrgID` as the shard-key set; hash-of-shard-key routing avoids Snowflake-ID hotspots at the cost of shard-key range-scan efficiency.