
Partition strategy

Partition strategy is the umbrella term for "how a sharded database decides which rows are stored together and on which server". Justin Gage's canonical naming (Source: sources/2026-04-21-planetscale-what-is-database-sharding-and-how-does-it-work):

"How you decide to split up your data into shards – also referred to as your partition strategy – should be a direct function of how your business runs, and where your query load is concentrated."

The phrase sits above the three/four concrete strategies on the wiki (hash, range, lookup / directory-based, plus custom) as the choice-level noun — the decision the team makes before wiring up the mechanics.

Load-bearing property: "direct function of how your business runs"

The strategy isn't a generic architectural preference — it's derived from the dominant query-load shape:

  • B2B SaaS where every user belongs to an organization → partition by org_id (related tables colocate; per-org queries go to one shard).
  • Consumer with no meaningful clustering → hash on a unique ID (Gage's worked example: Amazon's order_id).
  • Single-tenant verticals with large known tenants → directory / lookup-based with operator-authored placement.

Gage's Notion worked example: "Notion manually sharded their Postgres database by simply splitting on team ID." The team-ID shard key is a direct projection of Notion's data model — every row belongs to a team; most queries are team-scoped; colocation is automatic.
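The Notion pattern above can be sketched as a routing function: every row carries a team ID, and any stable function of that one column sends a team's rows to a single shard. This is a minimal illustration, not Notion's actual mechanism — the shard count, function name, and use of `hash()` are all assumptions.

```python
# Hypothetical sketch of team-ID sharding in the style described above.
# NUM_SHARDS and the modulo scheme are illustrative assumptions.
NUM_SHARDS = 4

def shard_for_team(team_id: int) -> int:
    """Map a team to a shard; all of a team's rows colocate there."""
    # Any stable function of team_id works; modulo over a hash is the simplest.
    return hash(team_id) % NUM_SHARDS

rows = [
    {"team_id": 7, "page": "roadmap"},
    {"team_id": 7, "page": "notes"},
    {"team_id": 12, "page": "wiki"},
]

# Every team-scoped query touches exactly one shard:
team7_shards = {shard_for_team(r["team_id"]) for r in rows if r["team_id"] == 7}
assert len(team7_shards) == 1
```

Because related tables share the `team_id` column, joins and per-team queries stay on one server — the colocation the article describes falls out of the single-column key.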

Three canonical algorithms (Gage's list)

Gage names three partition-strategy algorithms (vs Dicken's four — Dicken adds a "custom" escape hatch):

Algorithm                     Row→shard mapping                 Trade-off
Hash-based (key-based)        bucket(hash(col))                 Even distribution; range-scan scatter-gather
Range-based                   Pre-assigned value ranges         Range-scan efficient; hotspot risk
Directory-based (= lookup)    Operator-authored mapping table   Arbitrary placement; extra hop + consistency burden

"Directory-based" is Gage's name for what Dicken (sources/2026-04-21-planetscale-database-sharding) and Guevara (sources/2026-04-21-planetscale-sharding-strategies-directory-based-range-based-and-hash-based) call lookup sharding. Guevara's post uses Gage's directory-based terminology verbatim; the two names are synonyms on this wiki.
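The three row→shard mappings in the table can be written as three small functions. This is a hedged sketch: the shard count, range boundaries, and directory contents are invented for illustration, not drawn from any of the cited posts.

```python
import bisect
import hashlib

NUM_SHARDS = 3

def hash_shard(key: str) -> int:
    """Hash-based: bucket(hash(col)). Even spread; range scans scatter-gather."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Range-based: pre-assigned value ranges, here as sorted upper bounds.
# Shard 0 holds keys < "g", shard 1 holds "g".."o", shard 2 holds >= "p".
RANGE_BOUNDS = ["g", "p"]

def range_shard(key: str) -> int:
    return bisect.bisect_right(RANGE_BOUNDS, key)

# Directory-based (= lookup): an operator-authored mapping table, consulted
# on every query -- the "extra hop" in the trade-off column.
DIRECTORY = {"acme-corp": 2, "globex": 0}

def directory_shard(tenant: str) -> int:
    return DIRECTORY[tenant]  # KeyError until the operator places the tenant
```

Note how only the directory variant lets the operator put any key anywhere — which is also why the mapping table itself becomes state that must be kept consistent.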

Query-load distribution is the selection input

"If your sharding scheme isn't random (e.g. hash based), you can begin to see why query profiling and understanding how your load is distributed can be useful." (Source: sources/2026-04-21-planetscale-what-is-database-sharding-and-how-does-it-work)

Hash sharding's strength is that it doesn't require knowing the load distribution in advance — the hash smears both data and load across shards regardless of the shard-key value distribution. Range and directory strategies, conversely, are only as good as the operator's prior on how load is actually distributed. When your prior is wrong (or drifts), range/directory strategies concentrate load on whichever shard ended up owning the popular keys — exactly the Figma snowflake-ID hotspot and Dicken's age-range hotspot.
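The asymmetry above can be demonstrated with a toy workload. The key distribution and range boundaries here are made up to mimic the snowflake-ID-style hotspot: most traffic hits recent, sequentially-assigned IDs, and the range scheme's boundaries were chosen before that traffic existed.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

# Skewed workload: 90% of queries hit recent sequential IDs (the hotspot),
# 10% hit old IDs. All values are illustrative.
keys = [f"id-{1_000_000 + i}" for i in range(900)] + \
       [f"id-{i}" for i in range(100)]

def hash_shard(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

def range_shard(key: str) -> int:
    # Ranges fixed under a stale prior: everything above id-750000
    # lands on the last shard.
    n = int(key.split("-")[1])
    return min(n // 250_000, NUM_SHARDS - 1)

hash_load = Counter(hash_shard(k) for k in keys)
range_load = Counter(range_shard(k) for k in keys)
print("hash:", dict(hash_load))    # load spread roughly evenly
print("range:", dict(range_load))  # one shard absorbs the hot 90%
```

The hash smears the skew across all four shards regardless of the key distribution; the range scheme concentrates the 900 hot queries on whichever shard ended up owning the popular range.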

"As simple or as complicated as you make it"

Gage's closing framing on strategy choice:

"All of this is to say that sharding can be as simple or as complicated as you make it."

Notion's team-ID split is one end of the spectrum (one column, no lookup table, no hash); Uber Schemaless / TAO-style directory-per-object-type sit at the other (operator-authored placement at object-class granularity). The choice of strategy sets the ceiling on how much operator work the cluster demands over time.
