PlanetScale — Sharding strategies: directory-based, range-based, and hash-based¶
Summary¶
Holly Guevara's 2024-07-08 PlanetScale post is a short-form sibling to Ben Dicken's later, longer Database sharding primer (2025-01-09) — same three-strategy taxonomy (directory / lookup → lookup sharding; range → range sharding; hash → hash sharding) at a shallower altitude, published six months earlier. The post's load-bearing framing:
"At PlanetScale, we typically steer customers toward hash-based sharding as the default. It generally gives you the most even distribution and isn't overly complicated." (Source: the raw post)
That single sentence is the canonical wiki-ingestable PlanetScale-voice recommendation of hash sharding as the production default — a framing Dicken's later post implies but never states as explicitly. It complements Dicken's more mechanism-heavy treatment with a customer-consultation-altitude default-choice disclosure.
The post's other durable contribution is a members + court worked example (the row-per-person + tenant-court dataset) run through all three strategies with a final comparison diagram. It canonicalises:
- Lookup / directory sharding via the court → shard mapping table — "if one court has significantly more members than another, you risk having one shard that's much larger than the others. Additionally, if one court has one or several members that are much more active in whatever you're doing, you can create a hotspot where that shard is being accessed much more frequently than the others." Canonical wiki statement of concepts/lookup-sharding's two canonical failure modes: data-size skew + access-pattern hotspot, both surfaced by the same operator-chosen mapping key.
- Lookup-table caching as the standard mitigation for the extra-hop cost — "In practice, you'd want to heavily cache the lookup table to make it faster, but you'll still have that extra step regardless." Canonicalises the extra-hop cost as latency-bounded-but-unavoidable once the strategy is chosen.
- Range sharding on a name A-G / H-N / O-U / V-Z partitioning — "Depending on how you're picking these ranges, you can easily overload shards. In this case, shard 1 has 5 records in it while shard 4 has 0." Same skew lesson as Dicken's alphabetical-names instance, shorter.
- Range-sharding resharding as the mitigation — "if we started with this method for mapping the ranges, but then quickly noticed that shard 1 was growing much faster than the others, we may decide to break shard 1 up into multiple shards with a reshard operation. We can also make the range for the last shard larger since those letters happen less frequently." Canonical wiki-ingestable framing of range sharding as an operator-tuned strategy — not abandoning the range approach on first skew, but widening ranges + splitting hot ranges as the corrective loop.
- Hash sharding via md5 on id as the distribution-uniform default — "you're taking human guesswork out of the equation. Rather than relying on a human to guess at how much data will fall into each range, the hash will keep the data evenly distributed, so long as the shard key has high cardinality." Same high-cardinality precondition as Dicken.
- Hash-sharding resharding as easier than range — "With this method, it's also a bit easier to reshard if you find that certain shards are taking more of a hit than others." A framing Dicken's later post extends into the consistent-hashing connection without naming it here.
- Comparison-diagram closing — the post's final image is a members-table distribution comparison across all three strategies on the same dataset, showing hash's near-uniform distribution vs range's 5/0/7/8-like skew vs lookup's tenant-chosen split. Mentioned but not reproducible in markdown.
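The three strategies can be sketched end to end on a toy dataset. A minimal illustration follows; the rows, court names, and letter ranges below are invented stand-ins for the post's members + court worked example (its actual data is not reproduced here), and md5 is used only because the post uses it.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

# Hypothetical (id, name, court) rows standing in for the post's dataset.
members = [
    (1, "Alice", "north"), (2, "Bob", "north"), (3, "Carol", "north"),
    (4, "Dave", "south"), (5, "Erin", "east"), (6, "Frank", "east"),
    (7, "Grace", "west"), (8, "Heidi", "north"), (9, "Ivan", "south"),
]

# 1. Lookup / directory sharding: an operator-chosen court → shard table.
LOOKUP = {"north": 0, "south": 1, "east": 2, "west": 3}

def lookup_shard(row):
    # Extra hop: consult the (heavily cached, in practice) mapping table.
    return LOOKUP[row[2]]

# 2. Range sharding: operator-chosen first-letter ranges on name.
def range_shard(row):
    first = row[1][0].upper()
    for shard, last in enumerate(("G", "N", "U", "Z")):  # A-G/H-N/O-U/V-Z
        if first <= last:
            return shard

# 3. Hash sharding: md5(id) mod shard count. High-cardinality key (id)
# gives near-uniform spread without human guesswork.
def hash_shard(row):
    digest = hashlib.md5(str(row[0]).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Per-shard row counts for each strategy — the post's comparison diagram
# in miniature (expect the range split to be visibly skewed).
for name, fn in (("lookup", lookup_shard), ("range", range_shard),
                 ("hash", hash_shard)):
    counts = Counter(fn(row) for row in members)
    print(name, [counts.get(s, 0) for s in range(NUM_SHARDS)])
```

Running this makes the skew lesson concrete: the alphabetical ranges pile most rows onto shard 0 while later shards sit near-empty, while the hashed id spreads rows without an operator choosing boundaries.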
No new concept or pattern pages are canonicalised by this ingest. All three strategy pages (concepts/lookup-sharding, concepts/range-sharding, concepts/hash-sharding) already exist at deeper altitude from Dicken's post; this post extends their ## Seen in sections with a second citation and canonicalises the "hash-based as PlanetScale default" customer-consultation framing.
Key takeaways¶
- PlanetScale steers customers to hash sharding as the default. "At PlanetScale, we typically steer customers toward hash-based sharding as the default. It generally gives you the most even distribution and isn't overly complicated." Canonical wiki-ingestable vendor-voice default-choice disclosure. Dicken's later primer implies the same via the pedagogical flow but never states it as a consultation recommendation.
- Lookup / directory sharding's two canonical failure modes surface on the same operator-chosen key. "if one court has significantly more members than another, you risk having one shard that's much larger than the others. Additionally, if one court has one or several members that are much more active in whatever you're doing, you can create a hotspot where that shard is being accessed much more frequently than the others." Data-size skew (storage) + access-pattern hotspot (traffic) — both produced by the same operator-chosen court → shard mapping. The operator-authored mapping doesn't shield you from the tenant-distribution skew it was supposed to control.
- Lookup-table caching is the standard extra-hop mitigation but not a removal. "In practice, you'd want to heavily cache the lookup table to make it faster, but you'll still have that extra step regardless." Latency-bounded by caching, never zero. Canonicalises cache-hot-path as the assumed ecosystem posture around lookup sharding.
- Range-sharding rebalancing is a live operator loop, not a one-time design choice. "if we started with this method for mapping the ranges, but then quickly noticed that shard 1 was growing much faster than the others, we may decide to break shard 1 up into multiple shards with a reshard operation. We can also make the range for the last shard larger since those letters happen less frequently." Complementary framing to Dicken's "range sharding only works when the value distribution is known and stable" statement — this post names the ongoing corrective loop (widen-cold-range + split-hot-range) that keeps range sharding viable against distribution drift.
- Hash sharding removes human guesswork at the cost of the hash step. "you're taking human guesswork out of the equation. Rather than relying on a human to guess at how much data will fall into each range, the hash will keep the data evenly distributed, so long as the shard key has high cardinality." Canonical wiki-ingestable framing of hash sharding as "distribution-by-function rather than distribution-by-operator-judgement" — with the same high-cardinality precondition Dicken names.
- Hash-sharding resharding is easier than range-sharding resharding. "With this method, it's also a bit easier to reshard if you find that certain shards are taking more of a hit than others." The post leaves the why implicit — it points at the consistent-hashing property (only 1/N keys re-map on shard add/remove) that Dicken names explicitly but Guevara doesn't.
- The members + court worked example is the pedagogy substrate. Small dataset (~20 rows, 4 courts, 4 shards) run through all three strategies on the same schema. Final comparison diagram shows hash's near-uniform distribution vs range's 5/0/7/8-like skew vs lookup's tenant-assigned split. Smaller-dataset sibling to Dicken's retailer-toy schema.
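The "why" that the post leaves implicit can be made concrete. The sketch below (illustrative numbers, not from the post; the ring construction is the consistent-hashing property Dicken's later post names, not anything Guevara describes) compares how many keys re-map when growing from 4 to 5 shards under naive hash-mod-N routing versus a hash ring:

```python
import bisect
import hashlib

def h(value):
    """Stable integer hash (md5 to match the post's example)."""
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

def build_ring(num_shards, vnodes=64):
    """Place vnodes points per shard on a hash ring; a key is owned by
    the shard whose point is the first one at or after the key's hash."""
    placed = sorted((h(f"shard{s}-vn{v}"), s)
                    for s in range(num_shards) for v in range(vnodes))
    points = [p for p, _ in placed]
    owners = [s for _, s in placed]
    return points, owners

def ring_shard(ring, key):
    points, owners = ring
    i = bisect.bisect(points, h(key)) % len(points)  # wrap around the ring
    return owners[i]

keys = list(range(10_000))

# Naive mod-N: adding a 5th shard changes almost every key's shard.
moved_mod = sum(h(k) % 4 != h(k) % 5 for k in keys) / len(keys)

# Ring: adding a 5th shard only claims the arcs its vnodes land on,
# so roughly 1/5 of keys move and the rest stay put.
ring4, ring5 = build_ring(4), build_ring(5)
moved_ring = sum(ring_shard(ring4, k) != ring_shard(ring5, k)
                 for k in keys) / len(keys)

print(f"mod-N resharding moved {moved_mod:.0%} of keys")   # roughly 80%
print(f"ring resharding moved  {moved_ring:.0%} of keys")  # roughly 20%
```

This is the gap between "a bit easier to reshard" as stated and the 1/N re-mapping property the statement rests on.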
Operational numbers¶
None. Pedagogical short-form post — ~1,500 words, no production figures, no latency measurements, no customer telemetry, no benchmark. The load-bearing operational disclosure is the vendor-consultation default ("we typically steer customers toward hash-based sharding"), which is directional, not quantitative.
Caveats¶
- Short-form pedagogy. ~1,500 words, zero production numbers, zero Vitess-mechanism detail, zero consistent-hashing framing despite hash sharding being the canonical default. Dicken's later post is the load-bearing canonical source on the same topic; this post is the consultation-default sibling.
- No custom-sharding coverage. Guevara names three strategies; Dicken's later post names four (range / hash / lookup / custom). The "custom sharding function" escape hatch that Vitess supports is absent here — readers of this post would miss that category.
- md5 as the hash example is misleading at the production-routing altitude. md5 is a cryptographic hash (~100 MB/s/core), not the production default (non-cryptographic mixers like xxhash / CityHash / MurmurHash3 / Vitess's hash Vindex at ~1 GB/s/core — see concepts/hash-sharding#Cryptographic-vs-non-cryptographic-hash). Pedagogically fine; operationally a red herring if taken literally.
- No Vitess internals. Post mentions "solutions like Vitess and PlanetScale" at the opening but doesn't walk the keyspace_id address space, the Vindex framework, or the Primary/Secondary-Vindex layering that production implementations rely on. Dicken's later post supplies all of this; Guevara's does not.
- Vendor-consultation framing hand-waves trade-offs. "At PlanetScale, we typically steer customers toward hash-based sharding as the default" is canonical but lacks the counter-case (when lookup / range wins). The "secondary-lookup queries" and "pre-existing data placement" scenarios where lookup sharding is the right choice — see concepts/lookup-sharding#When-it's-the-right-choice — are absent.
- Worked-example size is unrealistic. 4 shards × ~5 rows each is pedagogy scale. "Of course, this is a tiny example and not a realistic sharding scenario given the size" — the post acknowledges the scale gap explicitly but doesn't supply a realistic one.
- First post of an intended series — "There's a lot that goes into this, and which we'll cover in a short series of posts." The sequel(s) are not referenced by later PlanetScale posts on the wiki by slug, so the series may be incomplete. The 2024-11-07 Introducing sharding on PlanetScale with Workflows post four months later is the operational-UX sibling.
- Holly Guevara's wiki context. Guevara's 2022-10 co-authored Behind the Scenes: How Schema Reverts Work post is the canonical Schema-Reverts mechanism ingest; she's not a database-internals default-include voice on wiki (per companies/planetscale skip rules), but this short post's vendor-consultation framing + extending the existing canonical sharding-strategy pages justifies full ingest as a minimal-viable-page source.
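The md5 caveat in code: any deterministic hash routes correctly, so the choice only matters for throughput. A minimal sketch, with zlib.crc32 as a stdlib stand-in for the non-cryptographic mixers named above (xxhash / MurmurHash3 are not in Python's standard library); both routing functions are hypothetical illustrations, not PlanetScale's routing code:

```python
import hashlib
import timeit
import zlib

NUM_SHARDS = 4

def md5_shard(key: str) -> int:
    # The post's choice: cryptographic md5, then mod shard count.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

def crc_shard(key: str) -> int:
    # Stand-in for a non-cryptographic mixer; same routing contract,
    # just a cheaper function.
    return zlib.crc32(key.encode()) % NUM_SHARDS

key = "member-12345"
for name, fn in (("md5", md5_shard), ("crc32", crc_shard)):
    elapsed = timeit.timeit(lambda: fn(key), number=100_000)
    print(f"{name}: shard {fn(key)}, {elapsed:.3f}s per 100k routings")
```

Both functions return a stable shard id for a given key; the timing loop shows why production routers reach for the cheaper family once per-query hashing sits on the hot path.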
Source¶
- Original: https://planetscale.com/blog/types-of-sharding
- Raw markdown:
raw/planetscale/2026-04-21-sharding-strategies-directory-based-range-based-and-hash-bas-1e092c95.md