CONCEPT Cited by 1 source
Lookup sharding¶
Lookup sharding routes each row to a shard via a developer-maintained mapping table that the proxy consults on each query — rather than deriving the shard from hash(shard_key) (hash sharding) or from a fixed range table (range sharding). One of the four production sharding strategies enumerated by Ben Dicken (Source: sources/2026-04-21-planetscale-database-sharding):
"In lookup sharding, the developer can set up a table that contains information needed to map incoming data to the appropriate shard. This table is referenced when queries for such tables come in."
When it's the right choice¶
Lookup sharding is the strategy for when hash and range both fail to express the routing you need:
- Pre-existing data placement — a large row / tenant is already on a specific shard for isolation reasons (regulatory, size, noisy-neighbor isolation) and must stay there regardless of its hashed key.
- Manual hot-tenant isolation — one customer's data gets its own shard; everyone else hash-routes normally. Lookup makes the exception explicit.
- Secondary-lookup queries — the table's rows are sharded by
user_idbut a common query filters byemail. A lookup table mapsemail → keyspace_idso the proxy can route those queries without scatter-gather. The canonical Vitess instance is the Consistent Lookup Vindex — a sharded database's own tables materialiseemail → keyspace_idso queries onemailroute precisely. - Geographic placement — a
country_code → region → shardmapping enforces data-residency regulations thathash(country_code)can't express.
Cost vs hash / range¶
- Extra hop per query. The proxy reads the lookup table before dispatching; if the lookup table is another sharded database, this is one more network hop plus one more shard touched per query. Vitess amortises this with caching + batching in VTGate; the cost is real but small in well-tuned deployments.
- Lookup-table consistency burden. The mapping table must stay consistent with the underlying rows. Consistent lookup Vindexes solve this by making lookup-table updates participate in the same transaction as the main-table write — so the lookup entry is never stale.
- Orphan lookup rows — a canonical failure mode where a row is deleted from the main table but its lookup-table entry survives. Vitess's consistent-lookup mechanism addresses this with ordered commit.
Pairs with hash / range, doesn't replace them¶
Lookup sharding is almost always layered on top of a hash or range primary sharding:
- Primary sharding (usually hash) determines where rows live by their primary key.
- Lookup table maps secondary attributes (
email,username) to the primary shard address. - Queries on the primary key use the primary router; queries on the lookup attribute consult the lookup table first, then the primary router.
This is exactly the Vitess Vindex architecture: a table has one Primary Vindex (hash by default) that determines physical placement, plus zero or more Secondary Vindexes (lookup / consistent-lookup / custom) that translate alternative query predicates into the same keyspace_id space.
Strictly more flexible, strictly more operational overhead¶
- Pro: arbitrary routing logic (any key → any shard), mutable at runtime without rehashing, per-row overrides possible.
- Con: operator responsibility to keep the lookup consistent; extra hop on reads; complexity of transactional lookup-table writes; ecosystem tooling assumes hash/range by default.
Hash sharding is the right first choice when the primary access pattern is WHERE primary_key = .... Lookup sharding is added when a secondary access pattern becomes hot enough that scattering it is unacceptable, and when that secondary attribute can't just be made the primary shard key (e.g. because multiple secondary patterns are simultaneously hot).
Seen in¶
- sources/2026-04-21-planetscale-database-sharding — Dicken's canonical one-paragraph naming as one of four strategies; "this table is referenced when queries for such tables come in."
- — Harshit Gangal's canonical internals-altitude disclosure of the Consistent Lookup Vindex as the Vitess implementation of lookup sharding with transactional consistency guarantees.
-
— Holly Guevara (PlanetScale, 2024-07-08) canonicalises the two failure modes surfaced by the same operator-chosen mapping key: "if one court has significantly more members than another, you risk having one shard that's much larger than the others. Additionally, if one court has one or several members that are much more active in whatever you're doing, you can create a hotspot where that shard is being accessed much more frequently than the others." Data-size skew + access-pattern hotspot — the operator-authored mapping doesn't shield the cluster from tenant-distribution skew it was supposed to control. Also canonicalises lookup-table caching as the standard extra-hop mitigation but not a removal: "In practice, you'd want to heavily cache the lookup table to make it faster, but you'll still have that extra step regardless."
-
— Justin Gage (guest post, 2023-04-06) names the strategy "directory-based sharding" verbatim — confirming the name is the cross-author industry vocabulary (Gage + Guevara both use "directory-based"; Dicken uses "lookup"; all three refer to the same strategy). Gage's definition: "Pick a column, allocate shards manually, and maintain a lookup table so you know where each row is stored." Canonical worked example is Notion's team-ID split: "Notion manually sharded their Postgres database by simply splitting on team ID" — a minimalist directory-based instance where the mapping-table is the app's
teamstable itself (team ID → shard is operator-authored per team). Distinguishes directory-based from hash/range on the basis that directory-based is not random — "If your sharding scheme isn't random (e.g. hash based), you can begin to see why query profiling and understanding how your load is distributed can be useful" — making load-distribution prior knowledge load-bearing on the directory-based choice.