

Lookup sharding

Lookup sharding routes each row to a shard via a developer-maintained mapping table that the proxy consults on each query, rather than deriving the shard from hash(shard_key) (hash sharding) or from a fixed range table (range sharding). It is one of the four production sharding strategies enumerated by Ben Dicken (Source: sources/2026-04-21-planetscale-database-sharding):

"In lookup sharding, the developer can set up a table that contains information needed to map incoming data to the appropriate shard. This table is referenced when queries for such tables come in."
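That routing step reduces to a keyed read against the mapping table before any query is dispatched. A minimal sketch, with an in-memory dict standing in for the real, developer-maintained table (all tenant and shard names here are invented):

```python
# Minimal sketch of lookup sharding. LOOKUP_TABLE stands in for a real,
# developer-maintained database table; tenant and shard ids are illustrative.

LOOKUP_TABLE = {
    "tenant_a": "shard-1",
    "tenant_b": "shard-1",
    "big_tenant": "shard-7",  # manually pinned to its own shard
}

def route(shard_key: str) -> str:
    """Consult the mapping table; refuse to guess when no entry exists."""
    shard = LOOKUP_TABLE.get(shard_key)
    if shard is None:
        raise KeyError(f"no shard mapping for {shard_key!r}")
    return shard
```

Note the failure mode the dict makes explicit: a row with no mapping entry is unroutable, which is why keeping this table consistent is the operator's burden (see below).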

When it's the right choice

Lookup sharding is the right choice when hash and range sharding both fail to express the routing you need:

  • Pre-existing data placement — a large row / tenant is already on a specific shard for isolation reasons (regulatory, size, noisy-neighbor isolation) and must stay there regardless of its hashed key.
  • Manual hot-tenant isolation — one customer's data gets its own shard; everyone else hash-routes normally. Lookup makes the exception explicit.
  • Secondary-lookup queries — the table's rows are sharded by user_id, but a common query filters by email. A lookup table maps email → keyspace_id so the proxy can route those queries without scatter-gather. The canonical Vitess instance is the Consistent Lookup Vindex, where the lookup table lives in the sharded database itself and materialises email → keyspace_id so queries on email route precisely.
  • Geographic placement — a country_code → region → shard mapping enforces data-residency regulations that hash(country_code) can't express.
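The geographic case is just two small mappings chained together. A sketch with hypothetical country codes, regions, and shard names:

```python
# Hypothetical country_code -> region -> shard mapping for data residency.
# hash(country_code) could not guarantee that DE and FR both land in an
# EU-resident shard; an explicit lookup table can.

REGION_OF_COUNTRY = {"DE": "eu", "FR": "eu", "US": "us"}
SHARD_OF_REGION = {"eu": "shard-eu-1", "us": "shard-us-1"}

def route_by_country(country_code: str) -> str:
    return SHARD_OF_REGION[REGION_OF_COUNTRY[country_code]]
```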

Cost vs hash / range

  • Extra hop per query. The proxy reads the lookup table before dispatching; if the lookup table is another sharded database, this is one more network hop plus one more shard touched per query. Vitess amortises this with caching + batching in VTGate; the cost is real but small in well-tuned deployments.
  • Lookup-table consistency burden. The mapping table must stay consistent with the underlying rows. Consistent lookup Vindexes solve this by making lookup-table updates participate in the same transaction as the main-table write — so the lookup entry is never stale.
  • Orphan lookup rows — a canonical failure mode where a row is deleted from the main table but its lookup-table entry survives. Vitess's consistent-lookup mechanism addresses this with ordered commit.
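The ordering discipline behind that fix can be sketched with dicts standing in for the two tables. This illustrates the idea, not Vitess's actual implementation:

```python
# Sketch of write ordering between a main table and its lookup table, with
# dicts in place of real tables. Writing the lookup entry first on insert
# (and deleting it last on delete) means an interruption between the two
# steps leaves, at worst, a lookup entry pointing at a missing row -- never
# a live row that secondary queries cannot find.

lookup = {}  # email -> keyspace_id
main = {}    # keyspace_id -> row

def insert(email, ksid, row):
    lookup[email] = ksid  # lookup entry first
    main[ksid] = row      # then the main row

def delete(email):
    ksid = lookup[email]
    del main[ksid]        # main row first
    del lookup[email]     # lookup entry last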

Pairs with hash / range, doesn't replace them

Lookup sharding is almost always layered on top of a hash- or range-based primary sharding strategy:

  • Primary sharding (usually hash) determines where rows live by their primary key.
  • Lookup table maps secondary attributes (email, username) to the primary shard address.
  • Queries on the primary key use the primary router; queries on the lookup attribute consult the lookup table first, then the primary router.
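Putting the two routers together, as a sketch (the keyspace_id function, shard count, and hand-populated lookup dict are invented for illustration, not Vitess's actual types):

```python
import hashlib

SHARD_COUNT = 4

def keyspace_id(user_id: str) -> int:
    """Primary (hash) router: primary key -> position in the keyspace."""
    return int.from_bytes(hashlib.sha256(user_id.encode()).digest()[:8], "big")

# Secondary lookup: email -> the keyspace_id of the owning row. In a real
# system this is a table maintained alongside the main table.
EMAIL_LOOKUP = {"a@example.com": keyspace_id("user-42")}

def shard_of(ksid: int) -> int:
    return ksid % SHARD_COUNT  # toy mapping from keyspace_id to shard

def route(user_id=None, email=None) -> int:
    """Primary-key queries hash directly; email queries consult the lookup
    table first, then resolve through the same keyspace_id space."""
    ksid = keyspace_id(user_id) if user_id is not None else EMAIL_LOOKUP[email]
    return shard_of(ksid)
```

The key property is that both paths converge on the same keyspace_id space, so a query by email lands on exactly the shard holding the row inserted by user_id.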

This is exactly the Vitess Vindex architecture: a table has one Primary Vindex (hash by default) that determines physical placement, plus zero or more Secondary Vindexes (lookup / consistent-lookup / custom) that translate alternative query predicates into the same keyspace_id space.

Strictly more flexible, strictly more operational overhead

  • Pro: arbitrary routing logic (any key → any shard), mutable at runtime without rehashing, per-row overrides possible.
  • Con: operator responsibility to keep the lookup consistent; extra hop on reads; complexity of transactional lookup-table writes; ecosystem tooling assumes hash/range by default.

Hash sharding is the right first choice when the primary access pattern is WHERE primary_key = .... Lookup sharding is added when a secondary access pattern becomes hot enough that scattering it is unacceptable, and when that secondary attribute can't just be made the primary shard key (e.g. because multiple secondary patterns are simultaneously hot).
