
Range sharding

Range sharding routes each row to a shard by checking which pre-defined range its shard-key value falls into. The router holds a small table of (shard_id, lower_bound, upper_bound) tuples and dispatches queries accordingly. It is one of the four production sharding strategies enumerated by Ben Dicken (Source: sources/2026-04-21-planetscale-database-sharding), alongside hash, lookup, and custom sharding.
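A minimal sketch of such a routing table in Python. Shard names, bounds, and the binary-search lookup are hypothetical illustration, not any particular router's implementation; real routers also handle replication and range splits:

```python
import bisect

# Hypothetical routing table: (lower_bound, upper_bound, shard_id),
# sorted by lower_bound; each range is half-open [lower, upper).
ROUTES = [
    (0,    250, "shard-1"),
    (250,  500, "shard-2"),
    (500,  750, "shard-3"),
    (750, 1000, "shard-4"),
]

_LOWERS = [lo for lo, _, _ in ROUTES]

def route(key: int) -> str:
    """Return the shard whose [lower, upper) range contains key."""
    i = bisect.bisect_right(_LOWERS, key) - 1
    if i < 0 or key >= ROUTES[i][1]:
        raise KeyError(f"no shard covers key {key}")
    return ROUTES[i][2]
```

Lookup is O(log n) in the number of ranges, and the whole table is small enough to cache on every router node.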

When it works

Range sharding works when the value distribution across the shard-key column is known and stable — the ranges can be drawn so each shard receives a roughly equal share of both data and traffic, and the distribution doesn't drift such that the ranges become unbalanced over time.

  • Geographic sharding (country_code) when tenant sizes are known.
  • Tenant sharding with large tenants given dedicated ranges.
  • Time-range sharding on historical-only tables where current writes land in one shard by design (analytics-read + archival-write).
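The geographic case above can be sketched the same way: lexicographic ranges over country_code, with known-large countries carved out into dedicated shards. All shard names and boundaries below are hypothetical:

```python
import bisect

# Hypothetical geographic ranges on country_code, half-open [lower, upper),
# drawn from known tenant sizes so large countries get dedicated shards.
GEO_ROUTES = [
    ("AA", "DE", "shard-1"),
    ("DE", "DF", "shard-2"),   # Germany alone: a known-large tenant
    ("DF", "US", "shard-3"),
    ("US", "UT", "shard-4"),   # US alone
    ("UT", "ZZZ", "shard-5"),
]

_GEO_LOWERS = [lo for lo, _, _ in GEO_ROUTES]

def route_country(code: str) -> str:
    """Return the shard whose country_code range contains code."""
    i = bisect.bisect_right(_GEO_LOWERS, code) - 1
    lo, hi, shard = GEO_ROUTES[i]
    if not (lo <= code < hi):
        raise KeyError(f"no shard covers {code}")
    return shard
```

This only stays balanced as long as the per-country traffic estimates that produced the boundaries stay accurate.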

When it fails — three canonical hotspots

Dicken's primer walks through three obvious range-sharding choices on a toy retailer schema, each producing uneven load (Source: sources/2026-04-21-planetscale-database-sharding):

  1. Monotonically increasing IDs: "The first 25 inserts all go to the first shard, leading to one hot shard … and three other cool shards. If we continue inserting, the same problem arises for all the other shards." The active write frontier pins to a single shard while the rest sit cold. A canonical instance of concepts/hot-key; Figma names the same phenomenon on Snowflake-style timestamp-prefixed IDs (sources/2026-04-21-figma-how-figmas-databases-team-lived-to-tell-the-scale).

  2. Alphabetical name ranges: "None of our users have names in the v-z range, leading to a wasted shard. Such a sharding solution only works well if our users have names that are perfectly evenly distributed across the alphabet. This is rarely true in practice." Real-world string distributions are non-uniform (roughly Zipfian over the first letter); ranges drawn from a uniform prior come out skewed.

  3. Age ranges: "The vast majority of our users are between 25-74 years of age. Two of our shards are hot with lots of traffic while the other two are quite cold." And the distribution drifts: today's working-age shard is tomorrow's retiree shard. Range sharding on a column whose distribution shifts over time imposes an ongoing rebalancing burden.
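The first hotspot is easy to reproduce: with four fixed ranges, 25 sequential inserts all land on one shard, while hashing the same keys spreads them out. A small sketch, with illustrative shard boundaries and hash choice (not any particular system's):

```python
import hashlib
from collections import Counter

SHARD_RANGES = [(0, 25), (25, 50), (50, 75), (75, 100)]  # four shards on id

def range_shard(key: int) -> int:
    """Range routing: find the shard whose [lo, hi) contains key."""
    for i, (lo, hi) in enumerate(SHARD_RANGES):
        if lo <= key < hi:
            return i
    raise KeyError(key)

def hash_shard(key: int, n: int = 4) -> int:
    """Hash routing: hash the key, then take it modulo the shard count."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n

# Monotonically increasing IDs: the first 25 inserts all land on shard 0.
writes = Counter(range_shard(i) for i in range(25))

# The same keys, hashed first, spread across the shards.
hashed = Counter(hash_shard(i) for i in range(25))
```

Under range routing `writes` concentrates entirely on shard 0, which is exactly Dicken's hot-shard picture; the hashed counter is close to uniform.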

Trade-off vs hash sharding

| Property | Range sharding | Hash sharding |
| --- | --- | --- |
| Range scans on shard key | Efficient: sequential keys live on one shard | Inefficient: sequential keys scatter |
| Even distribution without prior knowledge | No: requires knowing the distribution | Yes: a property of the hash |
| Handles monotonic IDs | Bad: active-frontier hotspot | Good: hash smears writes across shards |
| Handles skewed value distributions | Bad: ranges inherit the skew | Good: hash flattens skew |
| Rebalancing when distribution drifts | Required, ongoing | Not required for the routing itself |

Range sharding wins on range-scan queries; hash wins on distribution robustness. The choice depends on whether shard-key range scans are dominant (favour range) or rare (favour hash).
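The range-scan row of the table can be made concrete by counting how many shards a scan over a contiguous key interval touches under each scheme. Boundaries and hash function below are illustrative assumptions:

```python
import hashlib

N_SHARDS = 4
RANGE_WIDTH = 250  # hypothetical: shard k owns [k*250, (k+1)*250)

def range_shards_touched(lo: int, hi: int) -> set:
    """Shards a scan over keys [lo, hi) touches under range sharding."""
    return {k // RANGE_WIDTH for k in range(lo, hi)}

def hash_shards_touched(lo: int, hi: int) -> set:
    """Shards the same scan touches under hash sharding."""
    def h(k: int) -> int:
        digest = hashlib.md5(str(k).encode()).digest()
        return int.from_bytes(digest[:4], "big") % N_SHARDS
    return {h(k) for k in range(lo, hi)}
```

A scan over keys 100..199 touches a single shard under range sharding but scatters across essentially every shard under hash sharding, which is why scatter-gather fan-out is the price of hash routing.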

Vitess context

Vitess supports range sharding via its Vindex framework — each shard owns a range of keyspace_id values, and a range-based Primary Vindex produces contiguous keyspace_id outputs. Most production Vitess deployments use a hash Primary Vindex (the shard-key values are hashed before the keyspace_id is computed), giving hash-sharding semantics on top of Vitess's range-addressed shard fabric. Range sharding at the logical level (on unhashed shard-key values) is an explicit configuration choice, not the default.
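As an illustration, a minimal VSchema along these lines might look like the following sketch. Keyspace, table, and column names are hypothetical; `hash` is Vitess's standard hash vindex type, and swapping it for a `numeric` vindex (which uses the raw value as the keyspace_id) is one way to get logical range sharding on an integer key:

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash" }
      ]
    }
  }
}
```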
