Skip to content

CONCEPT Cited by 1 source

Geospatial sharding

Definition

Geospatial sharding is the practice of using geographic location as the shard key for partitioning state across a distributed system. Instead of hashing by user ID or random partition key, each piece of state lives on the node(s) responsible for the geographic cell containing it.

Implementations depend on a spatial indexing scheme that divides the Earth's surface into discrete, labelable cells:

  • Google S2 — hierarchical square-ish cells via a Hilbert space-filling curve. 64-bit S2 Cell ID per cell.
  • Uber H3 — hexagonal hierarchical cells. Hexagons have uniform neighbor distances (no corner vs. edge asymmetry like squares), better for radius-based queries.
  • Geohash — Z-order-curve-based string encoding.
  • R-tree / quadtree — dynamic tree structures for non-uniformly-distributed data.

Why it's the right shape for ridesharing dispatch

A dispatch service (Uber, Instacart, DoorDash) has to answer "given this rider's location, who are the nearby-available drivers?" — a spatial range query. If driver supply state is sharded by geography:

  • The query routes to a small number of nodes (the cells covering the rider's neighborhood).
  • Those nodes can compute TSP-variant matching over purely-local state without cross-shard coordination.
  • Failure/partition is contained to a cell's worth of state.

This is why Uber's 2014 dispatch rewrite chose S2 Cell IDs as the shard key for its geospatial index — see sources/2024-03-14-highscalability-brief-history-of-scaling-uber:

"We used Google's S2 library to segment cities into areas of consideration and used the S2 cell ID as the sharding key. (We've since updated to and open-sourced H3)."

The state-distribution problem

Geospatial sharding requires that each dispatch node know which cell(s) it owns and how cells are currently allocated across the cluster — a classic cluster membership + routing problem.

Uber's answer: Ringpop — a gossip-protocol library that lets each stateful Node.js dispatch node maintain a consistent view of which peer owns which S2 cell without a central coordinator. This is the canonical geospatial-gossip-sharding pattern.

Last updated · 517 distilled / 1,221 read