CONCEPT Cited by 1 source
Geospatial sharding¶
Definition¶
Geospatial sharding is the practice of using geographic location as the shard key for partitioning state across a distributed system. Instead of hashing by user ID or random partition key, each piece of state lives on the node(s) responsible for the geographic cell containing it.
Implementations depend on a spatial indexing scheme that divides the Earth's surface into discrete, labelable cells:
- Google S2 — hierarchical square-ish cells via a Hilbert space-filling curve. 64-bit S2 Cell ID per cell.
- Uber H3 — hexagonal hierarchical cells. Hexagons have uniform neighbor distances (no corner vs. edge asymmetry like squares), better for radius-based queries.
- Geohash — Z-order-curve-based string encoding.
- R-tree / quadtree — dynamic tree structures for non-uniformly-distributed data.
Why it's the right shape for ridesharing dispatch¶
A dispatch service (Uber, Instacart, DoorDash) has to answer "given this rider's location, who are the nearby-available drivers?" — a spatial range query. If driver supply state is sharded by geography:
- The query routes to a small number of nodes (the cells covering the rider's neighborhood).
- Those nodes can compute TSP-variant matching over purely-local state without cross-shard coordination.
- Failure/partition is contained to a cell's worth of state.
This is why Uber's 2014 dispatch rewrite chose S2 Cell IDs as the shard key for its geospatial index — see sources/2024-03-14-highscalability-brief-history-of-scaling-uber:
"We used Google's S2 library to segment cities into areas of consideration and used the S2 cell ID as the sharding key. (We've since updated to and open-sourced H3)."
The state-distribution problem¶
Geospatial sharding requires that each dispatch node know which cell(s) it owns and how cells are currently allocated across the cluster — a classic cluster membership + routing problem.
Uber's answer: Ringpop — a gossip-protocol library that lets each stateful Node.js dispatch node maintain a consistent view of which peer owns which S2 cell without a central coordinator. This is the canonical geospatial-gossip-sharding pattern.
Related¶
- systems/s2-geometry — Google's spatial library.
- systems/h3-geo — Uber's hexagonal successor.
- systems/ringpop — the gossip library that makes stateful geospatial-sharded Node.js viable.
- systems/uber-dispatch — the canonical deployment on this wiki.
- concepts/horizontal-sharding — parent concept of sharding in general.
- concepts/hash-ring — the non-spatial peer (consistent hashing); geospatial sharding is the specialization of sharding to spatial data.
- concepts/traveling-salesman-problem — the matching-optimization problem shape that motivates spatial locality.
- patterns/geospatial-gossip-sharding — the full stack pattern.
- companies/uber — the textbook production deployment.