CONCEPT
Dynamic sharding¶
Dynamic sharding treats a service's shard assignment as state owned by a controller, continuously and asynchronously updated in response to pod health, load, termination notices, and other environmental signals. Contrast with concepts/static-sharding, where the assignment is a pure function of the key, fixed at code time.
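The contrast can be sketched in a few lines. This is a minimal illustration, not any real auto-sharder's API: `static_shard` is a pure function of the key, while `dynamic_shard` routes through a hypothetical controller-published assignment table (the range boundaries and pod names here are invented).

```python
import hashlib

# Static sharding: the assignment is a pure function of the key,
# fixed when the code is written.
def static_shard(key: str, num_pods: int) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_pods

# Dynamic sharding: the assignment is state. A controller publishes a
# key-range -> pod map and mutates it over time; clients route by
# looking up the current map rather than recomputing a fixed hash.
assignment = {  # hypothetical snapshot of controller state
    (0x00, 0x7F): "pod-a",
    (0x80, 0xFF): "pod-b",
}

def dynamic_shard(key: str) -> str:
    first_byte = hashlib.sha256(key.encode()).digest()[0]
    for (lo, hi), pod in assignment.items():
        if lo <= first_byte <= hi:
            return pod
    raise KeyError(key)
```

The point of the sketch: changing `static_shard` means a deploy, while the controller can rewrite `assignment` at any moment without touching client code.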
Primitive operations (from auto-sharders)¶
The controller mutates the assignment via minimal adjustments:
- Split — a slice (range of keys) gets too large or too hot; divide it.
- Merge — two cool adjacent slices become one.
- Replicate — a slice (often a hot-key slice) is assigned to more than one pod to absorb load; see patterns/shard-replication-for-hot-keys.
- Dereplicate — shrink the replica set when load eases.
- Move — reassign a slice to a different pod (for load, health, or placement).
Preferring minimal adjustments over full reshuffles is a key design choice: it avoids the "stop the world" cost of classic rebalancing.
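The primitives above can be modeled as small mutations on a slice table. A minimal sketch, assuming a hypothetical `Slice` record holding an inclusive key range and a replica set (no real auto-sharder's data model is implied):

```python
from dataclasses import dataclass

@dataclass
class Slice:
    lo: int          # inclusive start of the key range
    hi: int          # inclusive end of the key range
    pods: set[str]   # replica set; more than one pod when replicated

def split(s: Slice, mid: int) -> tuple[Slice, Slice]:
    """Divide a slice that has grown too large or too hot."""
    return Slice(s.lo, mid, set(s.pods)), Slice(mid + 1, s.hi, set(s.pods))

def merge(a: Slice, b: Slice) -> Slice:
    """Combine two cool adjacent slices into one."""
    assert a.hi + 1 == b.lo, "slices must be adjacent"
    return Slice(a.lo, b.hi, a.pods | b.pods)

def replicate(s: Slice, pod: str) -> None:
    """Add a replica to absorb load on a hot slice."""
    s.pods.add(pod)

def dereplicate(s: Slice, pod: str) -> None:
    """Shrink the replica set when load eases; keep at least one pod."""
    if len(s.pods) > 1:
        s.pods.discard(pod)

def move(s: Slice, src: str, dst: str) -> None:
    """Reassign a slice to a different pod."""
    s.pods.discard(src)
    s.pods.add(dst)
```

Each operation touches one or two slices, which is exactly why the controller can apply them continuously instead of pausing the service for a global reshuffle.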
Why it wins over static sharding¶
- Availability across restarts/autoscale — controller reacts to termination notices before pods leave; no drop window.
- concepts/hot-key handling — a hot key is isolated into its own slice, and that slice is replicated across N pods.
- No concepts/split-brain — the central controller is authoritative; clients converge on its assignment.
Why it wins over stateless + remote cache¶
- Local method calls instead of RPC to a cache → no network tax, no (de)serialization CPU, no concepts/tail-latency-at-scale hop.
- No "overread" waste — pod works off in-memory structures tuned to the workload, rather than fetching whole blobs from a cache.
Consistency trade-off¶
Authoritatively publishing an assignment doesn't itself imply strong key ownership on pods. Two postures appear in production auto-sharders:
- Lease-based (strong ownership): systems/slicer, systems/centrifuge. Pods hold leases; exclusive ownership is a correctness property.
- Eventually-consistent assignments: systems/dicer. Prioritizes availability and fast recovery; Clerks/Slicelets may briefly disagree during transitions; applications layer their own correctness if needed.
Canonical implementations¶
- systems/slicer (Google, OSDI 2016) — range-based, lease-based.
- systems/centrifuge (Microsoft, NSDI 2010) — integrated lease-management and partitioning.
- systems/shard-manager (Meta, SOSP 2021) — generic, geo-distributed.
- systems/dicer (Databricks, 2026-01 open-sourced) — eventually-consistent assignments.
Seen in¶
- sources/2026-01-13-databricks-open-sourcing-dicer-auto-sharder — Dicer as a dynamic sharding primitive; three Databricks case studies (Unity Catalog, SQL Query Orchestration, Softstore).