
Dynamic sharding

Dynamic sharding treats a service's shard assignment as state owned by a controller, continuously and asynchronously updated in response to pod health, load, termination notices, and other environmental signals. Contrast with concepts/static-sharding, where the assignment is a pure function of the key, fixed when the code is written.
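The contrast can be sketched in a few lines of Python (a minimal illustration; `static_owner`, `Assignment`, and the pod names are hypothetical, not any system's API):

```python
import bisect

# Static sharding: the owner is a pure function of the key, baked into code.
def static_owner(key: str, n_pods: int) -> int:
    return hash(key) % n_pods

# Dynamic sharding: the assignment is mutable state owned by a controller,
# published to clients and updated as conditions change.
class Assignment:
    def __init__(self):
        # Sorted slice boundaries; slice i covers [starts[i], starts[i+1]).
        self.starts = [""]
        self.owners = {"": "pod-0"}

    def owner(self, key: str) -> str:
        # Clients look up the current owner in the published table.
        start = self.starts[bisect.bisect_right(self.starts, key) - 1]
        return self.owners[start]

    def reassign(self, start: str, pod: str) -> None:
        # Controller-side mutation in response to health/load signals.
        if start not in self.owners:
            bisect.insort(self.starts, start)
        self.owners[start] = pod
```

Changing `static_owner` requires a code change and redeploy; `Assignment.reassign` is an ordinary state update the controller can make at any time.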

Primitive operations (from auto-sharders)

The controller mutates the assignment via minimal adjustments:

  • Split — a slice (range of keys) gets too large or too hot; divide it.
  • Merge — two cool adjacent slices become one.
  • Replicate — a slice (often a hot-key slice) is assigned to more than one pod to absorb load; see patterns/shard-replication-for-hot-keys.
  • Dereplicate — shrink the replica set when load eases.
  • Move — reassign a slice to a different pod (for load, health, or placement).

Preferring minimal adjustments over full reshuffles is a key design choice: it avoids the "stop the world" cost of classic rebalancing.
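A toy controller makes the five operations concrete. This is a minimal sketch over an in-memory slice table (the class and method names are illustrative); real auto-sharders add load metrics, leases, and durable state:

```python
class ShardController:
    """Toy slice table: each slice is a half-open key range mapped to a
    replica set. Slice i covers [slices[i][0], slices[i+1][0])."""

    def __init__(self, pods):
        self.slices = [["", [pods[0]]]]  # one slice covering the whole keyspace

    def _index(self, start):
        for i, (s, _) in enumerate(self.slices):
            if s == start:
                return i
        raise KeyError(start)

    def split(self, start, mid):
        # Split: the slice at `start` is too large or too hot; divide it at `mid`.
        i = self._index(start)
        self.slices.insert(i + 1, [mid, list(self.slices[i][1])])

    def merge(self, start):
        # Merge: absorb the right neighbour of the slice at `start`.
        del self.slices[self._index(start) + 1]

    def replicate(self, start, pod):
        # Replicate: add a pod to the slice's replica set to absorb load.
        self.slices[self._index(start)][1].append(pod)

    def dereplicate(self, start, pod):
        # Dereplicate: shrink the replica set when load eases.
        self.slices[self._index(start)][1].remove(pod)

    def move(self, start, old, new):
        # Move: reassign the slice from one pod to another.
        reps = self.slices[self._index(start)][1]
        reps[reps.index(old)] = new
```

Each operation touches one or two table entries, which is what makes the adjustments minimal: the rest of the assignment, and the traffic it carries, is undisturbed.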

Why it wins over static sharding

  • Availability across restarts/autoscale — controller reacts to termination notices before pods leave; no drop window.
  • concepts/hot-key handling — a hot key is isolated into its own slice, and that slice is replicated across N pods.
  • No concepts/split-brain — the central controller is authoritative; clients converge on its assignment.

Why it wins over stateless + remote cache

  • Local method calls instead of RPC to a cache → no network tax, no (de)serialization CPU, no concepts/tail-latency-at-scale hop.
  • No "overread" waste — pod works off in-memory structures tuned to the workload, rather than fetching whole blobs from a cache.

Consistency trade-off

Authoritatively publishing an assignment doesn't itself imply strong key ownership on pods. Two postures appear in production auto-sharders:

  • Lease-based (strong ownership): systems/slicer, systems/centrifuge. Pods hold leases; exclusive ownership is a correctness property.
  • Eventually-consistent assignments: systems/dicer. Prioritizes availability and fast recovery; Clerks/Slicelets may briefly disagree during transitions, and applications layer their own correctness mechanisms on top if needed.
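The lease-based posture can be sketched as pod-side bookkeeping: serve a slice only while its lease is unexpired. This is an illustrative sketch, not the actual Slicer or Centrifuge API; the names and TTL mechanics are assumptions:

```python
class LeaseTable:
    """Pod-side lease state under the strong-ownership posture: a pod may
    serve keys in a slice only while it holds an unexpired lease on it."""

    def __init__(self):
        self.expiry = {}  # slice start key -> lease expiry time (seconds)

    def grant(self, start: str, ttl_s: float, now: float) -> None:
        # Controller grants or renews a lease; the pod records the expiry.
        self.expiry[start] = now + ttl_s

    def owns(self, start: str, now: float) -> bool:
        # Refuse work on a lapsed lease: exclusive ownership is a correctness
        # property, so serving a slice past expiry would risk two owners.
        return self.expiry.get(start, float("-inf")) > now
```

Under the eventually-consistent posture this check disappears: pods serve whatever the last assignment they saw says, accepting brief disagreement in exchange for availability.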

Canonical implementations

Seen in
