
PATTERN Cited by 1 source

Shard replication for hot keys

Shard replication for hot keys is the auto-sharder pattern for concepts/hot-key relief: when a single key (or a small set of keys) attracts disproportionate load, isolate it into its own shard/slice and replicate that slice across multiple pods. Incoming requests for that key are then spread across the replicas.

Mechanics

Given a range-based auto-sharder (e.g. systems/dicer):

  1. Per-key load telemetry is reported asynchronously by server-side libraries (Slicelet in Dicer's case).
  2. The controller (Assigner) detects a key whose load exceeds a threshold.
  3. The slice containing that key is split so the hot key occupies its own one-key slice.
  4. That slice is replicated — assigned to N pods instead of 1.
  5. Clients' Clerks receive the new assignment; subsequent lookups for that key can return any of the replica pods (typically round-robin or load-aware).
  6. When load subsides, the slice is dereplicated and potentially merged back.
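
Steps 3 to 5 above can be sketched in a few lines. This is a minimal illustration, not Dicer's implementation: the `Slice` type, the integer key space, `isolate_and_replicate`, and the toy `Clerk` with round-robin selection are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Slice:
    lo: int          # inclusive start of the key range (hypothetical integer keys)
    hi: int          # exclusive end of the key range
    pods: list       # pods currently assigned to this range


def isolate_and_replicate(slices, hot_key, replica_pods):
    """Split the slice holding hot_key so the key sits alone, then replicate it."""
    out = []
    for s in slices:
        if not (s.lo <= hot_key < s.hi):
            out.append(s)  # untouched slice
            continue
        if s.lo < hot_key:  # keys below the hot key stay on the original pods
            out.append(Slice(s.lo, hot_key, list(s.pods)))
        # one-key slice, assigned to N pods instead of 1
        out.append(Slice(hot_key, hot_key + 1, list(replica_pods)))
        if hot_key + 1 < s.hi:  # keys above the hot key stay on the original pods
            out.append(Slice(hot_key + 1, s.hi, list(s.pods)))
    return out


class Clerk:
    """Toy client-side lookup: round-robin over the replicas of a slice."""

    def __init__(self, slices):
        self.slices = slices
        self._tick = count()

    def lookup(self, key):
        for s in self.slices:
            if s.lo <= key < s.hi:
                return s.pods[next(self._tick) % len(s.pods)]
        raise KeyError(key)


assignment = [Slice(0, 100, ["P0"])]
assignment = isolate_and_replicate(assignment, hot_key=10, replica_pods=["P1", "P2"])
clerk = Clerk(assignment)
print([clerk.lookup(10) for _ in range(4)])  # ['P1', 'P2', 'P1', 'P2']
```

Dereplication (step 6) would be the inverse: shrink the one-key slice's pod list back to a single pod and, if desired, merge it with a neighboring slice.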

Dicer's example: user ID 42 → SliceKey K10 is isolated and assigned to P1 and P2 (Source: sources/2026-01-13-databricks-open-sourcing-dicer-auto-sharder).

Preconditions

  • The service must tolerate multiple pods serving the same key: each replica serves read-only traffic, writes are idempotent, or a coordination layer handles writes. Under Dicer's concepts/eventual-consistency model, brief double-write windows are also possible and must be handled.
  • The system must support slice-level replication (splits alone aren't enough).
  • Per-key load signal must be cheap — Dicer aggregates it locally in the Slicelet and reports summaries asynchronously, off the request path.
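
As a rough illustration of the last precondition, the sketch below keeps the request-path cost to a single counter increment and produces a compact summary only when flushed. The `LoadAggregator` class, its `top_k` parameter, and the `hot_keys` threshold check are assumptions made for this example and do not reflect the Slicelet's actual interface. The sketch is also not thread-safe.

```python
from collections import Counter


class LoadAggregator:
    """Hypothetical in-process aggregator: cheap to update, summarized on flush."""

    def __init__(self, top_k=3):
        self.top_k = top_k
        self._counts = Counter()

    def record(self, key):
        # Called on the request path: one dictionary increment.
        self._counts[key] += 1

    def flush(self):
        # Called periodically, off the request path: report only the
        # heaviest keys, then start a fresh window.
        summary = self._counts.most_common(self.top_k)
        self._counts = Counter()
        return summary


def hot_keys(summary, threshold):
    """Controller-side check: keys whose reported load exceeds the threshold."""
    return [key for key, load in summary if load > threshold]


agg = LoadAggregator()
for key in [42, 42, 42, 42, 42, 7]:
    agg.record(key)
summary = agg.flush()
print(summary)               # [(42, 5), (7, 1)]
print(hot_keys(summary, 3))  # [42]
```

Reporting a top-k summary rather than every key is one way to keep the signal small; the right summary shape depends on the key distribution.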

Contrast: "spread the hot tenant across the fleet"

A tempting but different approach, and an anti-pattern (seen in storage at systems/aws-ebs), is to spread a hot tenant's placement across all pods in a shared-resource system. Doing so widens the noisy-neighbor blast radius (concepts/noisy-neighbor): other tenants' tail latencies get worse because the hot tenant's traffic is everywhere.

Shard replication is different: the hot slice is replicated onto a small set of dedicated replica pods. Load is spread, but the blast radius is bounded.
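
The difference in blast radius can be made concrete with a toy placement map. The tenant names, pod names, and the `affected_neighbors` helper below are hypothetical. The sketch simply counts which other tenants share at least one pod with the hot one.

```python
def affected_neighbors(placement, hot):
    """Tenants that share at least one pod with the hot tenant."""
    hot_pods = placement[hot]
    return sorted(
        tenant
        for tenant, pods in placement.items()
        if tenant != hot and pods & hot_pods
    )


# Anti-pattern: the hot tenant is spread across every pod in the fleet.
fleet_wide = {
    "hot": {"P0", "P1", "P2"},
    "a": {"P0"},
    "b": {"P1"},
    "c": {"P2"},
}

# Shard replication: the hot slice lives on a small dedicated replica set.
replicated = {
    "hot": {"P3", "P4"},
    "a": {"P0"},
    "b": {"P1"},
    "c": {"P2"},
}

print(affected_neighbors(fleet_wide, "hot"))  # ['a', 'b', 'c']
print(affected_neighbors(replicated, "hot"))  # []
```

If the replica pods also host other slices, the second list would not be empty, but it stays limited to tenants on those few pods.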

Seen in
