

Shuffle sharding

Definition

Shuffle sharding is a tenant-isolation technique that gives each tenant in a shared cluster a single-tenant experience by assigning it a randomly chosen subset of backend nodes (the "shuffle set") rather than the whole fleet. If N backend nodes serve T tenants, each tenant is mapped to a K-node shuffle set where K ≪ N. With enough tenants, any two tenants' shuffle sets overlap at most partially, so a single tenant's failure mode (DDoS, runaway query, bad deploy, hot key) can take down at most the K nodes in that tenant's shuffle set — other tenants' requests continue to land on nodes unaffected by the bad tenant.

The technique was popularised by AWS's Builder Library; it's now the default "how do I isolate tenants without giving each one dedicated hardware?" answer in multi-tenant system design.
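The core assignment can be sketched in a few lines. This is a minimal illustration, not any particular system's implementation: rank the fleet by a per-tenant hash and take the top K, so the shuffle set is deterministic and stable without a stored tenant→nodes map.

```python
import hashlib


def shuffle_set(tenant_id: str, nodes: list[str], k: int, seed: str = "v1") -> list[str]:
    """Pick a stable K-node subset for a tenant by ranking every node
    on a per-tenant hash. Same tenant + seed always yields the same set;
    different tenants get (almost always) different sets."""
    ranked = sorted(
        nodes,
        key=lambda node: hashlib.sha256(
            f"{seed}:{tenant_id}:{node}".encode()
        ).digest(),
    )
    return ranked[:k]


# Hypothetical fleet of 100 nodes, K = 5.
nodes = [f"node-{i}" for i in range(100)]
print(shuffle_set("tenant-a", nodes, k=5))
```

Because the set is a pure function of (tenant, seed, fleet), any router can recompute it; changing the seed is a deliberate reshuffle, which matters for the data-locality caveat discussed below.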

The math

Given N total nodes and a shuffle-set size of K, the number of possible shuffle sets is C(N, K). If two tenants are randomly assigned shuffle sets, the probability that their sets are disjoint is high even for modest N and K. Concretely: with N = 100 and K = 5, there are C(100, 5) ≈ 75M possible shuffle sets, two random tenants are completely disjoint about 77% of the time, and the chance they share all K nodes is roughly 1 in 75 million.

Crucial corollary: if a bad tenant can take down their entire shuffle set, another tenant is fully impacted only if its shuffle set is a subset of the bad tenant's — and since both sets have size K, "subset" means "identical", with probability C(K, K) / C(N, K) = 1 / C(N, K), which is vanishingly small. Most tenants share no nodes with the bad tenant and see zero impact; the rest lose only part of their shuffle set and can still be served by their remaining healthy nodes.
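These numbers are easy to verify directly with Python's standard-library `math.comb`:

```python
from math import comb

N, K = 100, 5

# Number of possible shuffle sets.
print(comb(N, K))  # 75287520

# P(two random tenants share no nodes at all): choose the second
# tenant's K nodes entirely from the N - K nodes outside the first set.
p_disjoint = comb(N - K, K) / comb(N, K)
print(f"{p_disjoint:.3f}")  # 0.770

# P(another tenant's set is fully contained in the bad tenant's set),
# i.e. the chance of total impact from the bad tenant's worst case.
p_full_overlap = comb(K, K) / comb(N, K)
print(f"{p_full_overlap:.1e}")  # 1.3e-08
```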

Where it's applied

  • Write path: incoming metrics / writes from tenant A hash only to tenant A's shuffle set of ingesters. A flood from tenant A's hot application saturates A's K ingesters; tenants B/C/D keep writing because their shuffle sets don't overlap.
  • Read path: queries from tenant A route to A's shuffle set of query workers. A runaway query (e.g., 500MB+ payload, see sources/2026-04-21-airbnb-building-a-fault-tolerant-metrics-storage-system) fills A's query-worker queue but leaves the rest idle for other tenants.
  • Stateful layers: shuffle-shard keyspace so even if some nodes are compromised / overloaded, each tenant's data is still present on at least one node in their shuffle set.
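The write-path routing above can be sketched as follows. The helper name and node names are hypothetical, and the tenant's shuffle set is assumed to be already computed; the point is only that every key for a tenant hashes to a node *inside* that tenant's K nodes.

```python
import hashlib


def route_write(tenant_id: str, series_key: str, shuffle_set: list[str]) -> str:
    """Hash a write onto one node within the tenant's shuffle set,
    so the tenant's traffic never touches the rest of the fleet."""
    digest = hashlib.sha256(f"{tenant_id}:{series_key}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(shuffle_set)
    return shuffle_set[index]


# A flood of writes from tenant A spreads over A's K ingesters only.
a_set = ["node-3", "node-17", "node-41", "node-62", "node-90"]
print(route_write("tenant-a", "cpu.usage.host42", a_set))
```

Read-path routing is symmetric: the same hash over the same shuffle set lets queries find the ingesters that hold a tenant's data.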

Relationship to other isolation mechanisms

  • Weaker than account-per-tenant / cluster-per-tenant (which give full isolation) but dramatically cheaper: nodes stay shared, so no tenant needs dedicated hardware, and each tenant only ever touches K of the N nodes.
  • Stronger than no isolation: in a naive round-robin cluster, a noisy tenant can exhaust every node.
  • Composable with concepts/blast-radius reduction at the cluster level: Airbnb combines shuffle sharding within a cluster with a multi-cluster architecture across clusters, bounding blast radius at two altitudes simultaneously.
  • Composable with concepts/performance-isolation: per-tenant rate limits + shuffle sharding together mean a bad tenant can saturate their shuffle set but nothing more.

Airbnb's diagram

The Airbnb post visualises it with four shards and five tenants (A-E). If DDoS or other attacks from tenant A take out the first and second shards, tenants B & C (who happened to land on those shards) are also impacted — but tenants D & E's data remains available in the third and fourth shards. (Source: sources/2026-04-21-airbnb-building-a-fault-tolerant-metrics-storage-system)

Caveats / trade-offs

  • Per-tenant node assignment requires a tenant→shuffle-set map, typically consistent-hashed on a tenant ID + some seed.
  • Low tenant counts weaken the guarantee: with only 5 tenants and 10 nodes, shuffle sets overlap heavily. Works best at hundreds-to-thousands of tenants.
  • Data locality inside a shuffle set still needs thought — e.g., tenant-A reads need to find tenant-A writes, so the shuffle set has to be stable or reshuffles need to reconcile data placement.
  • Does not protect against cluster-wide failures (deploy bug, config push, control-plane outage) — for that, combine with a multi-cluster architecture (see concepts/active-multi-cluster-blast-radius).
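The low-tenant-count caveat is easy to quantify: for random K-of-N sets, the expected pairwise overlap is K²/N nodes. A quick simulation with illustrative parameters (not taken from the source):

```python
import random
from itertools import combinations


def mean_pairwise_overlap(n_nodes: int, k: int, n_tenants: int,
                          trials: int = 2000, seed: int = 0) -> float:
    """Average number of nodes any two tenants' random shuffle sets share."""
    rng = random.Random(seed)
    total = pairs = 0
    for _ in range(trials):
        sets = [frozenset(rng.sample(range(n_nodes), k)) for _ in range(n_tenants)]
        for a, b in combinations(sets, 2):
            total += len(a & b)
            pairs += 1
    return total / pairs


# Tiny fleet: K²/N = 25/10, so each pair of tenants shares ~2.5 nodes.
print(mean_pairwise_overlap(n_nodes=10, k=5, n_tenants=5))
# Bigger fleet, same K: overlap collapses to ~0.25 shared nodes per pair.
print(mean_pairwise_overlap(n_nodes=100, k=5, n_tenants=5))
```

At 10 nodes half of every tenant's shuffle set overlaps a given neighbour's on average, which is why the isolation guarantee only becomes meaningful as the fleet (and tenant count) grows.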

Seen in

  • sources/2026-04-21-airbnb-building-a-fault-tolerant-metrics-storage-system — Airbnb's in-house metrics storage system uses shuffle sharding for both writes (tenants hash to a subset of ingesters) and reads (Grafana queries hash to a subset of query workers). The visual in the post shows how a bad-tenant outage on shards 1 & 2 affects some overlapping tenants but leaves the rest unaffected in shards 3 & 4.