CONCEPT Cited by 3 sources

Thundering herd

Definition

A thundering herd is a failure mode where a resource is overwhelmed by too many simultaneous requests, typically because many clients were previously blocked, disconnected, or idle and are now all released at the same instant. The resource has no way to degrade partially: there is nowhere for load to spill, so it either tips over or adds enough latency to chain into downstream failures.

The metaphor, per Figma's LiveGraph post (sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale): "The name is derived from the vision of a huge herd of bulls coming at you; there's nowhere to run."

Classic shapes

  • Cache-cold-after-deploy / cache-restart stampede. A tier-2 cache is wiped (deploy, crash, flush). All clients that previously hit the cache now simultaneously hit the backing store. If the backing store's capacity was sized for cached-traffic-plus-slack, it tips over.
  • Reconnection stampede. A WebSocket tier drops many connections; reconnect retries are synchronised (all clients see the outage at the same instant) and hit the connection-establishment path simultaneously. Figma's FigCache post (sources/2026-04-21-figma-figcache-next-generation-data-caching-platform) names this as a structural Redis-connection problem pre-FigCache: "Thundering-herd connection establishment whenever client services scaled out quickly — bottlenecking I/O, degrading availability."
  • Lock contention on a single popular key. Many readers arrive for the same expired key → all miss the cache → all start the same DB fetch → duplicate work and serialized back-pressure.
  • Cron-second alignment. Every client runs the same job at 0 * * * *. Server load spikes at :00 regardless of hourly average.

Why it's a structural problem (not an operational one)

Thundering herd is not a capacity problem in the usual sense — average capacity is fine, concurrency peak is what breaks. It's a synchronization failure: something aligned all clients' "release" moments. That synchronizer is typically:

  • A deploy — everyone reconnects/retries at once.
  • A shared cache boundary — everyone expires / cold-starts the cache at once.
  • A shared schedule — everyone's cron fires at :00.
  • A shared outage recovery — everyone's retry timer hits T + max at the same moment.

Solutions therefore don't increase capacity; they de-synchronize the clients, or remove the boundary that aligns them.
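
The gap between average load and concurrency peak can be illustrated with a small simulation. This is a minimal sketch under assumed numbers (10,000 clients, a 60-second window, one-second buckets); none of these figures come from the sources above, and `simulate` and `peak_requests_per_second` are hypothetical helper names.

```python
import random
from collections import Counter


def peak_requests_per_second(release_times):
    """Return the largest number of requests that land in any one-second bucket."""
    buckets = Counter(int(t) for t in release_times)
    return max(buckets.values())


def simulate(num_clients=10_000, window_seconds=60, seed=0):
    """Compare peak load when clients release together vs. spread over a window.

    The total work (num_clients requests) is identical in both cases;
    only the alignment of release moments differs.
    """
    rng = random.Random(seed)
    # Synchronized: every client is released at the same instant (t = 0).
    synchronized = [0.0] * num_clients
    # De-synchronized: each client picks a uniform random offset in the window.
    spread = [rng.uniform(0, window_seconds) for _ in range(num_clients)]
    return peak_requests_per_second(synchronized), peak_requests_per_second(spread)


if __name__ == "__main__":
    sync_peak, spread_peak = simulate()
    print(f"synchronized peak: {sync_peak} req/s")
    print(f"de-synchronized peak: {spread_peak} req/s")
```

Both runs issue the same number of requests, so the average over the window is unchanged; only the peak differs, which is why de-synchronization rather than extra capacity is the lever.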

Mitigations (by failure shape)

Cache wipe on deploy

  • Deploy the cache separately from the front-end. Figma's LiveGraph 100x fix: the old in-server cache stampeded on every LiveGraph deploy; the new architecture puts the cache in a separate tier so the edge can redeploy without wiping caches (systems/livegraph, patterns/independent-scaling-tiers).
  • Hot replicas on standby. Figma's new LiveGraph cache keeps warm replicas ready; during deploys, traffic flips to replicas without cold-starting the primary.
  • Request coalescing ("singleflight"). If N clients miss the same key simultaneously, only one fetch runs; the other N-1 coalesce onto it. Figma's LiveGraph rendezvous layer makes this explicit.
  • Warm-up scripts before cutover.
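
Request coalescing from the list above can be sketched as follows. This is a minimal, thread-based illustration and not Figma's implementation; the class name `SingleFlight`, the method `do`, and the helper `_Call` are assumptions made for the example.

```python
import threading


class _Call:
    """State shared between the one leader and the followers waiting on it."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Coalesce concurrent calls for the same key into a single execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call

        if leader:
            # Only the first caller for this key runs the expensive fetch.
            try:
                call.result = fn()
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()
        else:
            # Everyone else waits for the leader's result.
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap the backing-store read, for example `flight.do(cache_key, lambda: load_from_db(cache_key))`, where `load_from_db` is a hypothetical loader. Callers that arrive after the in-flight fetch completes start a new one, so this complements a cache rather than replacing it.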

Reconnection stampede

  • Exponential backoff with jitter — the canonical reference is the AWS Architecture Blog post "Exponential Backoff And Jitter".
  • Connection multiplexing — a shared proxy tier holds few persistent upstream connections on behalf of many client connections, so client-fleet fan-in doesn't map 1:1 to upstream connection establishment. systems/figcache is the canonical wiki instance; order-of-magnitude drop in Redis cluster connection counts post-rollout.
  • Client-scale-out rate limits — slow how fast a fleet can scale out, so connection establishment doesn't saturate.
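
Exponential backoff with jitter can be sketched as below, using the "full jitter" variant: each delay is drawn uniformly between zero and an exponentially growing ceiling. The function names `full_jitter_delay` and `reconnect`, and the defaults for `base`, `cap`, and `attempts`, are illustrative assumptions rather than values from any of the sources.

```python
import random
import time


def full_jitter_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


def reconnect(connect, attempts=6, base=0.5, cap=30.0, sleep=time.sleep, rng=random):
    """Call connect() until it succeeds, sleeping a jittered delay between tries.

    Expects attempts >= 1. The last ConnectionError is re-raised once the
    attempt budget is exhausted.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            # Random delays keep clients that failed together from retrying together.
            sleep(full_jitter_delay(attempt, base, cap, rng))
```

Because each client draws its own delay, clients that observed the outage at the same instant no longer retry at the same instant, even though they share the same backoff parameters.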

Hot-key expiry stampede

  • Probabilistic early expiration — let a small percentage of requests refresh the entry before its TTL expires, so refresh work is amortised across requests rather than concentrated at the instant the TTL hits.
  • Stale-while-revalidate — serve stale data while background refetch runs; only the first miss pays latency.
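
One way to realise probabilistic early expiration is the formulation often referred to as XFetch, where the chance of an early refresh rises as expiry approaches and scales with how long a recompute takes. The sketch below assumes that formulation; the sources above do not say which variant any of the named systems use, and `should_refresh_early` and its parameters are hypothetical names.

```python
import math
import random


def should_refresh_early(now, expiry, recompute_seconds, beta=1.0, rng=random):
    """Decide whether this request should refresh the entry before it expires.

    now, expiry        -- timestamps in seconds
    recompute_seconds  -- how long the last recomputation took
    beta               -- values above 1.0 favour earlier refreshes

    Always returns True once now >= expiry, because the added term is never negative.
    """
    u = 1.0 - rng.random()  # in (0, 1], so math.log(u) is defined and <= 0
    return now - recompute_seconds * beta * math.log(u) >= expiry
```

A cache read path would call this on every hit and, when it returns True, recompute and rewrite the entry. Far from expiry almost no request refreshes; close to expiry a growing fraction does, so the refresh usually completes before the hard TTL is reached.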

Cron alignment

  • Jitter the schedule per client (0 * * * * → random(0-59) * * * *).
  • Batch at the server rather than having N clients each call home.
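
Per-client schedule jitter can be sketched as below. Rather than a fresh random minute on every install, this variant derives the minute from a hash of the client identifier, so the offset stays stable across restarts; that choice, the function name `jittered_hourly_cron`, and the `client_id` parameter are assumptions for illustration.

```python
import hashlib


def jittered_hourly_cron(client_id: str) -> str:
    """Return an hourly cron expression whose minute is derived from client_id.

    The same client always gets the same minute, while different clients
    are spread roughly evenly across 0-59.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    minute = int.from_bytes(digest[:8], "big") % 60
    return f"{minute} * * * *"


if __name__ == "__main__":
    for cid in ("client-a", "client-b", "client-c"):
        print(cid, "->", jittered_hourly_cron(cid))
```

With enough clients, the server sees roughly one sixtieth of the fleet per minute instead of the whole fleet at :00.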

Structural defence: don't let a shared boundary align clients

The strongest mitigation is to remove the synchronizer:

  • Decouple cache tier from front-end deploy — Figma LiveGraph's move.
  • Hold the expensive upstream connection in a separate tier that scales on its own axis, not with the client fleet — Figma FigCache's move.
  • Isolate per-tenant cadence so one tenant's outage doesn't sync the other tenants' retries.
  • Pre-warm before traffic shifts, so the cutover isn't the first traffic event.

This is the general shape of patterns/independent-scaling-tiers for caches.

Seen in

  • canonical database-proxy-tier instance. Jarod Reyes (PlanetScale, 2021-09-30) names the specific thundering-herd shape where a slow hot-row SELECT causes cascading database outage: "Often, the outages we see from customers who were on NoSQL or RDS databases are cascading outages due to an initial spike in query response times." Each arriving caller on the slow hot row occupies a separate upstream connection for its own full execution, so upstream pool occupancy scales with total callers not unique queries. Reyes canonicalises the Vitess query-consolidation primitive + consolidate-identical-in-flight-queries pattern as the proxy-tier structural fix: merge identical simultaneously-arriving queries into one upstream execution, fan the result back to all waiting callers, cap upstream pool pressure at O(unique queries in flight) rather than O(total callers). Sister-primitive to concepts/connection-multiplexing at the cache-tier (Figma FigCache) and concepts/read-invalidation-rendezvous at the LiveGraph altitude — all three address the same structural class at different altitudes.
  • sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale — explicit definition (the bull-herd metaphor) and canonical cache-wipe-on-deploy incident shape; structural fix = separate cache tier + hot standbys.
  • sources/2026-04-21-figma-figcache-next-generation-data-caching-platform — reconnection-stampede shape eliminated by connection multiplexing + drop-in RESP proxy (systems/figcache); named as a pre-FigCache scaling limit.
  • canonical async-job-framework instance. Mike Coutermarsh (PlanetScale, 2022-02-17) names the thundering-herd shape where a paired scheduler bulk-enqueues 10,000 jobs that each call the same external API: without jitter, workers drain the queue as fast as they can and all 10,000 requests arrive at the downstream within seconds. Structural fix = jittered scheduling via CleanUpJob.perform_with_jitter(id, max_wait: 30.minutes) which attaches a random rand(0..max_wait) delay per job so execution spreads over the window. Same structural shape as cache-wipe stampedes (many clients released simultaneously) at a different altitude (outbound-to-downstream instead of inbound-to-cache).
  • sources/2026-04-21-vercel-preventing-the-stampede-request-collapsing-in-the-vercel-cdn: canonical CDN-altitude instance. Vercel frames the ISR cache-expiry stampede explicitly: "Picture a page that just recently expired, or a new route getting hit for the first time. Multiple users request it simultaneously. Each request sees an empty cache and triggers a function invocation. […] For a popular route, this can mean dozens of simultaneous invocations, all regenerating the same page." Canonicalised as the child concept concepts/cache-stampede. Structural fix = per-region request collapsing with a two-level (node + regional) lock; the node lock is explicitly there to prevent the regional-lock acquisition itself from becoming a thundering herd: "Without the node-level grouping, hundreds of concurrent requests could all compete for the regional lock simultaneously. This would create a thundering herd problem where the lock coordination itself becomes a bottleneck." That is, Vercel's design names thundering herd as the failure mode both at the cache-miss layer (the problem) and at the naive-lock layer (the failure mode of a careless fix). Production numbers: 3M+/day collapsed on cache miss + 90M+/day on background revalidation, 100% of ISR projects auto-enrolled via framework-inferred cache policy. See systems/vercel-cdn for the full system.

  • canonical benchmark-workload-design instance. Liz van Dijk (PlanetScale, 2022-09-08) names thundering herd as an explicit stressor that TAOBench is designed to simulate, via its objects + edges schema (concepts/social-graph-objects-and-edges): "Think of what happens when something goes viral: a thundering herd of users comes through to interact with a specific piece of content posted somewhere. On the database level, beyond a sudden surge in connections, this can also translate into various types of locks centered around the backing rows for that piece, which can have rippling effects that ultimately translate to slower content access times for the users on the platform." TAOBench is the first benchmark on this wiki that measures substrate thundering-herd response by design — distinct from sysbench-tpcc's shard-key-aligned workload where no row attracts disproportionate concurrent traffic. The load-bearing framing pairs thundering herd with concepts/hot-row-problem as the two stressors viral content creates: the row-level contention (hot row) and the connection/lock fanout (thundering herd) are distinct failure modes that the social-graph workload exercises together.
