
Request collapsing

Definition

Request collapsing is the CDN/cache behaviour in which N concurrent requests for the same uncached (and cacheable) resource are deduplicated into a single upstream invocation: one request proceeds to regenerate the resource, and the other N − 1 requests wait briefly until the cache is populated and then receive the same response. The canonical invariant is:

k concurrent misses on the same key result in ≤ 1 upstream invocation — not k.

This is the cache-layer cousin of Vitess query consolidation, Go's singleflight primitive, and request-coalescing at the RPC layer; all three apply the same idiom at different altitudes.
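The invariant can be demonstrated with a minimal in-process sketch of the singleflight idiom (this is an illustrative toy, not Vercel's implementation; the `SingleFlight` class and `regenerate` function are hypothetical names):

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent calls for the same key onto one invocation."""
    def __init__(self):
        self._mu = threading.Lock()
        self._inflight = {}   # key -> (done event, one-slot result holder)

    def do(self, key, fn):
        with self._mu:
            if key in self._inflight:
                done, holder = self._inflight[key]
                waiter = True                 # one of the k-1 deduplicated callers
            else:
                done, holder = threading.Event(), [None]
                self._inflight[key] = (done, holder)
                waiter = False                # the single upstream invoker
        if waiter:
            done.wait()                       # wait briefly for the invoker
            return holder[0]
        try:
            holder[0] = fn()
        finally:
            with self._mu:
                del self._inflight[key]
            done.set()                        # release all waiters
        return holder[0]

# k = 10 concurrent misses on the same key -> one call to regenerate().
invocations = []
def regenerate():
    time.sleep(0.05)                          # simulate a slow upstream
    invocations.append(1)
    return "page-html"

sf = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(sf.do("/page", regenerate)))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
```

All ten callers receive the same `"page-html"` result while `regenerate` runs exactly once.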

Why you need it

Without collapsing, a popular cached resource hitting a miss (TTL expiry, new deploy, cold region) produces a cache stampede — every arriving request sees the same empty cache, triggers its own upstream invocation, and the origin takes the full fan-in concurrency. For an ISR page served from a Vercel CDN node, this means "dozens of simultaneous [function] invocations, all regenerating the same page" (Source: sources/2026-04-21-vercel-preventing-the-stampede-request-collapsing-in-the-vercel-cdn).

Collapsing flips the cost model:

| Dimension | Without collapsing | With collapsing |
| --- | --- | --- |
| Function invocations per burst of N concurrent misses | N | 1 |
| Origin load under steady-state miss bursts | O(requests/sec) | O(unique keys missing/sec) |
| Tail latency for the N − 1 waiters | Each waiter's own invoke latency | Lock wait + one invocation's latency |
| Cache coherency | N writers race; last-writer-wins | Single writer; coherent |

Correctness preconditions

Collapsing is only safe when the response is shared across all waiters — i.e. the request is cacheable and deterministic for the cache key. The Vercel post is explicit:

  • ISR page that regenerates the same content for all users — safe to collapse ✓
  • Dynamic API route that returns user-specific data — cannot be collapsed ✗ (each request would get the wrong user's data)
  • Page with random content or timestamps — should not be collapsed ✗ (non-determinism would hide behind the cache)

This is the same precondition any shared-cache architecture needs: the cache key must functionally determine the response. Violating it produces silent correctness bugs (users see each other's data), not just performance regressions.
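The failure mode is easy to see in a toy model where a user-specific route is cached (and therefore collapsed) under a key that omits the user — the `cache` dict and `handle` function here are hypothetical, chosen to make the leak obvious:

```python
# A shared cache keyed only by the request path -- deliberately the wrong
# key for a user-specific route, to show the correctness bug.
cache = {}

def handle(path, user):
    """The response depends on `user`, so `path` alone does NOT
    functionally determine the response."""
    if path in cache:                  # shared-cache hit -- for ALL users
        return cache[path]
    response = f"profile page for {user}"
    cache[path] = response             # the one collapsed invocation writes
    return response

first = handle("/profile", user="alice")   # miss: alice's page is cached
second = handle("/profile", user="bob")    # hit: bob is served alice's page
```

Nothing errors and latency improves — the bug is silent, exactly as the text warns.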

Implementation topology

The standard implementation requires three primitives:

  1. A lock keyed by cache key. So that multiple concurrent requests for different keys don't serialise against each other.
  2. Double-checked locking. Check the cache both before and after acquiring the lock, so a waiter can skip its own invocation if someone else's has already completed.
  3. A lock timeout. So a slow or hung upstream doesn't block all waiters forever — past some threshold, waiters give up and invoke independently.
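The three primitives compose into a short sketch — a single-node toy under assumed names (`fetch`, `lock_for`, `LOCK_TIMEOUT`), not the Vercel CDN's actual code:

```python
import threading
import time

cache = {}                       # shared response cache
key_locks = {}                   # (1) one lock per cache key
key_locks_mu = threading.Lock()
LOCK_TIMEOUT = 3.0               # (3) seconds; the source cites 3,000 ms

def lock_for(key):
    with key_locks_mu:
        return key_locks.setdefault(key, threading.Lock())

def fetch(key, regenerate):
    value = cache.get(key)
    if value is not None:                    # (2) first check, before the lock
        return value
    lock = lock_for(key)
    acquired = lock.acquire(timeout=LOCK_TIMEOUT)
    try:
        value = cache.get(key)               # (2) second check, after the lock:
        if value is not None:                #     the invoker may have finished
            return value
        # Either we hold the lock (single invoker), or the wait timed out --
        # in which case we give up on collapsing and invoke independently.
        value = regenerate()
        cache[key] = value
        return value
    finally:
        if acquired:
            lock.release()

invocations = []
def regenerate():
    time.sleep(0.05)                         # simulate a slow upstream
    invocations.append(1)
    return "regenerated"

threads = [threading.Thread(target=fetch, args=("/page", regenerate))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
```

Every waiter that loses the lock race re-checks the cache after acquiring it and skips its own invocation; only a timed-out waiter ever invokes redundantly.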

The Vercel CDN layers (2) and (3) on a two-level lock (node, then region) for scalability: node locks funnel per-server contention down to at most one contender per node, then the regional lock serialises across nodes in the region.

Regional vs global scope

Request collapsing naturally operates at a scope boundary: one invocation per {node, region, or globe}. Vercel chose one invocation per region — accepting that a globally uncached ISR page might trigger ~one function invocation per region on a simultaneous miss. The tradeoff:

  • Per-region collapsing: regenerations are fast (regional function) and blast-radius of a bad invocation is regional; but N regions → N invocations on a truly global cold miss.
  • Global collapsing: at most one invocation on any simultaneous miss anywhere on Earth; but requires cross-region coordination (slow, extra failure mode) and a bad invocation's result propagates globally.

Vercel's choice is consistent with their broader edge-cache design (replicate the ISR cache to each region's local cache, serve locally) — the regional lock matches the regional cache boundary.

Mechanism variants

  • Request coalescing / singleflight (Go's golang.org/x/sync/singleflight) — in-process, single-node variant. N concurrent calls with the same key all receive one underlying invocation's result. The node-level lock in Vercel's design is essentially singleflight per-node.
  • Query consolidation (Vitess) — same idiom at the SQL wire-protocol altitude, merging identical in-flight queries at the proxy. See concepts/query-consolidation and patterns/consolidate-identical-inflight-queries.
  • stale-while-revalidate (RFC 5861) — a different failure-mode answer: serve stale cached content to all callers while one invocation regenerates in the background. Collapsing focuses on the "cache-completely-cold" case; SWR focuses on the "cache-expired-but-present" case. Vercel applies both: 90M/day collapsed on background revalidation (SWR path) vs 3M/day collapsed on cold miss (collapsing path).
  • Request coalescing at the RPC layer — e.g. Netflix's Zuul filters, Envoy's request hedging. Generally applied when upstream is slow; the primitive is the same.
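The collapsing/SWR split above can be sketched in one in-process toy — a `get` function with a TTL-stamped cache that serves stale entries while at most one background refresh runs per key (all names here are hypothetical, and the cold-miss branch is where request collapsing would slot in):

```python
import threading
import time

TTL = 0.1                     # hypothetical freshness window, in seconds
cache = {}                    # key -> (value, stored_at)
revalidating = set()          # keys with a background refresh in flight
mu = threading.Lock()

def get(key, regenerate):
    now = time.monotonic()
    with mu:
        entry = cache.get(key)
        if entry is not None:
            value, stored_at = entry
            if now - stored_at < TTL:
                return value               # fresh hit
            # Stale-while-revalidate: serve the stale copy immediately,
            # letting at most one background refresh run per key.
            if key not in revalidating:
                revalidating.add(key)
                threading.Thread(target=_refresh, args=(key, regenerate)).start()
            return value
    # Cold miss: nothing to serve stale -- this is the case request
    # collapsing handles (one invocation, everyone else waits).
    value = regenerate()
    with mu:
        cache[key] = (value, time.monotonic())
    return value

def _refresh(key, regenerate):
    value = regenerate()
    with mu:
        cache[key] = (value, time.monotonic())
        revalidating.discard(key)

calls = []
def regen():
    calls.append(1)
    return f"v{len(calls)}"

a = get("/page", regen)       # cold miss: invokes
b = get("/page", regen)       # fresh hit: no invocation
time.sleep(0.15)              # let the entry go stale
c = get("/page", regen)       # stale copy served instantly; refresh in background
time.sleep(0.05)              # give the background refresh time to land
d = get("/page", regen)       # fresh hit on the refreshed entry
```

The stale reader never blocks on the refresh — that is SWR's latency win over pure collapsing, bought at the cost of serving briefly outdated content.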

Failure modes

Error in the one invocation. Errors aren't cached, so the second cache check still misses after lock release → the next waiter to acquire the lock retries. Collapsing doesn't help on errors but doesn't poison the cache either.

Slow invocation. Without lock timeouts, waiters pile up indefinitely — a slow invocation on a popular key turns into a pseudo-outage for that route. The fix is bounded-wait-then-hedge: past a timeout, a waiter gives up and invokes itself. Vercel uses 3,000 ms.

Uncacheable response. A route wrongly classified as cacheable would cause users to see each other's data. Vercel mitigates this via framework-inferred cache policy — the build analyser classifies each route, so developers don't have to tag them manually.

Hot-key lock contention. Even with two-level locking, a truly popular key during a cold-cache event concentrates all regional waiters on one lock. The node-level funnel bounds the waiter count, but the single regenerator still sits on every waiter's critical path.

Canonical production instance

Vercel CDN (April 2026) (Source: sources/2026-04-21-vercel-preventing-the-stampede-request-collapsing-in-the-vercel-cdn):

  • 3M+ requests/day collapsed on cache miss
  • 90M+ requests/day collapsed on background revalidation
  • One invocation per region on miss
  • Two-level lock (in-memory node lock + regional distributed lock)
  • Double-checked locking between lock acquire and invocation
  • 3,000 ms timeout on both locks
  • Zero-config: enabled for every Vercel project using ISR, because route cacheability is inferred from the Next.js build output
