
VERCEL 2026-04-21 Tier 3


Vercel — Preventing the stampede: Request collapsing in the Vercel CDN

Summary

Vercel's CDN launched request collapsing as a default behaviour for every ISR route on every deployment. When a cache entry expires (or was never written) and many requests arrive simultaneously for the same uncached path, the naive behaviour is that each request triggers its own function invocation to regenerate the page — a classic cache stampede that wastes compute and hammers the origin. Request collapsing synchronises the concurrent misses with a two-level distributed lock (per-node, then per-region) so that exactly one invocation per region regenerates the page; all other concurrent requests wait briefly and receive the cached response once it's populated.

The mechanism is framework-inferred, not user-configured: when a Next.js app is deployed, Vercel's build analyser classifies each route as ISR / SSG / dynamic, and the CDN only collapses requests for routes known to produce a cacheable, deterministic response. Dynamic API routes returning user-specific data, or pages with random content, are never collapsed. The implementation uses double-checked locking (check cache, acquire locks, re-check cache) plus explicit lock timeouts (~3 s) that let a request give up waiting and regenerate independently rather than pile up behind a slow invocation — a bounded-hedging policy that preserves collapsing in the common case while refusing to cascade failures when a function is slow.

Production numbers: the Vercel CDN currently collapses over 3M requests per day on cache miss, on top of 90M collapsed requests from background revalidations. The feature is enabled for every Vercel project; any customer using ISR benefits automatically with zero configuration. One production graph shows the collapse rate jumping from 30 requests/sec to 120 requests/sec in a short window, i.e. the underlying stampede load is bursty, spiking to ~4× baseline even under normal conditions.

Key takeaways

  • The cache stampede is the ISR failure mode being fixed. When a popular ISR route expires, dozens of concurrent requests each see an empty cache and each trigger their own function invocation. "Without coordination, each of those misses invokes the function independently. For a popular route, this can mean dozens of simultaneous invocations, all regenerating the same page. This wastes compute and hammers your backend." Canonicalised as concepts/cache-stampede (and an instance of the broader concepts/thundering-herd family). (Source: sources/2026-04-21-vercel-preventing-the-stampede-request-collapsing-in-the-vercel-cdn)
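The naive miss path can be sketched as follows (a minimal in-process simulation, not Vercel's code; `render` and `handleNaive` are illustrative names):

```typescript
// Without coordination, every concurrent miss invokes the function.
const cache = new Map<string, string>();
let invocations = 0;

async function render(path: string): Promise<string> {
  invocations++; // each call stands in for one function invocation
  await new Promise((r) => setTimeout(r, 10)); // simulated render work
  return `page:${path}`;
}

async function handleNaive(path: string): Promise<string> {
  const hit = cache.get(path);
  if (hit !== undefined) return hit;
  const body = await render(path); // every concurrent miss lands here
  cache.set(path, body);
  return body;
}

// 20 simultaneous requests for the same uncached path:
const results = await Promise.all(
  Array.from({ length: 20 }, () => handleNaive("/blog")),
);
console.log(invocations); // 20 -- one invocation per request, the stampede
```

All 20 requests check the cache before any of them has finished writing it, so all 20 regenerate the same page.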

  • Request collapsing is the structural fix. "When multiple requests hit the same uncached path, only one request per region invokes a function. The rest wait and get the cached response." Canonicalised as concepts/request-collapsing.

  • The lock topology is two-level: node then region. Each CDN node maintains an in-memory lock to serialise same-path requests within the node. Only after acquiring the node lock does a request contend for the regional lock that serialises across all nodes in the region. "Without the node-level grouping, hundreds of concurrent requests could all compete for the regional lock simultaneously. This would create a thundering herd problem where the lock coordination itself becomes a bottleneck." Canonicalised as concepts/two-level-distributed-lock — the node lock acts as a local funnel so the number of waiters at the regional level stays proportional to nodes-per-region, not total requests.
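The node→region funnel can be sketched in-process (the post only shows the createNodeLock / createRegionalLock / combineLocks signatures; this mutex simulation and the contention counters are our own, and a real regional lock is distributed rather than in-memory):

```typescript
type Release = () => void;

// A plain async mutex: acquire() resolves when the lock becomes free.
function createMutex() {
  let chain: Promise<void> = Promise.resolve();
  return {
    acquire(): Promise<Release> {
      let release!: Release;
      const prev = chain;
      chain = new Promise<void>((r) => (release = r));
      return prev.then(() => release);
    },
  };
}

const nodeLock = createMutex();     // serialises same-path requests within this node
const regionalLock = createMutex(); // serialises across nodes (simulated)
let regionalContenders = 0;
let maxRegionalContenders = 0;

// Node lock first: at most one request per node ever reaches the regional
// lock, so regional contention scales with nodes-per-region, not with
// total concurrent requests.
async function acquireCombined(): Promise<Release> {
  const releaseNode = await nodeLock.acquire();
  regionalContenders++;
  maxRegionalContenders = Math.max(maxRegionalContenders, regionalContenders);
  const releaseRegion = await regionalLock.acquire();
  return () => {
    regionalContenders--;
    releaseRegion();
    releaseNode();
  };
}

// 50 simultaneous requests on one node: the funnel keeps regional
// contention at 1 throughout.
await Promise.all(
  Array.from({ length: 50 }, async () => {
    const release = await acquireCombined();
    await new Promise((r) => setTimeout(r, 1));
    release();
  }),
);
console.log(maxRegionalContenders); // 1
```

Without the node-level funnel, all 50 requests would contend for the regional lock at once.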

  • The correctness protocol is double-checked locking. Check cache → if miss, acquire locks → check cache again (someone else might have populated it while you waited) → only then invoke the function and set the cache. "Locking alone doesn't collapse requests. If every waiter invoked the function after getting the lock, work would still be duplicated." Canonicalised as concepts/double-checked-locking. Well-known concurrent-programming primitive adapted to the distributed-cache altitude.
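The protocol can be sketched as follows (our simulation; `withLock` stands in for the combined node+regional lock):

```typescript
const cache = new Map<string, string>();
let invocations = 0;

// Minimal async mutex, standing in for the combined lock.
let chain: Promise<void> = Promise.resolve();
async function withLock<T>(fn: () => Promise<T>): Promise<T> {
  let release!: () => void;
  const prev = chain;
  chain = new Promise<void>((r) => (release = r));
  await prev;
  try {
    return await fn();
  } finally {
    release();
  }
}

async function render(path: string): Promise<string> {
  invocations++;
  await new Promise((r) => setTimeout(r, 10)); // simulated render work
  return `page:${path}`;
}

async function handleCollapsed(path: string): Promise<string> {
  const first = cache.get(path);             // 1. check cache
  if (first !== undefined) return first;
  return withLock(async () => {              // 2. acquire the lock
    const second = cache.get(path);          // 3. re-check: a prior holder
    if (second !== undefined) return second; //    may have populated it
    const body = await render(path);         // 4. exactly one invocation
    cache.set(path, body);
    return body;
  });
}

const results = await Promise.all(
  Array.from({ length: 20 }, () => handleCollapsed("/blog")),
);
console.log(invocations); // 1 -- the other 19 requests collapsed
```

The second cache check is what makes locking actually collapse work: every waiter that acquires the lock after the first holder sees a hit and skips regeneration.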

  • Cache-write is asynchronous; lock-release is eager. "Notice that the cache is written asynchronously after the function returns. This allows the response to be sent back to the user without waiting for the cache set operation to complete, reducing the time to first byte. Meanwhile, the lock is released as soon as the cache is populated, so waiters can proceed quickly." Two latency optimisations stacked: TTFB minimised for the lock-holder (don't wait on cache-write), and waiter liveness maximised (release lock at cache-populated, not at response-sent).
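The ordering can be illustrated like this (timings and the `slowCacheSet` helper are invented for the sketch):

```typescript
// Send the response without awaiting the cache write, but hold the
// lock until the cache is populated so waiters see a hit.
const events: string[] = [];

function slowCacheSet(key: string, value: string): Promise<void> {
  return new Promise((resolve) =>
    setTimeout(() => {
      events.push("cache-populated");
      resolve();
    }, 20),
  );
}

async function lockHolderPath(): Promise<void> {
  const body = "page:/blog";                  // the function has returned
  const write = slowCacheSet("/blog", body);  // start write, do NOT await yet
  events.push("response-sent");               // TTFB unaffected by the write
  await write;                                // then, once the cache is set...
  events.push("lock-released");               // ...release the lock for waiters
}

await lockHolderPath();
console.log(events);
// ["response-sent", "cache-populated", "lock-released"]
```

The response goes out before the cache write completes, while the lock is held just long enough that the first waiter's re-check is guaranteed to hit.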

  • Lock timeouts prevent cascading failures. "To prevent this, locks are created with explicit timeouts. If a request cannot acquire the lock within a fixed window (for example, a few seconds), it abandons waiting and proceeds to invoke the function itself." The sample code shows { timeout: 3000 } (3 seconds) on both node and regional locks. Canonicalised as concepts/lock-timeout-hedging — bounded-wait-then-hedge policy that trades extra function invocations in the slow-path for bounded tail latency. Paid cost is at worst the same as no-collapsing (everyone invokes); paid benefit is in the common case exactly-one invocation.
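The bounded-wait-then-hedge policy can be sketched as follows (in-process simulation; the post's sample uses { timeout: 3000 }, shortened here to 10 ms so the example runs quickly):

```typescript
// Minimal async mutex.
let chain: Promise<void> = Promise.resolve();
function acquire(): Promise<() => void> {
  let release!: () => void;
  const prev = chain;
  chain = new Promise<void>((r) => (release = r));
  return prev.then(() => release);
}

// Wait for the lock at most `ms`; resolve null on timeout. An abandoned
// acquisition releases itself immediately if it is granted later.
function acquireWithTimeout(ms: number): Promise<(() => void) | null> {
  let timedOut = false;
  const pending = acquire().then((release) => {
    if (timedOut) {
      release();
      return null;
    }
    return release;
  });
  const timer = new Promise<null>((resolve) =>
    setTimeout(() => {
      timedOut = true;
      resolve(null);
    }, ms),
  );
  return Promise.race([pending, timer]);
}

let hedged = 0;
async function requestWith(ms: number): Promise<string> {
  const release = await acquireWithTimeout(ms);
  if (release === null) {
    hedged++;            // worst case: same cost as no collapsing at all
    return "page:/blog"; // regenerate independently rather than pile up
  }
  try {
    await new Promise((r) => setTimeout(r, 50)); // slow invocation
    return "page:/blog";
  } finally {
    release();
  }
}

const [a, b] = await Promise.all([requestWith(10), requestWith(10)]);
console.log(hedged); // 1 -- the waiter gave up after 10 ms and hedged
```

The first request acquires the lock and is slow; the second abandons the wait at the timeout and invokes on its own, bounding its tail latency at (timeout + its own invocation time).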

  • Cacheability is framework-inferred, not user-configured. "When you deploy your app, Vercel analyses your routes and understands which ones use ISR, static generation, or dynamic rendering. This metadata gets distributed to every CDN region. When a request arrives, the CDN already knows whether that specific route can be safely cached and collapsed. This happens without any configuration from you." Canonicalised as patterns/framework-inferred-cache-policy — the CDN learns per-route cache semantics from the Next.js build output rather than requiring explicit developer annotation.

  • Three cacheability classes are distinguished: (a) ISR page that regenerates the same content for all users — safe to collapse; (b) dynamic API route that returns user-specific data — cannot be collapsed (different responses per user); (c) page with random content or timestamps — should not be collapsed (response is non-deterministic). The "requires the response is cacheable and shared across users" constraint is exactly the correctness precondition for the collapsing idiom to hold.
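The decision can be sketched as a predicate over per-route build metadata (the RouteMeta shape and field names are hypothetical; the post describes the three classes only informally):

```typescript
type RouteKind = "isr" | "static" | "dynamic";

interface RouteMeta {
  kind: RouteKind;
  deterministic: boolean; // false for random content or timestamps
  perUser: boolean;       // true for user-specific API responses
}

// Collapsing is only safe when the response is cacheable, shared
// across users, and the same on every regeneration.
function canCollapse(meta: RouteMeta): boolean {
  if (meta.kind === "dynamic") return false;
  return meta.deterministic && !meta.perUser;
}

const isrPage = canCollapse({ kind: "isr", deterministic: true, perUser: false });
const userApi = canCollapse({ kind: "dynamic", deterministic: true, perUser: true });
const randomPage = canCollapse({ kind: "isr", deterministic: false, perUser: false });
console.log(isrPage, userApi, randomPage); // true false false
```

Only class (a) passes; classes (b) and (c) fall through to uncollapsed, per-request invocation.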

  • Errors poison nothing; they just don't collapse. "If the function invocation throws an error, the result cannot be cached. That means the second cache lookup still returns nothing. The next request that acquires the lock must attempt regeneration again. In this case, collapsing does not help because there is no valid response to share, but the system still ensures that errors do not poison the cache." Explicit negative-result caching is off; error paths degrade to uncollapsed behaviour.
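The error path degrades like this (our simulation; `renderFlaky` is an invented stand-in for a failing invocation):

```typescript
const cache = new Map<string, string>();
let attempts = 0;

async function renderFlaky(path: string): Promise<string> {
  attempts++;
  if (attempts === 1) throw new Error("upstream failure");
  return `page:${path}`;
}

// What a lock holder does after acquiring the lock: re-check, then
// regenerate; a thrown error writes nothing to the cache.
async function regenerate(path: string): Promise<string | null> {
  const hit = cache.get(path); // the double-check
  if (hit !== undefined) return hit;
  try {
    const body = await renderFlaky(path);
    cache.set(path, body); // only successful responses are cached
    return body;
  } catch {
    return null; // cache untouched: the next lock holder sees a miss
  }
}

const first = await regenerate("/blog");  // errors: nothing cached
const second = await regenerate("/blog"); // retry by the next lock holder
console.log(first, second); // null page:/blog
```

The failed attempt leaves the cache empty rather than poisoned, so the next acquirer simply tries again.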

  • Production scale (April 2026): 3M+ requests per day collapsed on cache miss + 90M collapsed from background revalidations. Background revalidation accounts for ~30× the volume of on-miss collapsing — consistent with the idea that most ISR traffic hits the cache and gets revalidated in the background, not on a cold miss. Burstiness: one graph shows a short-window jump from 30 rps collapsed to 120 rps collapsed (4× spike).

  • Every ISR project on Vercel benefits by default. "This feature is enabled for all projects on Vercel, so any customer using ISR benefits from request collapsing automatically." No opt-in, no config, no migration — the framework-integration story carries the adoption cost.

CDN architecture extracted

The post gives the following cache-lookup hierarchy (hot → cold):

  1. Per-node in-memory cache — small, per-server-instance, serves frequently-requested content immediately.
  2. Per-region CDN cache (Vercel cache) — shared across nodes in a region; replicated from the ISR cache.
  3. ISR cache — global source of truth co-located with the functions; stores regeneration outputs; replicates to each region's Vercel cache.
  4. Function invocation — the user's Next.js server render, run only on a cold miss at all three cache layers.

Each region has multiple nodes (server instances) that scale with traffic; each node runs multiple workers for concurrent request handling; each node has its own in-memory cache. The two-level lock mirrors this node→region hierarchy exactly: lock per node first (funnel), then lock per region (authoritative serialisation).
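The hot→cold lookup order can be sketched as follows (our simulation; the layer APIs are illustrative, and real replication between the ISR cache and regional caches is more involved than the synchronous writes shown here):

```typescript
const nodeCache = new Map<string, string>();   // 1. per-node, in-memory
const regionCache = new Map<string, string>(); // 2. per-region Vercel cache
const isrCache = new Map<string, string>();    // 3. global ISR cache

let invocations = 0;
async function invokeFunction(path: string): Promise<string> {
  invocations++; // 4. reached only on a cold miss at all three layers
  return `page:${path}`;
}

async function lookup(path: string): Promise<string> {
  for (const layer of [nodeCache, regionCache, isrCache]) {
    const hit = layer.get(path);
    if (hit !== undefined) return hit;
  }
  const body = await invokeFunction(path);
  // Populate all layers so later requests hit the warmest one first.
  isrCache.set(path, body);
  regionCache.set(path, body);
  nodeCache.set(path, body);
  return body;
}

await lookup("/blog"); // cold: falls through to the function
await lookup("/blog"); // warm: served from the node's in-memory cache
console.log(invocations); // 1
```

The two-level lock guards exactly the fall-through step at the bottom of this cascade.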

Failure modes and policies

Failure | Behaviour | Rationale
Function throws error | Cache not written; next lock holder retries | Don't cache errors; collapsing can't help here
Function times out / is slow | Other waiters time out after ~3 s on the lock and invoke independently | Bounded tail latency; accept redundant work in the worst case
Cache populated while waiting | Second cache check returns a hit; regeneration skipped | Double-checked-locking correctness invariant
Response is uncacheable | No lock acquired; each request invokes | Collapsing is only safe when the response is shared

Operational numbers

  • Collapsed on cache miss: 3M+ requests/day
  • Collapsed on background revalidation: 90M+ requests/day
  • Total collapsed: ~93M/day (Apr 2026)
  • Background-to-miss ratio: ~30×
  • Lock timeout: 3,000 ms (per the code sample; both node + region)
  • Observed burst: 30 rps → 120 rps collapsed (4× short-window)
  • Adoption: 100% of Vercel projects using ISR (zero-config default)

Caveats

  • No latency distribution disclosed. The post doesn't give p50/p99 wait times for collapsed requests vs cache hits vs independent invocations. "Every request is either served from cache immediately or waits briefly for a lock holder to complete" is qualitative — "briefly" is not pinned down.
  • No cache-hit-rate improvement delta. The 3M/day collapsed figure is the absolute count; there's no disclosure of what the comparable uncollapsed-era number would have been, or what portion of total ISR traffic is collapsed vs passes straight through.
  • Cross-region behaviour under-specified. The post says "one invocation per region", implying multiple function invocations for the same path in different regions during a global cache miss. This is an architectural choice (regional blast-radius containment at the cost of per-region duplicate work on truly global cold misses), but the post doesn't discuss when global coalescing would be preferable.
  • Lock implementation not disclosed. The post doesn't name the distributed-lock substrate for the regional lock (Redis? purpose-built? leased from the routing layer?). Only the code-sample signatures are shown (createNodeLock, createRegionalLock, combineLocks).
  • Error-rate exemption window not specified. If many errors arrive in sequence, waiters accumulate and each retries; no circuit-breaker mentioned that would stop attempting after N consecutive errors on the same key.
  • "Regional lock" semantics: fair? FIFO? preemptive? Not specified. For a popular hot-key expiring, the first acquirer's latency directly determines collapsed-request tail latency.
  • Negative caching policy. The post says errors aren't cached. For certain classes of errors (e.g. a permanent 4xx from a broken route), caching might be desirable; the post doesn't distinguish.
  • Interaction with stale-while-revalidate. The post focuses on cache-miss collapsing and background-revalidation collapsing but doesn't explicitly discuss stale-while-revalidate semantics — whether a stale cached value is served to concurrent requests while one invocation refreshes, or whether everyone waits on the lock. The SWR case is likely how the 90M/day background number arises.
  • Product-voice post. The article is published on vercel.com/blog and ends with product-adjacent framing ("This feature is enabled for all projects on Vercel, so any customer using ISR benefits from request collapsing automatically"), though the body is architecturally substantive.
