

Double-checked locking

Definition

Double-checked locking is a concurrency idiom that avoids duplicating expensive work when multiple threads / requests / processes compete to do the same thing. The structure is:

  1. First check — read the shared state. If it already holds a valid value, return immediately. No lock taken, no expense paid.
  2. Acquire lock. Serialise with other potential workers.
  3. Second check — re-read the shared state under the lock. Another worker may have populated it while this one was waiting for the lock.
  4. Do the work — only if the second check still shows no value. Store the result, then release the lock.

The idiom is named for the two state checks around the lock. It is the standard answer to "serialising workers is necessary for correctness, but waking up to find the work already done is also necessary for throughput."

Why two checks?

The naive one-check form (check → lock → do work) is correct but wasteful: every waiter entering the lock redoes the work, even if the first one already completed it. The two-check form (check → lock → check → work) skips the work whenever a prior waiter finished it during the lock wait.

Without the second check, a collapsing implementation still serialises callers but does not collapse work — each waiter invokes the function after acquiring the lock. Vercel frames this explicitly:

"Locking alone doesn't collapse requests. If every waiter invoked the function after getting the lock, work would still be duplicated." (Source: sources/2026-04-21-vercel-preventing-the-stampede-request-collapsing-in-the-vercel-cdn)

The second check is therefore the deduplication step — it's what makes collapsing actually collapse.
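The duplication difference is easy to reproduce in-process. A minimal Python sketch (threads and a dict standing in for the distributed lock and cache; the key name, thread count, and barrier are invented for illustration): race workers at one key, with and without the second check, and count how many actually did the work.

```python
import threading

def stampede(second_check: bool, n_threads: int = 8) -> int:
    """Race n threads at one cache key; return how many did the work."""
    cache = {}
    lock = threading.Lock()
    calls = 0
    barrier = threading.Barrier(n_threads)

    def worker():
        nonlocal calls
        if "home" in cache:         # first check (lock-free): always misses here
            return
        barrier.wait()              # ensure everyone passes the first check first
        with lock:
            if second_check and "home" in cache:
                return              # second check: a prior waiter did the work
            calls += 1              # the "expensive" regeneration
            cache["home"] = "rendered page"

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return calls

print(stampede(second_check=True))    # 1: waiters collapse onto the first result
print(stampede(second_check=False))   # 8: serialised, but work still duplicated
```

The barrier forces every thread past the first check before any of them takes the lock, which makes the contrast deterministic: with the second check, exactly one invocation; without it, one per waiter.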

Pseudo-code

async function respond(request) {
  const cacheKey = getCacheKey(request);

  // First check — fast path on cache hit
  const cached1 = await cache.get(cacheKey);
  if (cached1) return cached1;

  // Acquire distributed lock keyed by cacheKey
  const lock = createDistributedLock(cacheKey);
  await lock.acquire();
  try {
    // Second check — someone else may have populated while we waited
    const cached2 = await cache.get(cacheKey);
    if (cached2) return cached2;

    // Do the work — only path that actually regenerates
    const response = await invokeFunction(request);
    await cache.set(cacheKey, response); // must complete before the lock release
                                         // in `finally`, so a waiter's second
                                         // check always sees the value
    return response;
  } finally {
    lock.release();
  }
}

Source: adapted from Vercel's request-collapsing post (sources/2026-04-21-vercel-preventing-the-stampede-request-collapsing-in-the-vercel-cdn).

Historical context

The idiom dates to lazy-singleton initialisation in C++ and Java — "check-then-initialise-then-publish". The classic Java form had a famous bug before JDK 5: without volatile, the store of the reference could be reordered such that another thread saw a non-null but partially-constructed object. The Java memory model fix (volatile plus publication guarantees) is the textbook concurrent-programming lesson; Doug Lea's JSR-133 work and "Java Concurrency in Practice" cover it in detail.

At the distributed altitude, the analogous hazard is cache write visibility: if the cache set isn't synchronously visible before lock release, a waiter could acquire the lock, re-check the cache, see a stale miss, and redundantly invoke. Vercel's implementation sidesteps this by holding the lock until the cache is populated ("the lock is released as soon as the cache is populated, so waiters can proceed quickly"): the cache write happens-before the lock release, so waiters always see the populated value.
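That ordering requirement can be demonstrated in-process. A minimal Python sketch (threads and a dict standing in for the distributed lock and cache; the sleeps are arbitrary stand-ins for invocation time and an in-flight asynchronous write): releasing the lock before the cache write lands lets a waiter's second check see a stale miss and duplicate the work.

```python
import threading
import time

def scenario(release_before_populate: bool) -> int:
    """Race two callers at one key; return how many regenerated."""
    cache, lock, calls = {}, threading.Lock(), [0]

    def respond(key):
        if key in cache:                 # first check (lock-free)
            return cache[key]
        lock.acquire()
        if key in cache:                 # second check, under the lock
            lock.release()
            return cache[key]
        calls[0] += 1
        time.sleep(0.02)                 # the expensive regeneration
        value = f"rendered-{key}"
        if release_before_populate:
            lock.release()               # BUG: waiters can re-check now ...
            time.sleep(0.1)              # ... while the write is still in flight
            cache[key] = value
        else:
            cache[key] = value           # write happens-before the lock release
            lock.release()
        return value

    t1 = threading.Thread(target=respond, args=("home",))
    t2 = threading.Thread(target=respond, args=("home",))
    t1.start(); time.sleep(0.005); t2.start()
    t1.join(); t2.join()
    return calls[0]

print(scenario(release_before_populate=False))  # 1: waiter's second check hits
print(scenario(release_before_populate=True))   # 2: stale miss, duplicated work
```

The second caller blocks on the lock while the first regenerates; if the lock is released before the write lands, its second check misses and it regenerates too.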

At the distributed-cache altitude

Adapting DCL from shared memory to distributed caches requires three extra considerations:

  • Lock substrate. The lock must be distributed (Redis, ZooKeeper, purpose-built). The Vercel post doesn't name the substrate but the shape is clear: per-key, per-region, timeout-bounded.
  • Cache-write ordering vs lock release. The cache write must be visible to other lock acquirers before they re-check. Easiest: populate cache first, release lock second. Vercel does this (lock released "as soon as the cache is populated").
  • Lock timeouts. A distributed lock can be lost (network partition, holder crash). DCL must tolerate the second check seeing an empty cache for legitimate reasons (prior holder crashed mid-invocation) and proceed to invoke; combined with lock timeout hedging, this degrades gracefully instead of piling up indefinitely.
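The timeout consideration can be sketched with a toy TTL lock. This is a minimal in-process stand-in (the TimeoutLock class, TTL value, and key names are invented for illustration; real substrates such as Redis SET NX PX or ZooKeeper ephemeral nodes add ownership checks and fencing that this sketch omits): a holder that crashes mid-invocation never releases, and the TTL lets the next caller's second check legitimately see an empty cache and proceed.

```python
import threading
import time

class TimeoutLock:
    """Toy per-key lock with a TTL, standing in for a distributed lock."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self._m = threading.Lock()
        self._expiry = 0.0

    def acquire(self):
        while True:
            with self._m:
                now = time.monotonic()
                if now >= self._expiry:      # free, or the prior holder timed out
                    self._expiry = now + self.ttl
                    return
            time.sleep(0.01)

    def release(self):
        with self._m:
            self._expiry = 0.0

cache = {}
lock = TimeoutLock(ttl=0.05)

def respond(key, work, crash=False):
    if key in cache:                 # first check
        return cache[key]
    lock.acquire()
    try:
        if key in cache:             # second check
            return cache[key]
        if crash:
            raise RuntimeError("holder died mid-invocation")
        value = work()
        cache[key] = value           # populate before the release below
        return value
    finally:
        if not crash:
            lock.release()

# The crashed holder never releases; the TTL unblocks the next caller,
# whose second check legitimately sees an empty cache and regenerates.
try:
    respond("home", work=lambda: "page", crash=True)
except RuntimeError:
    pass
print(respond("home", work=lambda: "page"))  # waits out the TTL, then regenerates
```

The second caller spins until the TTL expires, then proceeds through both checks to invoke: exactly the graceful degradation described above, at the cost of a bounded stall.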

Common mistakes

  • Omitting the second check. The most common implementation bug. Yields a correct serialising lock but not a deduplicating one — every waiter still does the work.
  • Caching errors. If the first invocation fails, the cache stays empty; the next lock acquirer's second check still shows empty; it retries. This is desirable for transient errors, but a permanent failure (e.g. a 4xx from a broken route) can endlessly retry. Explicit negative caching can help here — but note the Vercel post explicitly chooses not to cache errors.
  • Reading from cache before all writes settle. If the cache is write-behind, the second check can race with a concurrent set. Strongly ordered caches or synchronous cache writes are required.
  • Long-lived lock. Holding the lock across the full response send (not just the cache populate) delays waiter progress. Vercel explicitly doesn't do this: the response is sent without waiting for the cache set to complete, but the lock is released at cache-populated. Two different "as soon as possible" release points for two different latency goals.
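The error-caching trade-off in the second bullet above can be sketched as well. This assumes a deliberately broken origin and a negative_ttl parameter invented for illustration (the Vercel post, again, chooses not to cache errors at all): with no negative caching, a permanent failure is re-invoked on every miss; with a short negative TTL, later callers replay the cached failure instead.

```python
import time

cache = {}        # key -> (value_or_exception, expiry)
attempts = 0

def broken_origin():
    global attempts
    attempts += 1
    raise ValueError("permanent 4xx from a broken route")

def respond(key, negative_ttl=None):
    entry = cache.get(key)
    if entry and time.monotonic() < entry[1]:   # first check (single-threaded
        value = entry[0]                        # demo: lock + second check elided)
        if isinstance(value, Exception):
            raise value                         # replay the cached failure
        return value
    try:
        value = broken_origin()
    except Exception as exc:
        if negative_ttl is not None:            # negative caching, briefly
            cache[key] = (exc, time.monotonic() + negative_ttl)
        raise
    cache[key] = (value, time.monotonic() + 60)
    return value

for _ in range(3):                 # no negative caching: every miss re-invokes
    try:
        respond("broken")
    except ValueError:
        pass
print(attempts)                    # 3

attempts = 0
for _ in range(3):                 # 1 s negative TTL: only the first call invokes
    try:
        respond("broken-2", negative_ttl=1.0)
    except ValueError:
        pass
print(attempts)                    # 1
```

The short TTL is the hedge: it stops a permanently broken route from being retried on every request without hiding a transient failure for long.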

Canonical production instance

Vercel CDN request collapsing (April 2026) uses DCL at the per-region distributed-lock altitude, as the correctness layer on top of the two-level lock. The first check is the cache lookup before lock acquisition; the second check is after lock acquisition; cache population triggers lock release. (sources/2026-04-21-vercel-preventing-the-stampede-request-collapsing-in-the-vercel-cdn)
