Event-loop blocking in single-threaded services¶
Definition¶
Event-loop blocking is the failure mode where a single-threaded runtime (Node.js, V8 isolates, LuaJIT / OpenResty, CPython's GIL-bound loop, a UI thread) spends wall-clock time in a synchronous CPU-bound computation that monopolises the single execution thread, so that every other pending task — including every concurrent request — is stalled for the entire duration.
The stall isn't about CPU scheduling between threads (there isn't any); it's about one request's CPU work directly blocking every other request's chance to make progress. It is a form of head-of-line blocking at the runtime-scheduler altitude.
Why single-threaded services are vulnerable¶
Multi-threaded services have natural backpressure: a slow request ties up one worker thread, and the pool is sized so that the slowest k requests block at most k workers. Queue time grows, but in-flight requests continue to make progress.
A single-threaded service has no worker pool. A 250 ms synchronous JSON.parse() blocks:
- All requests routed during that 250 ms window.
- All async callbacks that were scheduled to run (timers, I/O completions, incoming sockets).
- All tasks that would otherwise have been cooperatively scheduled via setImmediate / process.nextTick / yield.
The design is explicitly "cooperative" — the runtime assumes every callback yields back quickly — and blocking breaks the cooperation invariant.
Canonical Vercel disclosure¶
Vercel's 2026-04-21 Bloom-filter post makes the causality explicit:
"Given that our routing service is single-threaded, parsing this JSON file also blocks the event loop. This means that for those websites whose path lookups take 250 milliseconds to parse, it literally takes 250 milliseconds longer to serve the website while we wait for the operation to finish."
(Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters.)
The 15 % memory / CPU-usage drop after Bloom-filter rollout is evidence the heavy-site parse cost was cross-polluting all routing traffic on the same reactor, not just slowing the heavy sites themselves.
Failure-mode shape¶
Under event-loop blocking you see:
- p50 looks fine, because most requests don't hit the blocker directly. They get queued behind it and dequeue into the next available slot.
- p99 and p999 spike, because those are the requests that arrived just before or during a heavy blocking operation. Their wall-clock latency includes the full blocking duration.
- Heap and GC pressure rise in lockstep because the blocking operation is typically allocation-heavy (JSON parse allocates every object; large-file reads allocate every buffer).
- Fixing the blocker improves latency on requests that never touched it — a canonical "slow is failure" and cross-tenant-noisy-neighbour signal at the runtime altitude.
Mitigations¶
1. Replace the blocking substrate¶
The load-bearing move. Don't parse a 1.5 MB JSON file on the hot path. Use a substrate whose construction is bounded by I/O, not by parse cost. Vercel's case: replace JSON-tree path lookup with Bloom-filter membership (construction is file-read-bound, query is constant-time per-key).
2. Move the blocking work off the event loop¶
- Worker threads (Node.js worker_threads, Web Workers) — spawn a pool; dispatch blocking work via message-passing.
- Streaming parse — parse incrementally (per-chunk) instead of buffer-and-parse-all. JSONStream and similar libraries yield back to the loop between chunks.
- Offload to a separate process — run the expensive work in a child process and return the result over IPC, keeping the main loop free.
3. Cache the parsed result¶
If the parse is one-per-deployment (not one-per-request) and the parsed form fits in memory, parse once at deploy-notify time and keep the parsed tree in a per-deployment hot map. This is the "fix the frequency, not the cost" move.
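A minimal sketch of that per-deployment cache (the names here are illustrative, not Vercel's):

```javascript
// deploymentId -> parsed path table; populated at most once per deployment.
const deploymentPaths = new Map();

function getPaths(deploymentId, rawJson) {
  let parsed = deploymentPaths.get(deploymentId);
  if (!parsed) {
    parsed = JSON.parse(rawJson); // paid once, ideally at deploy-notify time
    deploymentPaths.set(deploymentId, parsed);
  }
  return parsed;
}

// Drop the cached form when the deployment is replaced.
function invalidate(deploymentId) {
  deploymentPaths.delete(deploymentId);
}
```

The parse still blocks the loop when it runs, but it runs once per deployment instead of once per request — the frequency fix, not the cost fix.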
4. Yield cooperatively¶
await / setImmediate / queueMicrotask at natural
boundaries so the loop can serve other requests between
chunks of blocking work. Only helps if the work is naturally
chunkable.
5. Accept the single-threaded constraint and scale horizontally¶
Run N instances of the service behind a load balancer; each instance still blocks during its own expensive operation, but concurrent tenants are distributed across instances.
Related failure modes¶
- concepts/web-streams-as-ssr-bottleneck — dual at the streaming-SSR altitude: TransformStream.pipeThrough() chains accumulate per-chunk CPU cost in JavaScript runtimes' Web Streams implementations.
- concepts/tail-latency-at-scale — the aggregate phenomenon of which event-loop blocking is one micro-cause.
- GC pause — event-loop blocking's sibling: the runtime pauses everything to collect garbage. Identical shape, but triggered by the runtime rather than by user code.
- Synchronous filesystem / crypto calls — fs.readFileSync, crypto.pbkdf2Sync, zlib.gzipSync — all block the loop; the Node.js docs have flagged them for years.
Diagnosis¶
The canonical diagnostic kit:
- Event-loop lag metric — the delay between a scheduled timer firing and actually running. Under blocking, lag spikes to the blocker duration. See perf_hooks.monitorEventLoopDelay() on Node.js.
- Async-hooks profiling — attribute blocking CPU to the specific callback / stack.
- Flame-graph profiling — 0x, perf, clinic.js flame — the blocking callback dominates the flame graph.
- Per-tenant parse time — plot parse duration against tenant identifier; a long-tail tenant will stand out.
Vercel's canonical diagnosis sequence: notice that p99 / p999 routing-service latency was dominated by a small number of heavy sites' path-lookup parse cost → identify the single-threaded propagation → redesign the substrate.
Seen in¶
- sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters — Canonical wiki instance. Vercel's single-threaded routing service was blocking its event loop for 250 ms per heavy-site request during JSON-parse of a 1.5+ MB path-lookup file; the blocking directly degraded latency for every concurrent request on the same reactor. Bloom-filter substitution collapsed parse time to near-zero, and the routing service's aggregate memory / CPU dropped 15 % because the heavy-site parse cost had been stealing reactor time from everyone.
- sources/2026-04-21-vercel-bun-runtime-on-vercel-functions — Same runtime-altitude problem, different substrate. Node.js's Web Streams + Transform implementations dominate SSR CPU cost; Bun's faster implementation buys a 28 % latency reduction on the same request path.
- systems/nodejs — The canonical single-threaded event-loop runtime most vulnerable to this pattern.
- systems/bun — Sibling runtime with partial mitigation via faster substrate implementations (Zig-level I/O and scheduling).