CONCEPT Cited by 4 sources
Web Streams as SSR bottleneck¶
Web Streams as SSR bottleneck names the empirical finding — disclosed independently by Cloudflare and Vercel during 2025-2026 benchmark cycles — that the Web Streams API implementation (particularly on Node.js) is the dominant CPU cost for streaming server-side rendering, not the application code or the hosting platform's networking layer.
The finding¶
Under streaming SSR (React 18 `renderToReadableStream`, Next.js App Router, SvelteKit), the request path chains multiple `TransformStream` instances via `pipeThrough()`. Each transform pass:

- Allocates buffers per chunk — up to 50 × 2048-byte `Buffer` instances per render in OpenNext's pipeline, even when most are unused (Cloudflare profiling, 2025-10-14).
- Scans bytes for encoding detection and format conversions.
- Creates promises and intermediate objects at per-chunk rate — Vercel's independent benchmark found `pipeThrough()` at 630 MB/s vs Node `pipeline()` at ~7,900 MB/s, a 12× gap attributed almost entirely to promise and object allocation overhead.
- Drives garbage collection under load — Vercel's 2026-04-21 profiling found "garbage collection also consumed a significant share of total processing time under heavy load."
Load-bearing empirical disclosures¶
Vercel (2026-04-21)¶
Profiling showed that the main bottleneck in Node.js came from its Web Streams implementation and transform operations, where buffer scanning and data conversions added measurable CPU cost. Garbage collection also consumed a significant share of total processing time under heavy load.
Canonical measured consequence: 28 % TTLB reduction on CPU-bound Next.js rendering when switching Node.js → Bun, attributed to "Bun's optimized handling of web streams and reduced garbage collection overhead."
Cloudflare (2025-10-14)¶
OpenNext profiling on Workers surfaced the same class of cost:
- `pipeThrough()` allocating 50 × 2048-byte `Buffer` instances per request.
- `Readable.toWeb(Readable.from(chunks))` double-buffering (replaceable with `ReadableStream.from(chunks)`).
- Default `highWaterMark: 1` on value-oriented `ReadableStream`s causing per-byte reads instead of 4096-byte block coalescing.
Cloudflare (2026-02-27)¶
James Snell's post-mortem: "as one of the core maintainers of Node.js, I am looking forward to helping Malte and the folks at Vercel get their proposed improvements landed!" — frames Node's Web-Streams implementation as an industry-wide performance pain point both vendors are attacking upstream.
Why it's not just a micro-optimisation¶
Streaming SSR workloads see millions of chunks per hour per instance under load, so per-chunk allocations compound into a significant share of total CPU under sustained traffic. The 12× `pipeThrough()` vs `pipeline()` gap is large enough that choosing a runtime with a better Web Streams implementation (Bun) yields measurable user-facing latency wins even on hot, GC-stable paths.
Mitigations¶
- Runtime choice — Bun's JavaScriptCore + Zig-based I/O implementation avoids the worst of Node's promise-allocation-heavy Web-Streams path (28 % TTLB win per Vercel).
- Node upstream improvements — Vercel's proposed fast-webstreams work targets ~10× gains by eliminating per-chunk promises. The library has now shipped and the upstream PR is landing (see sources/2026-04-21-vercel-we-ralph-wiggumed-webstreams-to-make-them-10x-faster): systems/fast-webstreams measures up to 14.6× on the React Flight byte-stream pattern, and two of its ideas are upstream in nodejs/node#61807, delivering ~17-20 % faster buffered reads and ~11 % faster `pipeTo` to every Node.js user.
- Adapter-level fixes — OpenNext's PRs replace `Readable.toWeb(Readable.from(chunks))` with `ReadableStream.from(chunks)`; the `pipeThrough()` `Buffer` allocation fixes shipped upstream.
- Alternative streaming APIs — Cloudflare's new-streams POC explores a different API shape that could sidestep Web Streams' allocation pattern entirely.
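The `highWaterMark` fix from the Cloudflare findings can also be seen directly. With the default strategy, a value-oriented stream pulls one chunk at a time; a larger mark lets the producer pre-fill the queue in a block before the consumer reads anything, amortising per-chunk scheduling. A minimal sketch (the counts are illustrative, not the 4096-byte block size itself):

```javascript
// Counts how many chunks a pull-based source produces before any read(),
// under a given highWaterMark. The default ({ highWaterMark: 1 }) stops
// after a single chunk; a larger mark fills the queue in a block.
function eagerPulls(highWaterMark) {
  let pulls = 0;
  new ReadableStream(
    {
      pull(controller) {
        pulls += 1;
        controller.enqueue('x');
      },
    },
    new CountQueuingStrategy({ highWaterMark }),
  );
  // The stream keeps calling pull() until the queue reaches the mark.
  return () => pulls; // read the count after microtasks have settled
}

const defaultCount = eagerPulls(1);  // per-chunk wake-ups
const blockCount = eagerPulls(64);   // block coalescing
```

After the microtask queue drains, `defaultCount()` reports 1 pull and `blockCount()` reports 64: the producer ran ahead in one block instead of being woken per value.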
Measurement note¶
This bottleneck is only visible under TTLB, not TTFB — the shell flushes quickly regardless, so TTFB shows a small gap or none at all. See concepts/ttfb-vs-ttlb-ssr-measurement.
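A reader-side harness makes the distinction concrete: TTFB stops the clock at the first chunk (the shell), TTLB only after the stream closes, which is where the per-chunk cost accumulates. The delayed source below is a hypothetical stand-in for a streaming SSR response body; in a real measurement the stream would be `(await fetch(url)).body`.

```javascript
// Measures time-to-first-byte and time-to-last-byte over any Web
// ReadableStream. Only TTLB reflects the full per-chunk processing cost.
async function measure(stream) {
  const t0 = performance.now();
  const reader = stream.getReader();
  let ttfb = null;
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
    if (ttfb === null) ttfb = performance.now() - t0; // shell arrived
  }
  return { ttfb, ttlb: performance.now() - t0 };
}

// Stand-in response: the shell flushes immediately, the suspended
// content streams in later, as under React streaming SSR.
function fakeSsrBody() {
  let step = 0;
  return new ReadableStream({
    pull(controller) {
      step += 1;
      if (step === 1) { controller.enqueue('<html>shell'); return; }
      if (step === 2) {
        return new Promise((resolve) =>
          setTimeout(() => { controller.enqueue('</html>'); resolve(); }, 30));
      }
      controller.close();
    },
  });
}
```

Against this body, TTFB is near zero while TTLB absorbs the trailing delay, which is exactly why the Web Streams cost hides from TTFB-only dashboards.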
Seen in¶
- sources/2026-04-21-vercel-bun-runtime-on-vercel-functions — canonical Vercel-side disclosure. Profiling surfaced Web Streams + transform operations as the dominant cost; Bun's implementation yields 28 % TTLB reduction on CPU-bound Next.js rendering.
- sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks — canonical Cloudflare-side disclosure. Same benchmark (cf-vs-vercel-bench), independently profiled; surfaces `pipeThrough()` `Buffer` allocation, double-buffering adapters, and value-oriented `ReadableStream`s with `highWaterMark: 1`.
- sources/2026-02-27-cloudflare-a-better-streams-api-is-possible-for-javascript — Snell's post-mortem framing. Quantifies the 12× gap (`pipeThrough()` 630 MB/s vs `pipeline()` 7,900 MB/s) and names promise-allocation overhead as the dominant cause.
- sources/2026-04-21-vercel-we-ralph-wiggumed-webstreams-to-make-them-10x-faster — canonical library-level mitigation disclosure. Vercel's fast-webstreams library ships a spec-compliant replacement for the 3 Web Streams constructors, reaching up to 14.6× throughput on the React Flight byte-stream pattern (1,600 MB/s vs 110 MB/s native), 9.8× on chained `pipeThrough`, and 3.2× on fetch → 3 transforms. Two ideas already landed upstream via Matteo Collina's PR #61807, delivering ~17-20 % buffered-read improvement to every Node.js user. Also canonicalises concepts/synchronous-fast-path-streaming, concepts/spec-compliant-optimization, concepts/microtask-hop-cost.
Related¶
- systems/web-streams-api — the WHATWG API at the centre of the issue.
- systems/nodejs — the runtime whose Web-Streams implementation is the profiled bottleneck.
- systems/bun — the alternative runtime whose implementation avoids the issue.
- systems/nextjs — the framework most visibly affected.
- systems/opennext — the adapter layer with the diagnosed allocation pattern.
- concepts/streaming-ssr — the workload shape that surfaces the bottleneck.
- concepts/promise-allocation-overhead — the dominant cost class within Web Streams.
- concepts/stream-adapter-overhead — the Node ↔ Web stream double-buffer cost.
- systems/fast-webstreams — the library-level mitigation (up to 14.6× on React Flight).
- systems/react-flight — the workload where the largest gap concentrates.
- concepts/synchronous-fast-path-streaming — the per-`read()` optimisation fast-webstreams lands upstream.
- concepts/microtask-hop-cost — the scheduling cost per chunk that compounds at SSR scale.