CONCEPT Cited by 4 sources
Web Streams as SSR bottleneck¶
Web Streams as SSR bottleneck names the empirical finding — disclosed independently by Cloudflare and Vercel during 2025-2026 benchmark cycles — that the Web Streams API implementation (particularly on Node.js) is the dominant CPU cost for streaming server-side rendering, not the application code or the hosting platform's networking layer.
The finding¶
Under streaming SSR (React 18 `renderToReadableStream`, Next.js App Router, SvelteKit), the request path chains multiple `TransformStream` instances via `pipeThrough()`. Each transform pass:

- Allocates buffers per chunk — up to 50 × 2048-byte `Buffer` instances per render in OpenNext's pipeline, even when most are unused (Cloudflare profiling, 2025-10-14).
- Scans bytes for encoding detection and format conversions.
- Creates promises and intermediate objects at per-chunk rate — Vercel's independent benchmark found `pipeThrough()` at 630 MB/s vs Node `pipeline()` at ~7,900 MB/s, a 12× gap attributed almost entirely to promise and object allocation overhead.
- Drives garbage collection under load — Vercel's 2026-04-21 profiling found "garbage collection also consumed a significant share of total processing time under heavy load."
Load-bearing empirical disclosures¶
Vercel (2026-04-21)¶
Profiling showed that the main bottleneck in Node.js came from its Web Streams implementation and transform operations, where buffer scanning and data conversions added measurable CPU cost. Garbage collection also consumed a significant share of total processing time under heavy load.
Canonical measured consequence: 28 % TTLB reduction on CPU-bound Next.js rendering when switching Node.js → Bun, attributed to "Bun's optimized handling of web streams and reduced garbage collection overhead."
Cloudflare (2025-10-14)¶
OpenNext profiling on Workers surfaced the same class of cost:
- `pipeThrough()` allocating 50 × 2048-byte `Buffer` instances per request.
- `Readable.toWeb(Readable.from(chunks))` double-buffering (replaceable with `ReadableStream.from(chunks)`).
- Default `highWaterMark: 1` on value-oriented `ReadableStream`s causing per-byte reads instead of 4096-byte block coalescing.
Cloudflare (2026-02-27)¶
James Snell's post-mortem: "as one of the core maintainers of Node.js, I am looking forward to helping Malte and the folks at Vercel get their proposed improvements landed!" — frames Node's Web-Streams implementation as an industry-wide performance pain point both vendors are attacking upstream.
Why it's not just a micro-optimisation¶
Streaming SSR workloads see millions of chunks per hour per instance under load, so per-chunk allocations compound into a significant share of total CPU under sustained traffic. The 12× `pipeThrough()` vs `pipeline()` gap is large enough that choosing a runtime with a better Web Streams implementation (Bun) yields measurable user-facing latency wins even on hot, GC-stable paths.
Mitigations¶
- Runtime choice — Bun's JavaScriptCore + Zig-based I/O implementation avoids the worst of Node's promise-allocation-heavy Web-Streams path (28 % TTLB win per Vercel).
- Node upstream improvements — Vercel's proposed fast-webstreams work targets ~10× gains by eliminating per-chunk promises. The library has now shipped and the upstream PR is landing (see sources/2026-04-21-vercel-we-ralph-wiggumed-webstreams-to-make-them-10x-faster): systems/fast-webstreams measures up to 14.6× on the React Flight byte-stream pattern, and two of its ideas are upstream in nodejs/node#61807, delivering ~17-20 % faster buffered reads and ~11 % faster `pipeTo` to every Node.js user.
- Adapter-level fixes — OpenNext's PRs replace `Readable.toWeb(Readable.from(chunks))` with `ReadableStream.from(chunks)`; the `pipeThrough()` `Buffer` allocation fixes shipped upstream.
- Alternative streaming APIs — Cloudflare's new-streams POC explores a different API shape that could sidestep Web Streams' allocation pattern entirely.
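The `highWaterMark` fix from the Cloudflare findings can also be seen directly. With the default strategy, a value-oriented stream pulls one chunk at a time; a larger mark lets the producer pre-fill the queue in a block before the consumer reads anything, amortising per-chunk scheduling. A minimal sketch (the counts are illustrative, not the 4096-byte block size itself):

```javascript
// Counts how many chunks a pull-based source produces before any read(),
// under a given highWaterMark. The default ({ highWaterMark: 1 }) stops
// after a single chunk; a larger mark fills the queue in a block.
function eagerPulls(highWaterMark) {
  let pulls = 0;
  new ReadableStream(
    {
      pull(controller) {
        pulls += 1;
        controller.enqueue('x');
      },
    },
    new CountQueuingStrategy({ highWaterMark }),
  );
  // The stream keeps calling pull() until the queue reaches the mark.
  return () => pulls; // read the count after microtasks have settled
}

const defaultCount = eagerPulls(1);  // per-chunk wake-ups
const blockCount = eagerPulls(64);   // block coalescing
```

After the microtask queue drains, `defaultCount()` reports 1 pull and `blockCount()` reports 64: the producer ran ahead in one block instead of being woken per value.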
Measurement note¶
This bottleneck is only visible under TTLB, not TTFB — the shell flushes quickly regardless, so TTFB shows a small gap or none at all. See concepts/ttfb-vs-ttlb-ssr-measurement.
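A reader-side harness makes the distinction concrete: TTFB stops the clock at the first chunk (the shell), TTLB only after the stream closes, which is where the per-chunk cost accumulates. The delayed source below is a hypothetical stand-in for a streaming SSR response body; in a real measurement the stream would be `(await fetch(url)).body`.

```javascript
// Measures time-to-first-byte and time-to-last-byte over any Web
// ReadableStream. Only TTLB reflects the full per-chunk processing cost.
async function measure(stream) {
  const t0 = performance.now();
  const reader = stream.getReader();
  let ttfb = null;
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
    if (ttfb === null) ttfb = performance.now() - t0; // shell arrived
  }
  return { ttfb, ttlb: performance.now() - t0 };
}

// Stand-in response: the shell flushes immediately, the suspended
// content streams in later, as under React streaming SSR.
function fakeSsrBody() {
  let step = 0;
  return new ReadableStream({
    pull(controller) {
      step += 1;
      if (step === 1) { controller.enqueue('<html>shell'); return; }
      if (step === 2) {
        return new Promise((resolve) =>
          setTimeout(() => { controller.enqueue('</html>'); resolve(); }, 30));
      }
      controller.close();
    },
  });
}
```

Against this body, TTFB is near zero while TTLB absorbs the trailing delay, which is exactly why the Web Streams cost hides from TTFB-only dashboards.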
Seen in¶
- sources/2026-04-21-vercel-bun-runtime-on-vercel-functions — canonical Vercel-side disclosure. Profiling surfaced Web Streams + transform operations as the dominant cost; Bun's implementation yields 28 % TTLB reduction on CPU-bound Next.js rendering.
- sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks — canonical Cloudflare-side disclosure. Same benchmark (cf-vs-vercel-bench), independently profiled; surfaces `pipeThrough()` `Buffer` allocation, double-buffering adapters, and value-oriented `ReadableStream`s with `highWaterMark: 1`.
- sources/2026-02-27-cloudflare-a-better-streams-api-is-possible-for-javascript — Snell's post-mortem framing. Quantifies the 12× gap (`pipeThrough()` 630 MB/s vs `pipeline()` 7,900 MB/s) and names promise-allocation overhead as the dominant cause.
- sources/2026-04-21-vercel-we-ralph-wiggumed-webstreams-to-make-them-10x-faster — canonical library-level mitigation disclosure. Vercel's fast-webstreams library ships a spec-compliant replacement for the 3 Web Streams constructors, reaching up to 14.6× throughput on the React Flight byte-stream pattern (1,600 MB/s vs 110 MB/s native), 9.8× on chained `pipeThrough`, and 3.2× on fetch → 3 transforms. Two ideas already landed upstream via Matteo Collina's PR #61807, delivering ~17-20 % buffered-read improvement to every Node.js user. Also canonicalises concepts/synchronous-fast-path-streaming, concepts/spec-compliant-optimization, concepts/microtask-hop-cost.
Related¶
- systems/web-streams-api — the WHATWG API at the centre of the issue.
- systems/nodejs — the runtime whose Web-Streams implementation is the profiled bottleneck.
- systems/bun — the alternative runtime whose implementation avoids the issue.
- systems/nextjs — the framework most visibly affected.
- systems/opennext — the adapter layer with the diagnosed allocation pattern.
- concepts/streaming-ssr — the workload shape that surfaces the bottleneck.
- concepts/promise-allocation-overhead — the dominant cost class within Web Streams.
- concepts/stream-adapter-overhead — the Node ↔ Web stream double-buffer cost.
- systems/fast-webstreams — the library-level mitigation (up to 14.6× on React Flight).
- systems/react-flight — the workload where the largest gap concentrates.
- concepts/synchronous-fast-path-streaming — the per-`read()` optimisation fast-webstreams lands upstream.
- concepts/microtask-hop-cost — the scheduling cost per chunk that compounds at SSR scale.