

Web Streams as SSR bottleneck

Web Streams as SSR bottleneck names the empirical finding — disclosed independently by Cloudflare and Vercel during the 2025-2026 benchmark cycles — that the Web Streams API implementation (particularly Node.js's) is the dominant CPU cost in streaming server-side rendering, rather than the application code or the hosting platform's networking layer.

The finding

Under streaming SSR (React 18 renderToReadableStream, Next.js App Router, SvelteKit), the request path chains multiple TransformStream instances via pipeThrough(). Each transform pass:

  • Allocates buffers per chunk — up to 50 × 2048-byte Buffer instances per render in OpenNext's pipeline, even when most are unused (Cloudflare profiling, 2025-10-14).
  • Scans bytes for encoding detection and format conversions.
  • Creates promises and intermediate objects at a per-chunk rate — Vercel's independent benchmark measured pipeThrough() at ~630 MB/s against Node's pipeline() at ~7,900 MB/s, a ~12× gap attributed almost entirely to promise and object allocation overhead.
  • Drives garbage collection under load — Vercel's 2026-04-21 profiling found "garbage collection also consumed a significant share of total processing time under heavy load."
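
The chained-transform request path described above can be sketched with plain Web Streams. A minimal illustration, not the actual Next.js/OpenNext pipeline: the chunk contents, pass names, and helper functions here are invented for demonstration (Node 18+ exposes ReadableStream, TransformStream, TextEncoder, and TextDecoder as globals).

```javascript
// Minimal sketch of the streaming-SSR request path: a ReadableStream of
// rendered HTML chunks piped through two TransformStream passes via
// pipeThrough(). Streams are single-use, so build the chain per "request".
const enc = new TextEncoder();
const dec = new TextDecoder();

function buildPipeline() {
  const rendered = new ReadableStream({
    start(controller) {
      for (const s of ['<div>', '<p>hello</p>', '</div>']) {
        controller.enqueue(enc.encode(s)); // one chunk per renderer flush
      }
      controller.close();
    },
  });

  // Pass 1: bytes -> text, a buffer-scanning/conversion pass.
  const decodePass = new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(dec.decode(chunk, { stream: true }));
    },
  });

  // Pass 2: an application-level rewrite pass.
  const injectPass = new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(chunk.replace('</div>', '<!--flushed--></div>'));
    },
  });

  // Each pipeThrough() hop settles at least one promise per chunk —
  // the per-chunk promise/object allocation the profiling points at.
  return rendered.pipeThrough(decodePass).pipeThrough(injectPass);
}

async function collect(stream) {
  let html = '';
  for await (const chunk of stream) html += chunk;
  return html;
}

collect(buildPipeline()).then((html) => console.log(html));
// → <div><p>hello</p><!--flushed--></div>
```

Even in this three-chunk toy, every chunk crosses two transform boundaries; under real SSR traffic the same shape runs per request at per-chunk promise-allocation cost.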

Load-bearing empirical disclosures

Vercel (2026-04-21)

Profiling showed that the main bottleneck in Node.js came from its Web Streams implementation and transform operations, where buffer scanning and data conversions added measurable CPU cost. Garbage collection also consumed a significant share of total processing time under heavy load.

Canonical measured consequence: 28 % TTLB reduction on CPU-bound Next.js rendering when switching Node.js → Bun, attributed to "Bun's optimized handling of web streams and reduced garbage collection overhead."

Cloudflare (2025-10-14)

OpenNext profiling on Workers surfaced the same class of cost:

  • pipeThrough() allocating 50 × 2048-byte Buffer per request.
  • Readable.toWeb(Readable.from(chunks)) double-buffering (replaceable with ReadableStream.from(chunks)).
  • Default highWaterMark: 1 on value-oriented ReadableStreams causing per-byte reads instead of 4096-byte block coalescing.

Cloudflare (2026-02-27)

James Snell's post-mortem: "as one of the core maintainers of Node.js, I am looking forward to helping Malte and the folks at Vercel get their proposed improvements landed!" — frames Node's Web-Streams implementation as an industry-wide performance pain point both vendors are attacking upstream.

Why it's not just a micro-optimisation

Streaming SSR workloads push millions of chunks per hour per instance under load, so per-chunk allocations compound into a significant share of total CPU under sustained traffic. The 12× pipeThrough() vs pipeline() gap is large enough that choosing a runtime with a faster Web-Streams implementation (Bun) yields measurable user-facing latency wins even on hot, GC-stable paths.

Mitigations

  1. Runtime choice — Bun's JavaScriptCore + Zig-based I/O implementation avoids the worst of Node's promise-allocation-heavy Web-Streams path (28 % TTLB win per Vercel).
  2. Node upstream improvements — Vercel's proposed fast-webstreams work targets ~10× gains by eliminating per-chunk promises. The library has shipped and the upstream PR is landing (see sources/2026-04-21-vercel-we-ralph-wiggumed-webstreams-to-make-them-10x-faster): systems/fast-webstreams measures up to 14.6× on the React Flight byte-stream pattern, and two of its ideas have gone upstream in nodejs/node#61807, delivering ~17-20 % faster buffered reads and ~11 % faster pipeTo to every Node.js user.
  3. Adapter-level fixes — OpenNext PRs replaced Readable.toWeb(Readable.from(chunks)) with ReadableStream.from(chunks), and the pipeThrough() Buffer over-allocation fix shipped upstream.
  4. Alternative streaming APIs — Cloudflare's new-streams POC explores a different API shape that could sidestep Web-Streams' allocation pattern entirely.
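
One adapter-level tactic, counteracting the default highWaterMark: 1 per-byte reads noted in the Cloudflare profiling, is to coalesce tiny chunks into block-sized reads. A hedged sketch: the 4096-byte threshold and the function names are illustrative, not OpenNext's actual code.

```javascript
// Batch many small Uint8Array chunks into ~blockSize outputs so the
// consumer sees block-sized reads instead of per-byte ones.
function coalesce(blockSize = 4096) {
  let pending = [];
  let pendingBytes = 0;

  const emit = (controller) => {
    const out = new Uint8Array(pendingBytes);
    let offset = 0;
    for (const p of pending) { out.set(p, offset); offset += p.byteLength; }
    controller.enqueue(out);
    pending = [];
    pendingBytes = 0;
  };

  return new TransformStream(
    {
      transform(chunk, controller) {
        pending.push(chunk);
        pendingBytes += chunk.byteLength;
        if (pendingBytes >= blockSize) emit(controller);
      },
      flush(controller) {
        if (pendingBytes > 0) emit(controller); // drain the tail
      },
    },
    // Raise the write-side highWaterMark (default 1) so tiny enqueues
    // don't trigger backpressure churn on every chunk.
    new ByteLengthQueuingStrategy({ highWaterMark: blockSize })
  );
}
```

Piping a stream of one-byte chunks through coalesce(4096) yields 4096-byte blocks plus one tail block, trading up to one block of buffering latency for far fewer downstream reads.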

Measurement note

This bottleneck is only visible under TTLB, not TTFB — the shell flushes quickly regardless, so TTFB shows a small gap or none at all. See concepts/ttfb-vs-ttlb-ssr-measurement.
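
A minimal way to see the distinction: the timings below come from an artificial setTimeout delay standing in for a slow suspended subtree, not a real SSR render, and all names are illustrative.

```javascript
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Shell flushes immediately; a "suspended subtree" arrives 30 ms later.
function fakeSsrStream() {
  return new ReadableStream({
    async start(controller) {
      controller.enqueue('<!doctype html><div id="shell">');
      await sleep(30);
      controller.enqueue('</div>');
      controller.close();
    },
  });
}

// TTFB = time to first chunk; TTLB = time until the stream completes.
async function measure(stream) {
  const t0 = performance.now();
  let ttfb = null;
  for await (const _ of stream) {
    if (ttfb === null) ttfb = performance.now() - t0;
  }
  return { ttfb, ttlb: performance.now() - t0 };
}

measure(fakeSsrStream()).then(({ ttfb, ttlb }) => {
  // TTFB stays small while TTLB absorbs the whole render cost.
  console.log(ttfb < ttlb);
});
```

Any per-chunk CPU overhead lands between the first and last byte, which is exactly the window TTFB never observes.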
