Vercel — We Ralph Wiggum'd WebStreams to make them 10x faster¶
Summary¶
Vercel engineering post (2026-04-21) discloses
fast-webstreams, an experimental npm package that
reimplements the WHATWG Web Streams API
(ReadableStream / WritableStream / TransformStream)
on top of Node.js's older stream.Readable /
stream.Writable / stream.Transform internals, yielding
up to 14.6× throughput on specific patterns (React
Flight byte streams), 9.8× on chained pipeThrough, and
3.2× on fetch response bodies threaded through three
transforms. The library was built "mostly with AI,"
test-driven against the 1,116-case Web Platform Tests
streams suite (passing 1,100 vs native Node's 1,099), and
benchmark-driven via a locally-built suite. Two ideas from
the project have already been upstreamed to Node.js by
Matteo Collina via PR
nodejs/node#61807
— "stream: add fast paths for webstreams read and pipeTo"
— delivering ~17-20 % faster buffered reads and ~11 %
faster pipeTo natively. This is the second Vercel-side
contribution in a public conversation with Cloudflare's
James Snell (who committed on X to help land the PR). The
stated long-term goal is for fast-webstreams to stop
existing: "The goal is for WebStreams to be fast enough
that it does not need to."
Key takeaways¶
- The cost is Promise + object allocation, not compute. Measured native `pipeThrough` at 630 MB/s vs Node's `stream.pipeline()` at ~7,900 MB/s — a 12.5× gap attributed "almost entirely to Promise and object allocation overhead." Each `reader.read()` allocates a `ReadableStreamDefaultReadRequest`, a new Promise, and a `{value, done}` result object; each resolution adds a microtask hop even when data is already buffered. (Source: this post)
- The biggest win is at the React Flight pattern. React Server Components create `ReadableStream({type: 'bytes'})` and enqueue externally as the render produces output. Native ~110 MB/s → fast-webstreams ~1,600 MB/s = 14.6×. Mechanism: `LiteReadable`, a minimal array-based Node Readable replacement using direct callback dispatch instead of EventEmitter, with BYOB + pull-based demand support; "about 5 microseconds less per construction" — material when React Flight creates hundreds of byte streams per request. (Source: this post)
- When streams compose, defer resolution to the sink. Calling `pipeThrough` between fast streams does not start piping — it records upstream links. When `pipeTo()` is called at the end of the chain, the library walks upstream, collects the underlying Node streams, and issues a single `stream.pipeline()` call. One function call, zero Promises per chunk. Result: `pipeThrough` chained fast-to-fast = ~6,200 MB/s (native 630 MB/s) ≈ 9.8×. Canonicalised as patterns/record-pipe-links-resolve-at-sink. (Source: this post)
- Synchronous fast path on `read()`. When `nodeReadable.read()` has data in-buffer, the library returns `Promise.resolve({value, done})` — skipping the event-loop round-trip, request-object allocation, and pending-Promise machinery. Only an empty buffer registers a listener. Measured ~12,400 MB/s vs native ~3,300 MB/s = 3.7× on read loops. Canonicalised as concepts/synchronous-fast-path-streaming. (Source: this post)
- Fetch bodies are special — you don't construct them. `Response.body` is a native byte stream owned by Node's HTTP layer; you can't swap it out. The library handles this by patching `Response.prototype.body` to wrap the native stream in a fast shell, then using deferred resolution — `pipeThrough` records links; at the sink, one Promise at the native-boundary pull, then zero Promises through the transform chain. `fetch → 3 transforms`: native 260 MB/s → fast 830 MB/s = 3.2×. Plain forwarding: 430 → 850 MB/s = 2.0×. (Source: this post)
- The WPT suite is the spec compliance oracle. `fast-webstreams` passes 1,100 of 1,116 Web Platform Tests; native Node passes 1,099. The 16 remaining failures are shared with native (e.g. `type: 'owning'` transfer mode) or architectural differences that don't affect real apps. WPT-driven development is what made an AI-assisted reimplementation tractable — canonicalised as patterns/ai-reimplementation-against-conformance-suite: "WPT gave us 1,116 tests as an immediate, machine-checkable answer to 'did we break anything?'" (Source: this post)
- Upstream fixes land via Node.js PR #61807. After an X conversation, Node.js TSC member Matteo Collina submitted PR #61807 applying two ideas from this project: (a) `read()` fast path — when buffered, return a pre-resolved Promise directly (spec-compliant because resolved-vs-pending is observationally equivalent through the microtask queue); (b) `pipeTo()` batch reads — drain multiple reads from the controller queue without per-chunk request objects, respecting backpressure via a `desiredSize` check after each write. ~17-20 % buffered-read improvement; ~11 % `pipeTo` improvement. Canonical patterns/upstream-contribution-parallel-to-in-house-integration instance at the streaming-runtime altitude. (Source: this post; GitHub PR)
- The spec is smarter than it looks. Verbatim: "We tried many shortcuts. Almost every one of them broke a Web Platform Test, and the test was usually right." Three load-bearing examples: (a) the `ReadableStreamDefaultReadRequest` / Promise-per-read design exists because cancellation during reads, error identity through locked streams, and thenable interception are "real edge cases that real code hits"; (b) `Promise.resolve(obj)` always checks for `.then` — WPT tests put thenables on read results to verify correct handling; (c) `Reflect.apply`, not `.call()`, to invoke user callbacks — WPT monkey-patches `Function.prototype.call` to verify implementations don't use it. (Source: this post)
- `stream.pipeline()` can't universally replace `pipeTo`. Vercel hoped to route all piping through Node's `pipeline()`; it caused 72 WPT failures on error propagation, stream-locking, and cancellation semantics. `pipeline()` is only safe when the full chain is fast-stream — which is why the library collects upstream links and only fires `pipeline()` when the whole chain is homogeneous fast-stream. (Source: this post)
- Global patching is the rollout mechanism. `patchGlobalWebStreams()` replaces the global `ReadableStream` / `WritableStream` / `TransformStream` constructors and `Response.prototype.body`, so `fetch() → pipeThrough() → pipeTo()` chains hit the pipeline fast path without code changes. Vercel plans "careful, incremental" fleet rollout prioritising React Server Component streaming, response forwarding, and multi-transform chains. Canonicalised as patterns/global-patch-constructors-for-runtime-optimization. (Source: this post)
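The synchronous fast path described above can be sketched in a few lines. This is a minimal illustration, not the fast-webstreams source: a reader backed by a plain array that returns a pre-resolved Promise whenever data is already queued, and only falls back to a parked waiter when the buffer is empty.

```javascript
// Sketch of concepts/synchronous-fast-path-streaming (assumption: illustrative
// model, not the actual library code). When data is already buffered, read()
// returns Promise.resolve({value, done}) directly -- no read-request object,
// no pending-Promise machinery. Only an empty buffer registers a waiter.
class FastReader {
  constructor() {
    this.buffer = [];   // chunks already produced upstream
    this.waiter = null; // resolve fn for a pending read, if any
    this.closed = false;
  }

  push(chunk) {
    if (this.waiter) {
      // A read is pending: hand the chunk straight to it.
      const resolve = this.waiter;
      this.waiter = null;
      resolve({ value: chunk, done: false });
    } else {
      this.buffer.push(chunk);
    }
  }

  read() {
    if (this.buffer.length > 0) {
      // Fast path: skip the event-loop round-trip and return an
      // already-resolved Promise.
      return Promise.resolve({ value: this.buffer.shift(), done: false });
    }
    if (this.closed) {
      return Promise.resolve({ value: undefined, done: true });
    }
    // Slow path: nothing buffered yet; park the read until push() arrives.
    return new Promise((resolve) => { this.waiter = resolve; });
  }
}
```

Awaiting a resolved Promise is observationally equivalent to awaiting one that resolves later through the microtask queue, which is why the same trick could be adopted spec-compliantly in the upstream Node.js PR.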
Architectural model¶
Three discriminators decide which fast path activates:
| Entry point | Context | Fast path |
|---|---|---|
| `new ReadableStream({...})` / `Writable` / `Transform` | Fast-stream construction | `LiteReadable`-backed or `stream.Transform`-backed |
| `reader.read()` | Buffered data available | `Promise.resolve({value, done})` — no microtask hop |
| `source.pipeThrough(T).pipeTo(sink)` (all fast) | Homogeneous fast chain | Collect upstream links; single `stream.pipeline()` call |
| `source.pipeThrough(T).pipeTo(sink)` (any native in chain) | Heterogeneous chain | Fall back to native `pipeThrough` or spec-compliant `pipeTo` |
| `Response.body.pipeThrough(T).pipeTo(sink)` | fetch response | Fast shell wraps native byte stream; defer until sink; bridge via pipeline |
Benchmark table (1 KB chunks, Node.js v22, MB/s)¶
Operation-level:
| Operation | `stream.*` | fast | native Web Streams | fast vs native |
|---|---|---|---|---|
| read loop | 26,400 | 12,400 | 3,300 | 3.7× |
| write loop | 26,500 | 5,500 | 2,300 | 2.4× |
| `pipeThrough` | 7,900 | 6,200 | 630 | 9.8× |
| `pipeTo` | 14,000 | 2,500 | 1,400 | 1.8× |
| `for-await-of` | — | 4,100 | 3,000 | 1.4× |
Chain-depth compounding:
| Depth | fast | native | ratio |
|---|---|---|---|
| 3 transforms | 2,900 | 300 | 9.7× |
| 8 transforms | 1,000 | 115 | 8.7× |
Pattern-specific:
| Pattern | fast | native | ratio |
|---|---|---|---|
| start + enqueue (React Flight) | 1,600 | 110 | 14.6× |
| byte read loop | 1,400 | 1,400 | 1.0× |
| byte tee | 1,200 | 750 | 1.6× |
| `Response.text()` | 900 | 910 | 1.0× |
| Response forwarding | 850 | 430 | 2.0× |
| fetch → 3 transforms | 830 | 260 | 3.2× |
Construction cost:
| Type | fast | native | ratio |
|---|---|---|---|
| `ReadableStream` | 2,100 | 980 | 2.1× |
| `WritableStream` | 1,300 | 440 | 3.0× |
| `TransformStream` | 470 | 220 | 2.1× |
Upstream (Node.js PR #61807, applies to all Node users):
- Buffered `read()`: +17-20 %
- `pipeTo` (buffered): +11 %
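Throughput figures like the ones above typically come from a loop of roughly this shape. This is a minimal harness under stated assumptions (Vercel's actual benchmark suite is not published in the post): push N 1 KB chunks through a `ReadableStream` read loop and divide bytes by elapsed time.

```javascript
// Minimal throughput harness of the kind that could produce MB/s figures like
// those above (assumption: illustrative; not Vercel's benchmark suite).
// Streams `chunks` 1 KB chunks through a ReadableStream read loop.
// Uses the ReadableStream and performance globals (Node.js >= 18).
async function benchReadLoop(chunks = 100_000, chunkSize = 1024) {
  const chunk = new Uint8Array(chunkSize);
  let sent = 0;
  const stream = new ReadableStream({
    pull(controller) {
      if (sent++ < chunks) controller.enqueue(chunk);
      else controller.close();
    },
  });

  const reader = stream.getReader();
  let bytes = 0;
  const start = performance.now();
  for (;;) {
    const { value, done } = await reader.read(); // one Promise per chunk
    if (done) break;
    bytes += value.byteLength;
  }
  const seconds = (performance.now() - start) / 1000;
  return bytes / seconds / 1e6; // MB/s
}

benchReadLoop().then((mbps) => console.log(`${mbps.toFixed(0)} MB/s`));
```

Note how the measured loop pays one `reader.read()` Promise per chunk; that per-chunk cost is exactly what shrinks as chunk size grows, which is why the 1 KB chunk size in these tables matters.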
Systems / concepts / patterns extracted¶
New systems created by this ingest:
- systems/fast-webstreams — the library itself.
- systems/lite-readable — minimal array-based Node Readable replacement backing the byte-stream path.
- systems/react-flight — the React Server Components byte-stream pattern driving the largest measured gap.
- systems/wpt-web-platform-tests — the shared conformance suite across browser runtimes, used as executable spec here.
Existing systems extended:
- systems/nodejs — PR #61807 landed ideas from this project; `stream.pipeline()` framed as the "good" Node streaming path whose perf the fast library routes to.
- systems/web-streams-api — third-vendor measurement instance (Cloudflare, Vercel, now Vercel-again) pinning per-chunk Promise allocation as the structural cost.
- systems/nextjs — the named production workload where the largest gaps land (React Flight + multi-transform chains).
- systems/vercel-functions — the platform that will roll this out in production.
- systems/bun — sibling implementation whose native streams already avoid this class of cost; now has a Node-side counter.
- systems/new-streams — Snell's alternative API sibling project; same problem, different solution axis (new API vs same API + fast implementation).
New concepts created:
- concepts/synchronous-fast-path-streaming — the return-resolved-promise-if-buffered optimisation.
- concepts/spec-compliant-optimization — the design discipline of removing allocations within spec observability constraints.
- concepts/microtask-hop-cost — the `Promise.resolve()` → microtask-queue → callback cost per read even when data is already available.
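The microtask-hop concept is easy to demonstrate directly. A small illustration (assumption: a generic JavaScript example, not taken from the post): even when the awaited value is synchronously available, `await Promise.resolve()` still defers the continuation to the microtask queue.

```javascript
// Tiny demonstration of the microtask hop (assumption: illustrative). Even
// with the value synchronously available, awaiting Promise.resolve() defers
// the continuation to the microtask queue -- one hop per read. This is the
// floor that remains even after the fast path removes request-object and
// pending-Promise allocation.
const order = [];

async function consume() {
  order.push('before await');
  const value = await Promise.resolve(42); // value is ready, hop still happens
  order.push('after await');
  return value;
}

const p = consume();
order.push('sync code after call'); // runs before the awaited continuation
p.then(() => {
  console.log(order.join(' -> '));
  // -> before await -> sync code after call -> after await
});
```

The spec requires `read()` to return a Promise, so this one hop cannot be removed; what the fast path removes is the allocation and extra machinery layered on top of it.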
Existing concepts extended:
- concepts/promise-allocation-overhead — third-vendor measurement instance; adds the concrete 12.5× gap + the `ReadableStreamDefaultReadRequest` allocation as the per-read canonical cost.
- concepts/web-streams-as-ssr-bottleneck — first disclosure of a library-level fix with measured throughput; Vercel-side extension of the 28 %-TTLB headline from the 2026-04-21 Bun post.
- concepts/stream-adapter-overhead — inverted here: Node-stream-backed Web-stream implementation adapter runs faster than native Web-stream implementation because Node's streams are C++-backed and the adapter is cheaper than the original.
- concepts/streaming-ssr — library-level mitigation axis added alongside runtime-choice axis.
New patterns created:
- patterns/ai-reimplementation-against-conformance-suite — AI-driven reimplementation made tractable by a pre-existing comprehensive test suite + benchmark suite as paired oracles.
- patterns/record-pipe-links-resolve-at-sink — defer pipe resolution until sink is known, collect upstream links, issue single resolved call.
- patterns/global-patch-constructors-for-runtime-optimization — the `patchGlobalWebStreams()` model: replace global constructors + special-case prototype properties (`Response.prototype.body`) to transparently opt the whole process into faster implementations.
Existing patterns extended:
- patterns/upstream-contribution-parallel-to-in-house-integration — Matteo Collina / Vercel / Node.js PR #61807 as the streaming-runtime altitude instance.
- patterns/tests-as-executable-specifications — WPT as the canonical cross-runtime executable spec enabling AI-driven reimplementation.
- patterns/clean-reimplementation-over-adapter — sibling altitude; this post is an adapter pattern (fast-webstreams adapts Node streams to Web Streams API surface), but built on the same tests-as-spec discipline.
Operational caveats¶
- Experimental label is deliberate. Package name is `experimental-fast-webstreams`. Vercel is confident in correctness but "this is an area of active development."
- 16 WPT failures remain. Noted as "shared with native" or "architectural differences that don't affect real applications" — e.g. `type: 'owning'` transfer mode not implemented.
- `pipeline()` is restricted. Only used when the entire chain is fast-stream; mixing in a native `CompressionStream` falls back to native `pipeThrough` or spec-compliant `pipeTo`. This is why the library collects upstream links rather than aggressively fusing.
- Fetch-body patching is opt-in via global patch. Without `patchGlobalWebStreams()`, `fetch() → pipeThrough` chains stay on the slow native path.
- No production fleet data yet. "At Vercel, we are looking at rolling this out across our fleet. We will do so carefully and incrementally." Measurement disclosure is still in benchmark mode, not rolled out.
- `Promise.resolve(obj)` always checks for thenables — a language-level invariant the library had to carefully navigate when allocating fewer `{value, done}` objects in hot paths.
- `Reflect.apply` is mandatory for user callbacks — WPT monkey-patches `Function.prototype.call` to ensure implementations don't use it; `Reflect.apply` is the only safe alternative.
- No cross-runtime numbers. All benchmarks are Node.js v22. Bun / Deno / Workers are not in the table.
- The 1 KB chunk size matters. Larger chunks (say 64 KB) shrink the per-chunk overhead's share of total cost, so the ratios above would narrow; Vercel does not quantify larger-chunk numbers.
- AI-assisted development disclosure is light. Vercel says "built most of fast-webstreams with AI" but doesn't name the model, workflow, or split between human + AI edits. The WPT + benchmark feedback loop is the load-bearing claim, not the AI.
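The `Reflect.apply` caveat above can be reproduced in a few lines. This is a minimal reconstruction of the WPT trick under stated assumptions (the `userCallback` and flag names are hypothetical; the actual WPT harness code differs): patch `Function.prototype.call` and observe that only `.call()`-based invocation trips it.

```javascript
// Sketch of why Reflect.apply is mandatory for invoking user callbacks
// (assumption: a minimal reconstruction of the WPT monkey-patch idea, not the
// actual WPT harness). An implementation invoking callbacks via .call() hits
// the patched Function.prototype.call; Reflect.apply bypasses it entirely.
const userCallback = function (chunk) { return `handled:${chunk}`; };

let patchedCallObserved = false;
const originalCall = Function.prototype.call;
Function.prototype.call = function (...args) {
  patchedCallObserved = true; // spec-safe implementations never land here
  return Reflect.apply(this, args[0], args.slice(1));
};

// Naive invocation: trips the monkey-patch.
userCallback.call(undefined, 'a');
const naiveTripped = patchedCallObserved;

// Spec-safe invocation: Reflect.apply does not consult Function.prototype.
patchedCallObserved = false;
const result = Reflect.apply(userCallback, undefined, ['b']);

Function.prototype.call = originalCall; // restore the global

console.log(naiveTripped, patchedCallObserved, result);
// -> true false handled:b
```

The same reasoning applies to the thenable caveat: both are places where user-observable JavaScript semantics constrain which allocations and shortcuts an implementation may remove.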
Source¶
- Original: https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster
- Raw markdown: raw/vercel/2026-04-21-we-ralph-wiggumed-webstreams-to-make-them-10x-faster-39637056.md
- npm package: `experimental-fast-webstreams`
- Node.js upstream PR: nodejs/node#61807 — "stream: add fast paths for webstreams read and pipeTo"
- Node.js performance tracking issue: nodejs/performance#134 — James Snell's enumeration of remaining Web-Streams optimisation opportunities (C++ piping for internally-sourced streams, lazy buffering, WritableStream double-buffer elimination)
Related¶
- sources/2026-04-21-vercel-bun-runtime-on-vercel-functions — sibling Vercel-side disclosure. Names Node's Web-Streams as the bottleneck at profiling level; this post delivers the library-level fix with measured numbers.
- sources/2026-02-27-cloudflare-a-better-streams-api-is-possible-for-javascript — Cloudflare-side sibling. Snell's structural critique at the API-design level; this post is the implementation-level response within the current API surface. The 12.5× `pipeThrough` number cited in the Cloudflare post originates here, making this the primary source.
- sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks — Cloudflare OpenNext profiling that flagged the same class of cost at the adapter layer. Cross-vendor convergence on the same diagnosis.
- companies/vercel — the operator publishing this engineering retrospective on their platform's perf roadmap.
- systems/fast-webstreams — the library under test.
- systems/lite-readable — the custom Node Readable replacement.
- systems/react-flight — the workload with the 14.6× gap.
- systems/wpt-web-platform-tests — the conformance suite that made AI-assisted reimplementation feasible.
- concepts/promise-allocation-overhead — the root cost class this post's fix targets.
- concepts/web-streams-as-ssr-bottleneck — the parent bottleneck naming; now has a library-level mitigation.
- concepts/synchronous-fast-path-streaming — the `read()`-returns-resolved-promise-if-buffered pattern.
- concepts/spec-compliant-optimization — the discipline of observability-preserving allocation removal.
- concepts/microtask-hop-cost — per-read cost even when data is already available.
- patterns/record-pipe-links-resolve-at-sink — the zero-Promise pipe-through mechanism.
- patterns/global-patch-constructors-for-runtime-optimization — the `patchGlobalWebStreams()` rollout model.
- patterns/ai-reimplementation-against-conformance-suite — the engineering method.
- patterns/upstream-contribution-parallel-to-in-house-integration — the two-path delivery model (library + Node PR).
- patterns/tests-as-executable-specifications — WPT as the canonical executable-spec corpus.