Vercel — We Ralph Wiggum'd WebStreams to make them 10x faster¶
Summary¶
Vercel engineering post (2026-04-21) discloses
fast-webstreams, an experimental npm package that
reimplements the WHATWG Web Streams API
(ReadableStream / WritableStream / TransformStream)
on top of Node.js's older stream.Readable /
stream.Writable / stream.Transform internals, yielding
up to 14.6× throughput on specific patterns (React
Flight byte streams), 9.8× on chained pipeThrough, and
3.2× on fetch response bodies threaded through three
transforms. The library was built "mostly with AI,"
test-driven against the 1,116-case Web Platform Tests
streams suite (passing 1,100 vs native Node's 1,099), and
benchmark-driven via a locally-built suite. Two ideas from
the project have already been upstreamed to Node.js by
Matteo Collina via PR
nodejs/node#61807
— "stream: add fast paths for webstreams read and pipeTo"
— delivering ~17-20 % faster buffered reads and ~11 %
faster pipeTo natively. This is the second Vercel-side
contribution in a public conversation with Cloudflare's
James Snell (who committed on X to help land the PR). The
stated long-term goal is for fast-webstreams to stop
existing: "The goal is for WebStreams to be fast enough
that it does not need to."
Key takeaways¶
- The cost is Promise + object allocation, not compute. Measured native `pipeThrough` at 630 MB/s vs Node's `stream.pipeline()` at ~7,900 MB/s — a 12.5× gap attributed "almost entirely to Promise and object allocation overhead." Each `reader.read()` allocates a `ReadableStreamDefaultReadRequest`, a new Promise, and a `{value, done}` result object; each resolution adds a microtask hop even when data is already buffered. (Source: this post)
- The biggest win is at the React Flight pattern. React Server Components create `ReadableStream({type: 'bytes'})` and enqueue externally as the render produces output. Native ~110 MB/s → fast-webstreams ~1,600 MB/s = 14.6×. Mechanism: `LiteReadable`, a minimal array-based Node Readable replacement using direct callback dispatch instead of EventEmitter, with BYOB + pull-based demand support; "about 5 microseconds less per construction" — material when React Flight creates hundreds of byte streams per request. (Source: this post)
- When streams compose, defer resolution to the sink. Calling `pipeThrough` between fast streams does not start piping — it records upstream links. When `pipeTo()` is called at the end of the chain, the library walks upstream, collects the underlying Node streams, and issues a single `stream.pipeline()` call. One function call, zero Promises per chunk. Result: `pipeThrough` chained fast-to-fast = ~6,200 MB/s (native 630 MB/s) ≈ 9.8×. Canonicalised as patterns/record-pipe-links-resolve-at-sink. (Source: this post)
- Synchronous fast path on `read()`. When `nodeReadable.read()` has data in-buffer, the library returns `Promise.resolve({value, done})` — skipping the event-loop round-trip, request-object allocation, and pending-Promise machinery. Only an empty buffer registers a listener. Measured ~12,400 MB/s vs native ~3,300 MB/s = 3.7× on read loops. Canonicalised as concepts/synchronous-fast-path-streaming. (Source: this post)
- Fetch bodies are special — you don't construct them. `Response.body` is a native byte stream owned by Node's HTTP layer; you can't swap it out. The library handles this by patching `Response.prototype.body` to wrap the native stream in a fast shell, then using deferred resolution — `pipeThrough` records links; at the sink, one Promise at the native-boundary pull, then zero Promises through the transform chain. `fetch → 3 transforms`: native 260 MB/s → fast 830 MB/s = 3.2×. Plain forwarding: 430 → 850 MB/s = 2.0×. (Source: this post)
- The WPT suite is the spec compliance oracle. `fast-webstreams` passes 1,100 of 1,116 Web Platform Tests; native Node passes 1,099. The 16 remaining failures are shared with native (e.g. `type: 'owning'` transfer mode) or architectural differences that don't affect real apps. WPT-driven development is what made an AI-assisted reimplementation tractable — canonicalised as patterns/ai-reimplementation-against-conformance-suite: "WPT gave us 1,116 tests as an immediate, machine-checkable answer to 'did we break anything?'" (Source: this post)
- Upstream fixes land via Node.js PR #61807. After an X conversation, Node.js TSC member Matteo Collina submitted PR #61807 applying two ideas from this project: (a) `read()` fast path — when buffered, return a pre-resolved Promise directly (spec-compliant because resolved-vs-pending is observationally equivalent through the microtask queue); (b) `pipeTo()` batch reads — drain multiple reads from the controller queue without per-chunk request objects, respecting backpressure via a `desiredSize` check after each write. ~17-20 % buffered-read improvement; ~11 % `pipeTo` improvement. Canonical patterns/upstream-contribution-parallel-to-in-house-integration instance at the streaming-runtime altitude. (Source: this post; GitHub PR)
- The spec is smarter than it looks. Verbatim: "We tried many shortcuts. Almost every one of them broke a Web Platform Test, and the test was usually right." Three load-bearing examples: (a) the `ReadableStreamDefaultReadRequest` / Promise-per-read design exists because cancellation during reads, error identity through locked streams, and thenable interception are "real edge cases that real code hits"; (b) `Promise.resolve(obj)` always checks for `.then` — WPT tests put thenables on read results to verify correct handling; (c) `Reflect.apply`, not `.call()`, to invoke user callbacks — WPT monkey-patches `Function.prototype.call` to verify implementations don't use it. (Source: this post)
- `stream.pipeline()` can't universally replace `pipeTo`. Vercel hoped to route all piping through Node's `pipeline()`; it caused 72 WPT failures on error propagation, stream-locking, and cancellation semantics. `pipeline()` is only safe when the full chain is fast-stream — which is why the library collects upstream links and only fires `pipeline()` when the whole chain is homogeneous fast-stream. (Source: this post)
- Global patching is the rollout mechanism. `patchGlobalWebStreams()` replaces the global `ReadableStream` / `WritableStream` / `TransformStream` constructors and `Response.prototype.body`, so `fetch() → pipeThrough() → pipeTo()` chains hit the pipeline fast path without code changes. Vercel plans "careful, incremental" fleet rollout prioritising React Server Component streaming, response forwarding, and multi-transform chains. Canonicalised as patterns/global-patch-constructors-for-runtime-optimization. (Source: this post)
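The synchronous fast path described above can be sketched in a few lines. This is a minimal illustration, not the fast-webstreams source: a reader backed by a plain array that returns a pre-resolved Promise whenever data is already queued, and only falls back to a parked waiter when the buffer is empty.

```javascript
// Sketch of concepts/synchronous-fast-path-streaming (assumption: illustrative
// model, not the actual library code). When data is already buffered, read()
// returns Promise.resolve({value, done}) directly -- no read-request object,
// no pending-Promise machinery. Only an empty buffer registers a waiter.
class FastReader {
  constructor() {
    this.buffer = [];   // chunks already produced upstream
    this.waiter = null; // resolve fn for a pending read, if any
    this.closed = false;
  }

  push(chunk) {
    if (this.waiter) {
      // A read is pending: hand the chunk straight to it.
      const resolve = this.waiter;
      this.waiter = null;
      resolve({ value: chunk, done: false });
    } else {
      this.buffer.push(chunk);
    }
  }

  read() {
    if (this.buffer.length > 0) {
      // Fast path: skip the event-loop round-trip and return an
      // already-resolved Promise.
      return Promise.resolve({ value: this.buffer.shift(), done: false });
    }
    if (this.closed) {
      return Promise.resolve({ value: undefined, done: true });
    }
    // Slow path: nothing buffered yet; park the read until push() arrives.
    return new Promise((resolve) => { this.waiter = resolve; });
  }
}
```

Awaiting a resolved Promise is observationally equivalent to awaiting one that resolves later through the microtask queue, which is why the same trick could be adopted spec-compliantly in the upstream Node.js PR.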
Architectural model¶
Three discriminators decide which fast path activates:
| Entry point | Context | Fast path |
|---|---|---|
| `new ReadableStream({...})` / `Writable` / `Transform` | Fast-stream construction | `LiteReadable`-backed or `stream.Transform`-backed |
| `reader.read()` | Buffered data available | `Promise.resolve({value, done})` — no microtask hop |
| `source.pipeThrough(T).pipeTo(sink)` (all fast) | Homogeneous fast chain | Collect upstream links; single `stream.pipeline()` call |
| `source.pipeThrough(T).pipeTo(sink)` (any native in chain) | Heterogeneous chain | Fall back to native `pipeThrough` or spec-compliant `pipeTo` |
| `Response.body.pipeThrough(T).pipeTo(sink)` | fetch response | Fast shell wraps native byte stream; defer until sink; bridge via pipeline |
Benchmark table (1 KB chunks, Node.js v22, MB/s)¶
Operation-level:
| Operation | `stream.*` | fast | native Web Streams | fast vs native |
|---|---|---|---|---|
| read loop | 26,400 | 12,400 | 3,300 | 3.7× |
| write loop | 26,500 | 5,500 | 2,300 | 2.4× |
| `pipeThrough` | 7,900 | 6,200 | 630 | 9.8× |
| `pipeTo` | 14,000 | 2,500 | 1,400 | 1.8× |
| `for-await-of` | — | 4,100 | 3,000 | 1.4× |
Chain-depth compounding:
| Depth | fast | native | ratio |
|---|---|---|---|
| 3 transforms | 2,900 | 300 | 9.7× |
| 8 transforms | 1,000 | 115 | 8.7× |
Pattern-specific:
| Pattern | fast | native | ratio |
|---|---|---|---|
| start + enqueue (React Flight) | 1,600 | 110 | 14.6× |
| byte read loop | 1,400 | 1,400 | 1.0× |
| byte tee | 1,200 | 750 | 1.6× |
| `Response.text()` | 900 | 910 | 1.0× |
| Response forwarding | 850 | 430 | 2.0× |
| fetch → 3 transforms | 830 | 260 | 3.2× |
Construction cost:
| Type | fast | native | ratio |
|---|---|---|---|
| `ReadableStream` | 2,100 | 980 | 2.1× |
| `WritableStream` | 1,300 | 440 | 3.0× |
| `TransformStream` | 470 | 220 | 2.1× |
Upstream (Node.js PR #61807, applies to all Node users):
- Buffered `read()`: +17-20 %
- `pipeTo` (buffered): +11 %
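Throughput figures like the ones above typically come from a loop of roughly this shape. This is a minimal harness under stated assumptions (Vercel's actual benchmark suite is not published in the post): push N 1 KB chunks through a `ReadableStream` read loop and divide bytes by elapsed time.

```javascript
// Minimal throughput harness of the kind that could produce MB/s figures like
// those above (assumption: illustrative; not Vercel's benchmark suite).
// Streams `chunks` 1 KB chunks through a ReadableStream read loop.
// Uses the ReadableStream and performance globals (Node.js >= 18).
async function benchReadLoop(chunks = 100_000, chunkSize = 1024) {
  const chunk = new Uint8Array(chunkSize);
  let sent = 0;
  const stream = new ReadableStream({
    pull(controller) {
      if (sent++ < chunks) controller.enqueue(chunk);
      else controller.close();
    },
  });

  const reader = stream.getReader();
  let bytes = 0;
  const start = performance.now();
  for (;;) {
    const { value, done } = await reader.read(); // one Promise per chunk
    if (done) break;
    bytes += value.byteLength;
  }
  const seconds = (performance.now() - start) / 1000;
  return bytes / seconds / 1e6; // MB/s
}

benchReadLoop().then((mbps) => console.log(`${mbps.toFixed(0)} MB/s`));
```

Note how the measured loop pays one `reader.read()` Promise per chunk; that per-chunk cost is exactly what shrinks as chunk size grows, which is why the 1 KB chunk size in these tables matters.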
Systems / concepts / patterns extracted¶
New systems created by this ingest:
- systems/fast-webstreams — the library itself.
- systems/lite-readable — minimal array-based Node Readable replacement backing the byte-stream path.
- systems/react-flight — the React Server Components byte-stream pattern driving the largest measured gap.
- systems/wpt-web-platform-tests — the shared conformance suite across browser runtimes, used as executable spec here.
Existing systems extended:
- systems/nodejs — PR #61807 landed ideas from this project; `stream.pipeline()` framed as the "good" Node streaming path whose perf the fast library routes to.
- systems/web-streams-api — third-vendor measurement instance (Cloudflare, Vercel, now Vercel-again) pinning per-chunk Promise allocation as the structural cost.
- systems/nextjs — the named production workload where the largest gaps land (React Flight + multi-transform chains).
- systems/vercel-functions — the platform that will roll this out in production.
- systems/bun — sibling implementation whose native streams already avoid this class of cost; now has a Node-side counter.
- systems/new-streams — Snell's alternative API sibling project; same problem, different solution axis (new API vs same API + fast implementation).
New concepts created:
- concepts/synchronous-fast-path-streaming — the return-resolved-promise-if-buffered optimisation.
- concepts/spec-compliant-optimization — the design discipline of removing allocations within spec observability constraints.
- concepts/microtask-hop-cost — the `Promise.resolve()` → microtask-queue → callback cost per read even when data is already available.
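The microtask-hop concept is easy to demonstrate directly. A small illustration (assumption: a generic JavaScript example, not taken from the post): even when the awaited value is synchronously available, `await Promise.resolve()` still defers the continuation to the microtask queue.

```javascript
// Tiny demonstration of the microtask hop (assumption: illustrative). Even
// with the value synchronously available, awaiting Promise.resolve() defers
// the continuation to the microtask queue -- one hop per read. This is the
// floor that remains even after the fast path removes request-object and
// pending-Promise allocation.
const order = [];

async function consume() {
  order.push('before await');
  const value = await Promise.resolve(42); // value is ready, hop still happens
  order.push('after await');
  return value;
}

const p = consume();
order.push('sync code after call'); // runs before the awaited continuation
p.then(() => {
  console.log(order.join(' -> '));
  // -> before await -> sync code after call -> after await
});
```

The spec requires `read()` to return a Promise, so this one hop cannot be removed; what the fast path removes is the allocation and extra machinery layered on top of it.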
Existing concepts extended:
- concepts/promise-allocation-overhead — third-vendor measurement instance; adds the concrete 12.5× gap + the `ReadableStreamDefaultReadRequest` allocation as the per-read canonical cost.
- concepts/web-streams-as-ssr-bottleneck — first disclosure of a library-level fix with measured throughput; Vercel-side extension of the 28 %-TTLB headline from the 2026-04-21 Bun post.
- concepts/stream-adapter-overhead — inverted here: Node-stream-backed Web-stream implementation adapter runs faster than native Web-stream implementation because Node's streams are C++-backed and the adapter is cheaper than the original.
- concepts/streaming-ssr — library-level mitigation axis added alongside runtime-choice axis.
New patterns created:
- patterns/ai-reimplementation-against-conformance-suite — AI-driven reimplementation made tractable by a pre-existing comprehensive test suite + benchmark suite as paired oracles.
- patterns/record-pipe-links-resolve-at-sink — defer pipe resolution until sink is known, collect upstream links, issue single resolved call.
- patterns/global-patch-constructors-for-runtime-optimization — the `patchGlobalWebStreams()` model: replace global constructors + special-case prototype properties (`Response.prototype.body`) to transparently opt the whole process into faster implementations.
Existing patterns extended:
- patterns/upstream-contribution-parallel-to-in-house-integration — Matteo Collina / Vercel / Node.js PR #61807 as the streaming-runtime altitude instance.
- patterns/tests-as-executable-specifications — WPT as the canonical cross-runtime executable spec enabling AI-driven reimplementation.
- patterns/clean-reimplementation-over-adapter — sibling altitude; this post is an adapter pattern (fast-webstreams adapts Node streams to Web Streams API surface), but built on the same tests-as-spec discipline.
Operational caveats¶
- Experimental label is deliberate. Package name is `experimental-fast-webstreams`. Vercel is confident in correctness but "this is an area of active development."
- 16 WPT failures remain. Noted as "shared with native" or "architectural differences that don't affect real applications" — e.g. `type: 'owning'` transfer mode not implemented.
- `pipeline()` is restricted. Only used when the entire chain is fast-stream; mixing in a native `CompressionStream` falls back to native `pipeThrough` or spec-compliant `pipeTo`. This is why the library collects upstream links rather than aggressively fusing.
- Fetch-body patching is opt-in via global patch. Without `patchGlobalWebStreams()`, `fetch() → pipeThrough` chains stay on the slow native path.
- No production fleet data yet. "At Vercel, we are looking at rolling this out across our fleet. We will do so carefully and incrementally." Measurement disclosure is still in benchmark mode, not rolled out.
- `Promise.resolve(obj)` always checks for thenables — a language-level invariant the library had to carefully navigate when allocating fewer `{value, done}` objects in hot paths.
- `Reflect.apply` is mandatory for user callbacks — WPT monkey-patches `Function.prototype.call` to ensure implementations don't use it; `Reflect.apply` is the only safe alternative.
- No cross-runtime numbers. All benchmarks are Node.js v22. Bun / Deno / Workers are not in the table.
- The 1 KB chunk size matters. Larger chunks (say 64 KB) shrink the per-chunk overhead's share of total cost, so the ratios above would narrow; Vercel does not quantify larger-chunk numbers.
- AI-assisted development disclosure is light. Vercel says "built most of fast-webstreams with AI" but doesn't name the model, workflow, or split between human + AI edits. The WPT + benchmark feedback loop is the load-bearing claim, not the AI.
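The `Reflect.apply` caveat above can be reproduced in a few lines. This is a minimal reconstruction of the WPT trick under stated assumptions (the `userCallback` and flag names are hypothetical; the actual WPT harness code differs): patch `Function.prototype.call` and observe that only `.call()`-based invocation trips it.

```javascript
// Sketch of why Reflect.apply is mandatory for invoking user callbacks
// (assumption: a minimal reconstruction of the WPT monkey-patch idea, not the
// actual WPT harness). An implementation invoking callbacks via .call() hits
// the patched Function.prototype.call; Reflect.apply bypasses it entirely.
const userCallback = function (chunk) { return `handled:${chunk}`; };

let patchedCallObserved = false;
const originalCall = Function.prototype.call;
Function.prototype.call = function (...args) {
  patchedCallObserved = true; // spec-safe implementations never land here
  return Reflect.apply(this, args[0], args.slice(1));
};

// Naive invocation: trips the monkey-patch.
userCallback.call(undefined, 'a');
const naiveTripped = patchedCallObserved;

// Spec-safe invocation: Reflect.apply does not consult Function.prototype.
patchedCallObserved = false;
const result = Reflect.apply(userCallback, undefined, ['b']);

Function.prototype.call = originalCall; // restore the global

console.log(naiveTripped, patchedCallObserved, result);
// -> true false handled:b
```

The same reasoning applies to the thenable caveat: both are places where user-observable JavaScript semantics constrain which allocations and shortcuts an implementation may remove.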
Source¶
- Original: https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster
- Raw markdown: raw/vercel/2026-04-21-we-ralph-wiggumed-webstreams-to-make-them-10x-faster-39637056.md
- npm package: `experimental-fast-webstreams`
- Node.js upstream PR: nodejs/node#61807 — "stream: add fast paths for webstreams read and pipeTo"
- Node.js performance tracking issue: nodejs/performance#134 — James Snell's enumeration of remaining Web-Streams optimisation opportunities (C++ piping for internally-sourced streams, lazy buffering, WritableStream double-buffer elimination)
Related¶
- sources/2026-04-21-vercel-bun-runtime-on-vercel-functions — sibling Vercel-side disclosure. Names Node's Web-Streams as the bottleneck at profiling level; this post delivers the library-level fix with measured numbers.
- sources/2026-02-27-cloudflare-a-better-streams-api-is-possible-for-javascript — Cloudflare-side sibling. Snell's structural critique at the API-design level; this post is the implementation-level response within the current API surface. The 12.5× `pipeThrough` number cited in the Cloudflare post originates here, making this the primary source.
- sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks — Cloudflare OpenNext profiling that flagged the same class of cost at the adapter layer. Cross-vendor convergence on the same diagnosis.
- companies/vercel — the operator publishing this engineering retrospective on their platform's perf roadmap.
- systems/fast-webstreams — the library under test.
- systems/lite-readable — the custom Node Readable replacement.
- systems/react-flight — the workload with the 14.6× gap.
- systems/wpt-web-platform-tests — the conformance suite that made AI-assisted reimplementation feasible.
- concepts/promise-allocation-overhead — the root cost class this post's fix targets.
- concepts/web-streams-as-ssr-bottleneck — the parent bottleneck naming; now has a library-level mitigation.
- concepts/synchronous-fast-path-streaming — the `read()`-returns-resolved-promise-if-buffered pattern.
- concepts/spec-compliant-optimization — the discipline of observability-preserving allocation removal.
- concepts/microtask-hop-cost — per-read cost even when data is already available.
- patterns/record-pipe-links-resolve-at-sink — the zero-Promise pipe-through mechanism.
- patterns/global-patch-constructors-for-runtime-optimization — the `patchGlobalWebStreams()` rollout model.
- patterns/ai-reimplementation-against-conformance-suite — the engineering method.
- patterns/upstream-contribution-parallel-to-in-house-integration — the two-path delivery model (library + Node PR).
- patterns/tests-as-executable-specifications — WPT as the canonical executable-spec corpus.