
VERCEL 2026-04-21 Tier 3


Vercel — We Ralph Wiggum'd WebStreams to make them 10x faster

Summary

Vercel engineering post (2026-04-21) discloses fast-webstreams, an experimental npm package that reimplements the WHATWG Web Streams API (ReadableStream / WritableStream / TransformStream) on top of Node.js's older stream.Readable / stream.Writable / stream.Transform internals, yielding up to 14.6× throughput on specific patterns (React Flight byte streams), 9.8× on chained pipeThrough, and 3.2× on fetch response bodies threaded through three transforms. The library was built "mostly with AI," test-driven against the 1,116-case Web Platform Tests streams suite (passing 1,100 vs native Node's 1,099), and benchmark-driven via a locally built suite. Two ideas from the project have already been upstreamed to Node.js by Matteo Collina via PR nodejs/node#61807 ("stream: add fast paths for webstreams read and pipeTo"), delivering ~17-20% faster buffered reads and ~11% faster pipeTo natively. This is the second Vercel-side contribution in a public conversation with Cloudflare's James Snell (who committed on X to help land the PR). The stated long-term goal is for fast-webstreams to stop existing: "The goal is for WebStreams to be fast enough that it does not need to."

Key takeaways

  1. The cost is Promise + object allocation, not compute. Measured native pipeThrough at 630 MB/s vs Node's stream.pipeline() at ~7,900 MB/s — a 12.5× gap attributed "almost entirely to Promise and object allocation overhead." Each reader.read() allocates a ReadableStreamDefaultReadRequest, a new Promise, and a {value, done} result object; each resolution adds a microtask hop even when data is already buffered. (Source: this post)
  2. The biggest win is at the React Flight pattern. React Server Components create ReadableStream({type:'bytes'}) and enqueue externally as the render produces output. Native ~110 MB/s → fast-webstreams ~1,600 MB/s = 14.6×. Mechanism: LiteReadable, a minimal array-based Node Readable replacement using direct callback dispatch instead of EventEmitter, with BYOB + pull-based demand support; "about 5 microseconds less per construction" — material when React Flight creates hundreds of byte streams per request. (Source: this post)
  3. When streams compose, defer resolution to the sink. Calling pipeThrough between fast streams does not start piping — it records upstream links. When pipeTo() is called at the end of the chain, the library walks upstream, collects the underlying Node streams, and issues a single stream.pipeline() call. One function call, zero Promises per chunk. Result: pipeThrough chained fast-to-fast = ~6,200 MB/s (native 630 MB/s) ≈ 9.8×. Canonicalised as patterns/record-pipe-links-resolve-at-sink. (Source: this post)
  4. Synchronous fast path on read(). When nodeReadable.read() has data in-buffer, the library returns Promise.resolve({value, done}) — skipping event-loop round-trip, request-object allocation, and pending-Promise machinery. Only an empty buffer registers a listener. Measured ~12,400 MB/s vs native ~3,300 MB/s = 3.7× on read loops. Canonicalised as concepts/synchronous-fast-path-streaming. (Source: this post)
  5. Fetch bodies are special — you don't construct them. Response.body is a native byte stream owned by Node's HTTP layer; you can't swap it out. The library handles this by patching Response.prototype.body to wrap the native stream in a fast shell, then using deferred resolution: pipeThrough records links; at the sink, one Promise at the native-boundary pull, then zero Promises through the transform chain. fetch → 3 transforms: native 260 MB/s → fast 830 MB/s = 3.2×. Plain forwarding: 430 → 850 MB/s = 2.0×. (Source: this post)
  6. The WPT suite is the spec compliance oracle. fast-webstreams passes 1,100 of 1,116 Web Platform Tests; native Node passes 1,099. The 16 remaining failures are shared with native (e.g. type: 'owning' transfer mode) or architectural differences that don't affect real apps. WPT-driven development is what made an AI-assisted reimplementation tractable — canonicalised as patterns/ai-reimplementation-against-conformance-suite: "WPT gave us 1,116 tests as an immediate, machine-checkable answer to 'did we break anything?'" (Source: this post)
  7. Upstream fixes land via Node.js PR #61807. After an X conversation, Node.js TSC member Matteo Collina submitted PR #61807 applying two ideas from this project: (a) read() fast path — when buffered, return a pre-resolved Promise directly (spec-compliant because resolved-vs-pending is observationally equivalent through the microtask queue); (b) pipeTo() batch reads — drain multiple reads from the controller queue without per-chunk request objects, respecting backpressure via desiredSize check after each write. ~17-20 % buffered-read improvement; ~11 % pipeTo improvement. Canonical patterns/upstream-contribution-parallel-to-in-house-integration instance at the streaming-runtime altitude. (Source: this post; GitHub PR)
  8. The spec is smarter than it looks. Verbatim: "We tried many shortcuts. Almost every one of them broke a Web Platform Test, and the test was usually right." Three load-bearing examples: (a) the ReadableStreamDefaultReadRequest / Promise-per-read design exists because cancellation during reads, error identity through locked streams, and thenable interception are "real edge cases that real code hits"; (b) Promise.resolve(obj) always checks for .then — WPT tests put thenables on read results to verify correct handling; (c) Reflect.apply, not .call(), to invoke user callbacks — WPT monkey-patches Function.prototype.call to verify implementations don't use it. (Source: this post)
  9. stream.pipeline() can't universally replace pipeTo. Vercel hoped to route all piping through Node's pipeline(); it caused 72 WPT failures on error propagation, stream-locking, and cancellation semantics. pipeline() is only safe when the full chain is fast-stream — which is why the library collects upstream links and only fires pipeline() when the whole chain is homogeneous fast-stream. (Source: this post)
  10. Global patching is the rollout mechanism. patchGlobalWebStreams() replaces global ReadableStream / WritableStream / TransformStream constructors and Response.prototype.body, so fetch() → pipeThrough() → pipeTo() chains hit the pipeline fast path without code changes. Vercel plans "careful, incremental" fleet rollout prioritising React Server Component streaming, response forwarding, and multi-transform chains. Canonicalised as patterns/global-patch-constructors-for-runtime-optimization. (Source: this post)
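Takeaway 1 is directly observable in any runtime with native Web Streams: even when both chunks are already enqueued, each reader.read() call allocates a fresh Promise, and each resolution produces a fresh {value, done} result object. A minimal demonstration (assumes Node.js 18+, where ReadableStream is a global):

```javascript
// Demonstrates per-read allocation on the native Web Streams path.
const rs = new ReadableStream({
  start(controller) {
    controller.enqueue('a'); // data is buffered before any read happens
    controller.enqueue('b');
    controller.close();
  },
});

const reader = rs.getReader();
const p1 = reader.read();
const p2 = reader.read();

// Each read() allocates a distinct Promise, even though both chunks
// were already sitting in the queue when the calls were made.
console.log(p1 instanceof Promise); // true
console.log(p1 === p2);             // false

(async () => {
  const r1 = await p1;
  const r2 = await p2;
  // ...and a distinct {value, done} result object per chunk.
  console.log(r1.value, r2.value); // a b
  console.log(r1 === r2);          // false
})();
```

The microtask hop is the part the demo cannot show directly: even the pre-buffered reads above only deliver their values after the synchronous code has finished.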
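The record-links, resolve-at-sink idea from takeaway 3 can be sketched as follows. This is a hypothetical illustration, not the actual fast-webstreams internals: the class name FastStream and its fields are invented for the example. pipeThrough only records an upstream pointer; pipeTo walks the chain, collects the underlying Node streams, and issues one stream.pipeline() call.

```javascript
const { Readable, Transform, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical sketch of deferred pipe resolution (illustrative names).
class FastStream {
  constructor(nodeStream) {
    this.nodeStream = nodeStream;
    this.upstream = null;
  }
  pipeThrough(transform) {
    // Record the link; do NOT start piping yet.
    transform.upstream = this;
    return transform;
  }
  pipeTo(sink) {
    // Walk upstream from the sink end, collecting underlying Node streams,
    // then issue a single stream.pipeline() call: zero Promises per chunk.
    const chain = [];
    for (let s = this; s !== null; s = s.upstream) chain.unshift(s.nodeStream);
    chain.push(sink);
    return pipeline(...chain);
  }
}

// Usage sketch: fast source -> fast transform -> Node writable sink.
const out = [];
const source = new FastStream(Readable.from(['hello', ' ', 'world']));
const upper = new FastStream(new Transform({
  transform(chunk, _enc, cb) { cb(null, chunk.toString().toUpperCase()); },
}));
const sink = new Writable({
  write(chunk, _enc, cb) { out.push(chunk.toString()); cb(); },
});

source.pipeThrough(upper).pipeTo(sink).then(() => {
  console.log(out.join('')); // HELLO WORLD
});
```

The real library additionally has to fall back to spec-compliant piping whenever any stream in the chain is native (see takeaway 9); this sketch assumes a homogeneous fast chain.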

Architectural model

Three discriminators decide which fast path activates:

| Entry point | Context | Fast path |
| --- | --- | --- |
| new ReadableStream({...}) / Writable / Transform | Fast-stream construction | LiteReadable-backed or stream.Transform-backed |
| reader.read() | Buffered data available | Promise.resolve({value, done}) — no microtask hop |
| source.pipeThrough(T).pipeTo(sink) (all fast) | Homogeneous fast chain | Collect upstream links; single stream.pipeline() call |
| source.pipeThrough(T).pipeTo(sink) (any native in chain) | Heterogeneous chain | Fall back to native pipeThrough or spec-compliant pipeTo |
| Response.body.pipeThrough(T).pipeTo(sink) | fetch response | Fast shell wraps native byte stream; defer until sink; bridge via pipeline |
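The reader.read() fast path in the model above can be sketched as a simplified standalone structure (illustrative only; the real library wraps a Node stream.Readable, not an array): drain the buffer synchronously when data is available, and only fall back to a pending-Promise slow path when the buffer is empty.

```javascript
// Simplified model of the buffered-read fast path (hypothetical class name).
class FastPathReader {
  constructor() {
    this.buffer = [];
    this.pending = []; // resolvers waiting for data (the slow path)
    this.closed = false;
  }
  enqueue(chunk) {
    const waiter = this.pending.shift();
    if (waiter) waiter({ value: chunk, done: false }); // hand off directly
    else this.buffer.push(chunk);
  }
  read() {
    if (this.buffer.length > 0) {
      // Fast path: data already buffered. Return a pre-resolved Promise;
      // no read-request object, no listener registration.
      return Promise.resolve({ value: this.buffer.shift(), done: false });
    }
    if (this.closed) return Promise.resolve({ value: undefined, done: true });
    // Slow path: empty buffer, register a pending resolver.
    return new Promise((resolve) => this.pending.push(resolve));
  }
}

const r = new FastPathReader();
r.enqueue(1);
r.read();              // fast path: buffer had data, resolved immediately
const slow = r.read(); // slow path: buffer now empty, promise stays pending
r.enqueue(2);          // resolves the pending read directly
```

Per the PR #61807 discussion summarized above, returning an already-resolved Promise is spec-compliant because resolved-vs-pending is observationally equivalent through the microtask queue.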

Benchmark tables (1 KB chunks, Node.js v22, MB/s)

Operation-level:

| Operation | stream.* | fast | native Web Streams | fast vs native |
| --- | --- | --- | --- | --- |
| read loop | 26,400 | 12,400 | 3,300 | 3.7× |
| write loop | 26,500 | 5,500 | 2,300 | 2.4× |
| pipeThrough | 7,900 | 6,200 | 630 | 9.8× |
| pipeTo | 14,000 | 2,500 | 1,400 | 1.8× |
| for-await-of | n/a | 4,100 | 3,000 | 1.4× |

Chain-depth compounding:

| Depth | fast | native | ratio |
| --- | --- | --- | --- |
| 3 transforms | 2,900 | 300 | 9.7× |
| 8 transforms | 1,000 | 115 | 8.7× |

Pattern-specific:

| Pattern | fast | native | ratio |
| --- | --- | --- | --- |
| start + enqueue (React Flight) | 1,600 | 110 | 14.6× |
| byte read loop | 1,400 | 1,400 | 1.0× |
| byte tee | 1,200 | 750 | 1.6× |
| Response.text() | 900 | 910 | 1.0× |
| Response forwarding | 850 | 430 | 2.0× |
| fetch → 3 transforms | 830 | 260 | 3.2× |

Construction cost:

| Type | fast | native | ratio |
| --- | --- | --- | --- |
| ReadableStream | 2,100 | 980 | 2.1× |
| WritableStream | 1,300 | 440 | 3.0× |
| TransformStream | 470 | 220 | 2.1× |

Upstream (Node.js PR #61807, applies to all Node users):

  • Buffered read(): +17-20 %
  • pipeTo (buffered): +11 %

Systems / concepts / patterns extracted

New systems created by this ingest:

Existing systems extended:

  • systems/nodejs — PR #61807 landed ideas from this project; stream.pipeline() framed as the "good" Node streaming path whose perf the fast library routes to.
  • systems/web-streams-api — third-vendor measurement instance (Cloudflare, Vercel, now Vercel-again) pinning per-chunk Promise allocation as the structural cost.
  • systems/nextjs — the named production workload where the largest gaps land (React Flight + multi-transform chains).
  • systems/vercel-functions — the platform that will roll this out in production.
  • systems/bun — sibling implementation whose native streams already avoid this class of cost; now has a Node-side counter.
  • systems/new-streams — Snell's alternative API sibling project; same problem, different solution axis (new API vs same API + fast implementation).

New concepts created:

Existing concepts extended:

  • concepts/promise-allocation-overhead — third-vendor measurement instance; adds concrete 12.5× gap + the ReadableStreamDefaultReadRequest allocation as the per-read canonical cost.
  • concepts/web-streams-as-ssr-bottleneck — first disclosure of a library-level fix with measured throughput; Vercel-side extension of the 28 %-TTLB headline from the 2026-04-21 Bun post.
  • concepts/stream-adapter-overhead — inverted here: Node-stream-backed Web-stream implementation adapter runs faster than native Web-stream implementation because Node's streams are C++-backed and the adapter is cheaper than the original.
  • concepts/streaming-ssr — library-level mitigation axis added alongside runtime-choice axis.

New patterns created:

Existing patterns extended:

Operational caveats

  • Experimental label is deliberate. Package name is experimental-fast-webstreams. Vercel is confident in correctness but "this is an area of active development."
  • 16 WPT failures remain. Noted as "shared with native" or "architectural differences that don't affect real applications" — e.g. type: 'owning' transfer mode not implemented.
  • pipeline() is restricted. Only used when the entire chain is fast-stream; mixing in a native CompressionStream falls back to native pipeThrough or spec-compliant pipeTo. This is why the library collects upstream links rather than aggressively fusing.
  • Fetch-body patching is opt-in via global patch. Without patchGlobalWebStreams(), fetch() ⇒ pipeThrough chains stay on the slow native path.
  • No production fleet data yet. "At Vercel, we are looking at rolling this out across our fleet. We will do so carefully and incrementally." Measurement disclosure is still-in-benchmark-mode, not rolled out.
  • Promise.resolve(obj) always checks for thenables — a language-level invariant the library had to carefully navigate around when allocating fewer {value, done} objects in hot paths.
  • Reflect.apply mandatory for user callbacks — WPT monkey-patches Function.prototype.call to ensure implementations don't use it; Reflect.apply is the only safe alternative.
  • No cross-runtime numbers. All benchmarks are Node.js v22. Bun / Deno / Workers are not in the table.
  • The 1 KB chunk size matters. At larger chunks (say 64 KB), per-chunk overhead is a smaller share of total cost, so the gaps should narrow; Vercel does not quantify larger-chunk numbers.
  • AI-assisted development disclosure is light. Vercel says "built most of fast-webstreams with AI" but doesn't name the model, workflow, or split between human + AI edits. The WPT + benchmark feedback loop is the load-bearing claim, not the AI.
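The last two spec-driven caveats are plain language behavior and can be demonstrated without any library code. First, resolving a promise performs a synchronous .then lookup on the resolution value, which is why thenable-shaped {value, done} objects get intercepted; second, Reflect.apply invokes a callback without touching the patchable Function.prototype.call. A small demonstration (standard JavaScript semantics, not fast-webstreams code):

```javascript
// 1. Promise.resolve(obj) looks up obj.then synchronously while resolving,
//    so a thenable-shaped read result would be intercepted.
let thenLookedUp = false;
const readResult = {
  value: 'chunk',
  done: false,
  get then() { thenLookedUp = true; return undefined; },
};
Promise.resolve(readResult);
console.log(thenLookedUp); // true

// 2. Reflect.apply bypasses Function.prototype.call, which WPT
//    monkey-patches to catch implementations that rely on it.
const originalCall = Function.prototype.call;
let callWasUsed = false;
Function.prototype.call = function (...args) {
  callWasUsed = true;
  return Reflect.apply(this, args[0], args.slice(1));
};

const userCallback = (x) => x * 2;
const viaReflect = Reflect.apply(userCallback, undefined, [21]);

Function.prototype.call = originalCall; // restore the global

console.log(viaReflect);  // 42
console.log(callWasUsed); // false
```

Because the .then getter fires during resolution itself, any scheme that reuses or pools result objects has to guarantee they can never look thenable; this is the invariant the post says the library "had to carefully navigate around."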
