
CLOUDFLARE 2025-10-14 Tier 1


Cloudflare — Unpacking Cloudflare Workers CPU Performance Benchmarks

Summary

Public-response post to Theo Browne's 2025-10-04 cf-vs-vercel-bench benchmark suite, which showed Cloudflare Workers running CPU-heavy JavaScript up to 3.5× slower than Node.js on Vercel despite both runtimes sharing V8. Cloudflare dissects the disparity across four layers — Workers runtime tuning, OpenNext adapter code, Node.js standard-library gaps, and the benchmark methodology itself — shipping fixes in each and closing the gap to near-parity on every case except Next.js (which still shows a gap and has a stated plan). Operational claims are grounded in specific knob names, ship dates, and upstream PR links.

Key takeaways

  • Workers' warm-isolate routing heuristic was tuned for latency/throughput across I/O-bound traffic, not CPU-bound. When a burst of expensive requests hit one isolate, later requests queued behind the long-running one. The benchmark "was not really measuring CPU time" — it was measuring isolate-queue wait, which is not billed as CPU time against the waiting request. Fix: updated the algorithm to detect sustained CPU-heavy work earlier and bias traffic so new isolates spin up faster for CPU-bound workloads while I/O-bound workloads still coalesce onto warm isolates. Rolled out globally.
  • The V8 young-generation size in Workers was hand-tuned to 2017-era V8 guidance for 512 MB environments (Workers defaults to 128 MB per isolate). V8's GC has changed dramatically since 2017; the manual cap was making GC work harder and more frequently than needed. Fix: removed the manual tuning, let V8 pick young-space size via its internal heuristics. ~25 % boost on the benchmark, small memory-usage increase; all Workers benefit, not just the benchmarked one. (Source: sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks)
  • OpenNext / Next.js / React had ~10–25 % of request time in GC under profiling. Root causes in the adapter layer: pipeThrough() creating 50 unused 2048-byte Buffer instances per render, [opennextjs-cloudflare](<../systems/opennext.md>) needlessly copying every streamed output chunk on the way out, Buffer.concat(chunks) called just to read .length (discarding the concat). Cloudflare submitted upstream PRs to OpenNext and planned further patches to Next.js and React.
  • Streams adapters were double-buffering. Node.js Readable.toWeb(Readable.from(res.getBody())) runs the same data through a Node.js stream's internal buffer AND a Web Streams internal buffer; replacing with ReadableStream.from(chunks) eliminates one copy and one buffering layer. Additional finding: many ReadableStreams in React/Next.js were created as value-oriented (highWaterMark default = 1), forcing one read per enqueued chunk even on byte data — switching to type: 'bytes' + highWaterMark: 4096 enables coalescing.
  • JSON.parse(text, reviver) got slower in 2024 when the TC39 proposal-json-parse-with-source added a third argument to the reviver callback providing JSON source context. React and Next.js use revivers heavily (>100,000 reviver calls per Next.js request on the benchmark). Cloudflare, which employs V8 core contributors, upstreamed a V8 patch giving ~33 % speedup on JSON.parse with reviver (Chromium CL 7027411); ships in V8 14.3 / Chrome 143 — benefits all V8 embedders (Node.js, Chrome, Deno), canonical upstream-the-fix shape.
  • Node.js has a slower trig-functions path than Workers because the V8 V8_USE_LIBM_TRIG_FUNCTIONS compile-time flag (which selects the faster trig impl) is on by default in Workers but off in Node.js. Cloudflare opened nodejs PR #60153 to enable it in Node.js; benefits Lambda / Vercel when picked up, no benefit to Cloudflare customers — they fixed it anyway because "a bug is a bug and we like making everything faster".
  • The benchmark's methodology contained several disparity-amplifying artifacts that had nothing to do with CPU speed (see concepts/benchmark-methodology-bias):
  • TTFB vs TTLB skew. The benchmark measured time-to-first-byte. In dynamic rendering mode (Vercel's config), TTFB returns before full-page render — streaming hides rendering time. In static mode (Cloudflare's config), OpenNext buffers the entire ~2–15 MB response before sending any bytes. Once Cloudflare switched to dynamic (matching Vercel) the TTFB comparison became fair — but neither version then measures full-render cost.
  • NODE_ENV unset in the React SSR benchmark (low-level React API, no framework auto-setting it). React defaults to "dev mode" — extra debugging checks, much slower than production mode. Vercel's environment auto-sets NODE_ENV=production; Workers didn't set it for the framework-less SSR case.
  • Hardware generation lottery. Cloudflare runs gens 10/11/12 concurrently; re-running the test tends to hit the same machines — so "noise cannot be corrected by simply running more iterations". Correcting requires issuing requests from multiple geographic locations to hit different POPs.
  • Multitenancy / noisy-neighbor at the memory-bandwidth level; cores aren't shared but memory bandwidth can be.
  • Network latency baked into client-side timing. Original benchmark ran client on a SF laptop, measured wall-clock to Cloudflare/Vercel servers.
  • Benchmark methodology changes Cloudflare made: ran test client from AWS us-east-1 (same DC as Vercel iad1 per Cloudflare's understanding) to minimize network latency; used 1-vCPU Vercel instances (benchmarks single-threaded, Vercel CTO confirmed no difference); submitted PR #5 with bug fixes.
  • Result after all Cloudflare-side fixes: parity with Vercel on every benchmark case except Next.js, where the gap has closed considerably and is being chipped at with further OpenNext patches. Cloudflare explicitly states: "Most real-world applications on Workers and Vercel are bound by databases, downstream services, network, and page size. End user experience is what matters. CPU is one piece of that picture."
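The reviver cost above scales with document shape: JSON.parse invokes the reviver once per object property, once per array element, and once for the root value, so deeply nested SSR payloads rack up calls fast. A minimal sketch with an illustrative payload (not the benchmark's actual data):

```javascript
// Count reviver invocations for a small payload. JSON.parse calls the
// reviver once per key/value pair, once per array element, and once for
// the root — which is how a Next.js request reaches 100,000+ calls.
const payload = JSON.stringify({
  items: [
    { id: 0, name: "a" },
    { id: 1, name: "b" },
    { id: 2, name: "c" },
  ],
});

let calls = 0;
const parsed = JSON.parse(payload, (key, value) => {
  calls++;
  return value; // identity reviver: pure per-call overhead
});

console.log(calls); // 11: 6 properties + 3 array elements + "items" + root
```

Every one of those calls crosses the JS/engine boundary, which is why a ~33 % per-call speedup in V8 moves the whole-request numbers.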
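The TTFB/TTLB skew is easy to demonstrate: on a stream that trickles chunks, time-to-first-byte stays small while time-to-last-byte grows with render time, so a TTFB-only benchmark rewards whichever side streams. A sketch with a simulated renderer (not the benchmark harness; assumes a runtime with the global ReadableStream, e.g. Node 18+):

```javascript
// Simulate a renderer that emits chunks with a per-chunk delay.
function slowStream(chunks, delayMs) {
  let i = 0;
  return new ReadableStream({
    pull(controller) {
      if (i >= chunks.length) return controller.close();
      return new Promise((resolve) =>
        setTimeout(() => {
          controller.enqueue(chunks[i++]);
          resolve();
        }, delayMs)
      );
    },
  });
}

// TTFB resolves at the first chunk; TTLB only after the full body.
async function timeStream(stream) {
  const t0 = Date.now();
  const reader = stream.getReader();
  let ttfb = null;
  for (;;) {
    const { done } = await reader.read();
    if (ttfb === null) ttfb = Date.now() - t0;
    if (done) break;
  }
  return { ttfb, ttlb: Date.now() - t0 };
}

timeStream(slowStream(["<html>", "…body…", "</html>"], 20)).then(
  ({ ttfb, ttlb }) => {
    // Dynamic (streaming) mode reports the small ttfb; static mode, which
    // buffers the whole response before sending, makes TTFB ≈ TTLB.
    console.log(ttfb < ttlb);
  }
);
```

This is why switching Cloudflare's config to dynamic mode made the TTFB comparison fair without either version measuring full-render cost.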

Operational numbers

  • Workers isolate memory default: 128 MB (V8 young-gen was tuned to 2017 V8 guidance for ≤ 512 MB envs).
  • Young-gen tuning fix impact: ~25 % benchmark improvement, small memory-usage uplift, deployed globally.
  • OpenNext 5 MB response: streamed chunks copied on every hop out of the renderer.
  • 50 × 2048-byte Buffers allocated per pipeThrough() in the React/Next.js render pipeline (mostly unused).
  • 100,000+ reviver invocations per Next.js request on the benchmark (one per key/value pair + one per array element).
  • Buffer.concat + discard: getBody().length pattern doing full-buffer concat just to read the total-byte count.
  • ~33 %: JSON.parse with reviver speedup from Cloudflare's V8 patch; ships V8 14.3 / Chrome 143.
  • 1 vs 2 vCPU Vercel instances: benchmarks single-threaded; Vercel CTO Malte Ubl confirmed the CPU count doesn't matter.
  • Pricing: Cloudflare $0.072/hr globally vs Vercel $0.128/hr in iad1 — Cloudflare cheaper; note Workers bills CPU time, not wall-clock time, so isolate-queue waits (the bulk of the benchmark disparity) were never billed as CPU.
  • Cloudflare POP count: 330+ cities worldwide — vs Vercel's centralized placement, one of the methodology's network-latency skews.
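The Buffer.concat-for-length pattern above is cheap to fix: the total byte count is just the sum of chunk lengths, so no concatenated copy is needed. A sketch with a hypothetical chunk list:

```javascript
// Hypothetical chunk list standing in for buffered renderer output.
const chunks = [
  Buffer.from("<html>"),
  Buffer.from("hello"),
  Buffer.from("</html>"),
];

// Anti-pattern: allocates and fills a whole new buffer, then reads one
// number off it and discards the copy.
const slowLength = Buffer.concat(chunks).length;

// Fix: sum the lengths directly — no allocation, no copying.
const fastLength = chunks.reduce((total, chunk) => total + chunk.length, 0);

console.log(slowLength === fastLength); // true
```

On a multi-megabyte response the concat version allocates and copies the entire body just to learn its size, which is exactly the kind of GC pressure the profiling surfaced.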

Systems / concepts / patterns introduced

Systems

  • systems/cloudflare-workers — V8-isolate-based serverless compute; per-request isolate routing, per-isolate memory budget, CPU-time billing (not wall-clock).
  • systems/opennext — OSS Next.js portability adapter layer making Next.js deployable on Cloudflare, AWS Lambda, Netlify, etc.
  • systems/nextjs — Vercel-backed React application framework. Stub.
  • systems/v8-javascript-engine — Chromium's open-source JS engine; underlies Chrome, Node.js, Deno, Workers. Embeddable by design; young-generation GC size is one of the tunable knobs.
  • systems/web-streams-api — W3C ReadableStream / WritableStream / TransformStream; the Workers-preferred streaming API; Node.js has a separate, older Node streams API.
  • systems/react — Meta's UI framework; uses JSON.parse(text, reviver) heavily.

Concepts

  • concepts/v8-young-generation — V8's young-generation GC space; size is a tunable knob; embedder (Workers, Node.js, Chrome) chooses or lets V8 heuristic pick.
  • concepts/benchmark-methodology-bias — correlated noise, TTFB/TTLB skew, NODE_ENV handling, hardware-generation lottery, network latency — all confounders invisible to more-iterations-averaging.
  • concepts/stream-adapter-overhead — converting between Node.js streams and Web Streams double-buffers; value-oriented streams default to highWaterMark: 1 (one enqueue = one read).
  • concepts/warm-isolate-routing — routing requests to already-loaded isolates to avoid cold starts; heuristic tuning has workload-shape bias (CPU-bound vs I/O-bound).

Patterns

  • patterns/upstream-the-fix — fix the ecosystem primitive (V8, Node.js, OpenNext) instead of just your own platform; Cloudflare fixed JSON.parse(reviver) in V8 (helps Node.js / Chrome / Deno), opened a Node.js PR to enable the faster trig path (helps AWS Lambda / Vercel, no benefit to Cloudflare customers), and submitted PRs to OpenNext.

Caveats

  • Benchmark result numbers are Cloudflare-reported from their own re-run; no independent third-party reproduction cited in the post.
  • "Parity on all benchmarks except Next.js" — Next.js gap remains; Cloudflare's framing is "we expect to be able to eliminate it" but it's not eliminated today.
  • The post names Vercel as the specific competitor and AWS Lambda as its underlying substrate, but does not break out how much of the Vercel-side numbers are Lambda-side characteristics vs Vercel-specific.
  • Cloudflare's implicit framing: CPU-bound synthetic benchmarks are "not an ideal choice to represent web applications" (their words — also quoting Theo's own video).
