
PATTERN Cited by 6 sources

Upstream the fix

Intent

When a performance / correctness / security issue lives in a shared ecosystem primitive (language engine, standard library, OSS framework), fix it upstream rather than around it — even when a platform-private patch would be easier and would deliver the win to your customers faster. The upstream fix:

  1. Benefits the whole ecosystem, including competitors.
  2. Reduces your long-term maintenance surface (no platform-private fork to carry).
  3. Compounds over time as downstream consumers pick up the fix.
  4. Deliberately includes cases where you don't benefit at all, because "a bug is a bug and we like making everything faster" (see the Cloudflare quote in the canonical instance below).

Canonical instance: Cloudflare, 2025-10

In their response to Theo Browne's Workers-vs-Vercel CPU benchmark, Cloudflare opened four distinct upstream contributions across three upstream repos, each with a different distribution of who benefits:

1. V8: JSON.parse(reviver) ~33 % speedup

Profiling showed JSON.parse(text, reviver) being called 100,000+ times per Next.js request. V8 had recently added a third argument to the reviver callback (per the TC39 proposal-json-parse-with-source) providing JSON source context — and the added argument slowed the path for everyone.
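A minimal sketch of the hot call shape (the reviver's third context argument comes from proposal-json-parse-with-source and is only present on engines that implement it; the payload and field names here are hypothetical):

```javascript
// Hot-path shape: JSON.parse with a reviver. On engines implementing
// proposal-json-parse-with-source, the reviver also receives a third
// context argument exposing the raw JSON source text; supporting that
// extra argument is what slowed the path for every caller.
const payload = '{"id":9007199254740993,"name":"widget"}'; // hypothetical

const parsed = JSON.parse(payload, (key, value, context) => {
  // context is undefined on engines without source access, so hedge on it.
  if (key === "id" && context && context.source) {
    return BigInt(context.source); // lossless parse of a large integer
  }
  return value;
});

console.log(parsed.name); // "widget"
```

A reviver invoked on every key of a path hit 100,000+ times per request is exactly where a ~33 % engine-side win compounds.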

Cloudflare upstreamed Chromium CL 7027411, giving a roughly 33 % speedup on the parse-with-reviver path. The fix ships in V8 14.3 / Chrome 143.

Benefits: Cloudflare Workers + Node.js + Chrome + Deno + every other V8 embedder. Cloudflare's customers are one among many.

2. Node.js: enable the faster trig compile flag

V8 has a compile-time flag, V8_USE_LIBM_TRIG_FUNCTIONS, that selects a faster trig implementation on supported platforms. In Workers the flag is on by default (mostly coincidentally). In Node.js it's off, because Node's build uses broader-compatibility defaults across operating systems and architectures.

Cloudflare opened nodejs PR #60153 to enable the flag in Node.js.

Benefits: AWS Lambda (Node.js runtime), Vercel, every other Node.js user on supported platforms. Zero benefit to Cloudflare customers — Workers already uses the flag.

Cloudflare's explicit framing:

"This change won't benefit our customers, since Cloudflare Workers already uses the faster trig functions, but a bug is a bug and we like making everything faster." (Source: sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks)

3. OpenNext: adapter-layer PRs

Cloudflare's profiling found multiple allocation / copy / buffering inefficiencies in OpenNext itself and submitted PRs to fix them — including fixes that apply to other OpenNext platforms (AWS Lambda, Netlify, etc.), not just Cloudflare.

"We intend to keep iterating through OpenNext code, making improvements wherever they're needed — not only in the parts that run on Workers. Many of these improvements apply to other OpenNext platforms. The shared goal of OpenNext is to make NextJS as fast as possible regardless of where you choose to run your code."
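The class of inefficiency involved can be illustrated with a hypothetical buffering sketch (not OpenNext's actual code):

```javascript
// Hypothetical illustration of an allocation/copy inefficiency class:
// re-concatenating a body buffer on every chunk copies all previously
// received bytes each time (quadratic), while collecting chunks and
// concatenating once copies each byte only once (linear).
const chunks = [Buffer.from("hello "), Buffer.from("wor"), Buffer.from("ld")];

// Quadratic: each iteration reallocates and copies the whole prefix.
let rebuffered = Buffer.alloc(0);
for (const chunk of chunks) {
  rebuffered = Buffer.concat([rebuffered, chunk]);
}

// Linear: one allocation and one copy pass at the end.
const collected = Buffer.concat(chunks);

console.log(rebuffered.equals(collected)); // true; same bytes, different copy cost
```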

4. The benchmark itself

The cf-vs-vercel-bench repo contained real bugs (the unset NODE_ENV in the React SSR case, etc.). Cloudflare submitted PR #5 with test fixes back to Theo's benchmark repo.

Second instance: Cloudflare, 2026-02 — Vercel collaboration

Four months after the benchmark-response post, the pattern repeats with a third-party collaborator. Malte Ubl (Vercel) published independent research on Node.js Web streams performance (We Ralph Wiggum'd WebStreams), proposing changes to the Node.js implementation that eliminate promises on certain code paths and yield up to a 10× speedup.
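A minimal sketch of the overhead being targeted (assuming the Node.js 18+ global ReadableStream; the stream contents are hypothetical):

```javascript
// Every chunk delivered through a Web stream reader is wrapped in a
// fresh Promise, even when the chunk already sits in the internal queue
// and could be handed over synchronously. Eliminating these promises on
// hot paths is the substance of the proposed Node.js changes.
const stream = new ReadableStream({
  start(controller) {
    controller.enqueue("already-buffered chunk");
    controller.close();
  },
});

const reader = stream.getReader();
// The chunk above is queued before the first read(), yet the API surface
// still forces a Promise allocation (and a microtask hop) to retrieve it.
const firstRead = reader.read();
console.log(firstRead instanceof Promise); // true: one allocation per chunk
```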

James Snell — Node.js TSC member and the author of the 2026-02-27 Cloudflare post — commits publicly:

"As one of the core maintainers of Node.js, I am looking forward to helping Malte and the folks at Vercel get their proposed improvements landed!" (Source: sources/2026-02-27-cloudflare-a-better-streams-api-is-possible-for-javascript)

The pattern is the same: a performance issue in a shared runtime (Node.js) identified by one company (Vercel) is championed upstream by a core contributor from a different company (Cloudflare) whose customers get no direct benefit from the Node fix (Cloudflare Workers doesn't run Node.js Web streams). The fixes benefit Node-on-AWS-Lambda, Vercel Serverless Functions, and self-hosted Node — a whole-ecosystem win.

This is the sibling shape to the 2025-10-14 Node.js trig-flag PR: upstream ownership expressed as accelerating a competitor's improvements into a shared substrate you're a maintainer of.

Third instance: Datadog Agent, 2024-2025 — Go binary size

The pattern also appears in a different language substrate. Datadog's 6-month Agent-binary reduction program (described in sources/2026-02-18-datadog-how-we-reduced-agent-go-binaries-up-to-77-percent) produced upstream PRs to four distinct projects, each with a different distribution of who benefits.

The compound benefit: Kubernetes project contributors subsequently enabled method-DCE and reported 16-37 % binary-size reductions across Kubernetes's own binaries. Datadog cleared the trail (identified the offenders, wrote the patches, upstreamed them); Kubernetes inherited the win for free. A clean case of pattern-as-ecosystem-accelerator: the PRs that shrank the Agent also shrink every other large Go binary downstream.

Datadog's explicit framing mirrors Cloudflare's:

"Our efforts also helped spread awareness of the optimization: Kubernetes project contributors began enabling it in their own binaries and reported 16 % to 37 % size reduction, bringing similar improvements to the wider community." (Source: sources/2026-02-18-datadog-how-we-reduced-agent-go-binaries-up-to-77-percent)

Where Cloudflare's canonical instance is language-engine (V8 + Node.js + OpenNext in JavaScript), Datadog's is toolchain-policy (Go linker + standard library + core runtime ecosystem). Same shape: fix it where the rest of the ecosystem can pick it up.

Fourth instance: Fly.io, 2025-02 — Rust TLS ecosystem

The pattern repeats in the Rust async ecosystem (Source: sources/2025-02-26-flyio-taming-a-voracious-rust-proxy). An incident in Fly.io's fly-proxy edge router — CPU pegging on two IAD hosts with an elevated HTTP error rate — was traced via flamegraph to a spurious-wakeup busy-loop in tokio-rustls's TlsStream, specifically the close_notify-with-buffered-trailer edge case already tracked as tokio-rustls#72.

Fly.io's resolution: rustls PR #1950 — upstream fix in rustls itself, which is the shared TLS primitive for "essentially every Rust async TLS deployment". The post's own framing makes the upstream-vs-private-fork choice explicit:

"TlsStream is an ultra-important, load-bearing function in the Rust ecosystem. Everybody uses it." (Source: sources/2025-02-26-flyio-taming-a-voracious-rust-proxy)

Same shape as the previous three instances: the bug is in an ecosystem-load-bearing primitive; patching your own fork optimises for today and leaves the landmine for everyone else. Fly.io's fix benefits every downstream Rust-TLS user regardless of whether they run on Fly.io or their own hardware.

This instance is also the canonical pairing with patterns/flamegraph-to-upstream-fix — the end-to-end diagnostic workflow (symptom → flamegraph → type signature → known upstream issue → PR → partner resumes clean) that produces an upstream-the-fix contribution.

Fifth instance: Cloudflare, 2025-10 — Go arm64 compiler

The pattern repeats at the toolchain-codegen level (Source: sources/2025-10-08-cloudflare-we-found-a-bug-in-gos-arm64-compiler). Cloudflare's fleet surfaces recurring fatal Go panics ("traceback did not unwind completely" and SIGSEGVs at m.incgo+0x118) on arm64 only. Weeks of investigation reveal a one-instruction race condition in Go's arm64 compiler: for stack frames larger than 1<<12 bytes, the epilogue's SP adjustment is emitted as two ADD opcodes because arm64's ADD immediate is only 12 bits wide (systems/arm64-isa). Async preemption landing between those two opcodes crashes the stack unwinder on the next GC cycle.

Cloudflare's resolution: an upstream fix to Go's arm64 compiler.

Same shape as the previous four instances: the bug lives in a primitive every Go program on arm64 depends on (compiler + runtime + ISA). A private workaround (e.g. artificially reducing stack frame sizes to stay under the split-ADD threshold) would have kept the rest of the Go-arm64 ecosystem exposed to the same landmine. The upstream fix delivers the win to every downstream Go-arm64 user — AWS Graviton workloads, Apple Silicon developer laptops, ARM cloud providers, Kubernetes nodes on arm64 — not just Cloudflare's fleet.

This instance is also the canonical wiki pairing with concepts/compiler-generated-race-condition — by definition, compiler-generated races must be fixed at the toolchain level; there is no correct user-code workaround.

Sixth instance: Fly.io, 2025-05 — parking_lot RwLock

The pattern repeats a second time in Fly.io's Rust stack, this time in the synchronization-primitive library layer rather than the TLS layer (Source: sources/2025-05-28-flyio-parking-lot-ffffffffffffffff). fly-proxy's Catalog lazy-loading refactor exposed a bitwise double-free in parking_lot's RwLock: the try_write_for timeout path and the reader-release unpark path could both claim to clear the WRITER_PARKED bit, causing the atomic self-synchronizing clear to saturate the 60-bit reader counter and set all 4 signaling bits → lock word 0xFFFFFFFFFFFFFFFF → every thread waits for a lock no thread holds. After a weeks-long descent into madness (core dumps, gdb, miri, guard pages, the parking_lot deadlock detector, suspected Tokio interactions, suspected Rust compiler bugs), the read_recursive desperation probe (patterns/read-recursive-as-desperation-probe) surfaced RwLock reader count overflow log messages that were the first direct evidence of lock-word corruption.

Fly.io's resolution:

  • parking_lot issue #465 — reproducer + logged-condition evidence in a forked parking_lot build.
  • parking_lot PR #466 — upstream fix: the writer bit is cleared separately in the same wakeup queue as the reader so the two clear operations can't race.

Same shape as the previous five instances. parking_lot is "an important, well-regarded, and widely-used replacement for the standard library's lock implementation" — fix it upstream rather than forking.

This instance is also the canonical wiki pairing with concepts/bitwise-double-free — like compiler-generated races, bitwise double-frees in tightly-packed state words are a library-internal invariant violation that can only be fixed in the library's wake-up-path coordination.

Fly.io's pattern count across the wiki now stands at two (Rust-TLS via rustls PR #1950; Rust-locks via parking_lot PR #466), both in load-bearing Rust ecosystem primitives.

Prerequisites

Not every org can ship this pattern at Cloudflare's cadence. The enabling conditions are:

  • Core-contributor relationships in the upstream project. Cloudflare "employs multiple core contributors" to V8 and Node.js — their PRs land fast.
  • Engineering-time budget. Upstreaming is slower than a private patch (review cycles, test matrices, release cadences, policy negotiations).
  • Org culture that values ecosystem over exclusivity. Named directly in the post: "Our approach is simple: improve open source infrastructure so that everyone gets faster, then make sure our platform makes the most of those improvements."
  • Licensing compatibility with contribution — permissive licenses (Apache 2.0 / MIT / BSD) make this cheap; copyleft licenses (GPLv3) add friction.

When not to upstream

  • Private platform / proprietary substrate. You can't upstream a fix to a codebase you can't contribute to.
  • Urgent security fixes. Coordinated disclosure with an embargo period precedes the public upstream patch.
  • The upstream isn't maintained. Rare; more often you carry a fork and re-contribute upstream later (e.g., libpq patches at AWS).

Sibling patterns

  • patterns/open-governance-as-technical-constraint — structural discipline at the project-governance level (Home Assistant / Open Home Foundation immutable charter). Upstream-the-fix is the engineering-team-level discipline that produces the long-run ecosystem health the governance layer protects.
  • Reference-hardware-for-ecosystem (Home Assistant Green) — adjacent "ecosystem-wide leverage" shape in a different substrate.

Seen in
