Cloudflare
Cloudflare (blog.cloudflare.com) is a Tier-1 source on the sysdesign-wiki. Cloudflare operates one of the world's largest edge / CDN networks (≈60 M HTTP req/sec at 2024 scale) and publishes both infrastructure deep-dives (Rust proxy internals, hardware, performance engineering at CPU-core granularity) and developer-platform posts (Workers, Workers AI, AI Gateway, Durable Objects, MCP).
Recurring shapes on the blog:
- Hot-path performance engineering — flame-graph-driven micro-optimization, custom data structures for narrow workloads, open-sourcing the result. 2024-09-10 trie-hard is the canonical wiki instance.
- Dogfooding the platform as internal infra — build your own engineering tooling on Workers + AI Gateway + Durable Objects + D1 + KV, then write it up as a reference architecture. 2026-04-20 AI engineering stack is the canonical wiki instance.
- Dogfooding the platform as a customer-facing product — build your own first-party product on the same customer primitives (Agents SDK + Workers AI + Durable Objects + MCP) and treat its architecture as a reference for customers building the same shape. 2026-04-15 Agent Lee is the canonical wiki instance (~18K DAU, ~250K tool calls/day at launch) — explicitly positioned as "we built it with the same Cloudflare lego blocks that you have access to." Introduces the credentialed-proxy sandbox pattern and the elicitation gate as productised MCP primitives.
- Porting existing workloads onto the Developer Platform as reference architectures — take a popular open-source Docker-based workload, port it onto Workers + Sandbox SDK + AI Gateway + Browser Rendering + R2 + Zero Trust Access, publish the repo as a proof-of-concept. Sibling of the internal-dogfooding shape but aimed at external developers. 2026-01-29 Moltworker (port of Moltbot) is the canonical wiki instance — see patterns/middleware-worker-adapter / patterns/mountable-persistent-storage / patterns/cdp-proxy-for-headless-browser / patterns/ai-gateway-provider-abstraction.
- Open-sourcing acquired security products — post-acquisition donate the code to a neutral foundation rather than keep it proprietary. 2025-03-25 OPKSSH (acquired from BastionZero 2024, donated to OpenPubkey / Linux Foundation) is the current wiki instance.
- Autonomous edge defence at anycast scale — detection + mitigation runs on every server in every POP; no central scrubbing tier, no human in the loop; anycast turns attacker geographic distribution into per-POP-capacity advantage. 2025-06-20 7.3 Tbps DDoS writeup is the canonical wiki instance.
- Reviving dormant HTTP primitives for new paid flows — rather than minting a vendor-specific status code, Cloudflare repurposes HTTP 402 Payment Required (reserved since HTTP/1.1, essentially unused in production) as the negotiation primitive for per-request paid-content access, plus RFC 9421 HTTP Message Signatures for verifiable bot identity. Every HTTP intermediary already understands 402 as a 4xx; no infrastructure churn. 2025-07-01 pay-per-crawl is the canonical wiki instance.
- Public bad-actor attribution with controlled-experiment evidence — when a crawler operator is suspected of evading origin-side directives, Cloudflare's instinct is not a quiet enforcement change; it's to publish a writeup that (a) designs a brand-new unindexed domain experiment to rule out every confounder, (b) names the operator, (c) quantifies the volume, (d) compares against a positive control (a cooperating operator on the same experimental apparatus), (e) documents the enforcement action taken (ML signature + verified-bot delisting + free-tier managed rule), and (f) articulates the operator-norms framework the violation tested. Treats industry-wide cooperative-crawler discipline as a published good. 2025-08-04 Perplexity stealth-crawler investigation is the canonical wiki instance — Perplexity (negative control) vs ChatGPT-User (positive control) on an identical controlled test; patterns/verified-bot-delisting as the enforcement lever. Sibling to the pay-per-crawl post (the monetization half that depends on this enforcement half) and to the 2026-04-21 "Moving past bots vs. humans" framing post (which builds the Web Bot Auth / Privacy Pass architecture on top of the enforcement precedent this post sets).
- Public alternative-explanation forensics for third-party anomaly narratives — the inverse of the public-attribution shape. When a newsletter, researcher, or news outlet publishes a malicious-intent reading of an observed Internet anomaly (a BGP leak, a traffic-pattern shift, a DNS oddity) that lives on Cloudflare Radar data, Cloudflare's instinct is to publish a long-form deflation post that (a) names the narrative directly, (b) walks through forensic signals (recurrence pattern, AS-relationship context via BGPKIT monocle, direction of traffic-engineering signals, temporal dispersion, timing vs. the alleged trigger), (c) proposes a specific mundane mechanism (usually hygiene failure) consistent with the evidence, (d) addresses the narrative's other technical claims with the correct distinction (e.g. origin-validation vs path-validation), (e) states honestly what it cannot determine, and (f) points forward at the mitigation that would have prevented the event regardless of intent. 2026-01-08 Venezuela BGP-anomaly post is the canonical wiki instance — AS8048/CANTV leak deflated as recurring Type-1 hairpin policy hygiene, not MITM; five-signal forensic argument; ASPA + RFC 9234 OTC + Peerlock named as the forward-looking fix. See patterns/alternative-explanation-forensics. Sibling posture to the Perplexity public-attribution shape — same "public attribution is a published good" stance, opposite direction (deflate vs. make the claim).
- Clean-reimplementation-of-an-incumbent on Cloudflare primitives, MIT-licensed — rather than writing adapters around an incumbent's output or forking the incumbent's code, Cloudflare reimplements the incumbent's API surface from scratch against a modern foundation that runs on Cloudflare's Developer Platform. The "no original code was used" posture is load-bearing: it unlocks a permissive license and decouples the new project from the incumbent's roadmap and licensing constraints. Two canonical wiki instances: 2026-02-24 vinext — clean reimplementation of the Next.js API surface on Vite, deploys to Workers (written by one engineer + Claude in under a week for ~$1,100 in API tokens); 2026-04-01 EmDash — clean reimplementation of WordPress functionality in TypeScript with an Astro theming model, deploys to workerd or any Node.js server, MIT-licensed. In both cases the reimplementation is also the vehicle for applying a Cloudflare-platform primitive the incumbent cannot (vinext: traffic-aware pre-rendering via zone analytics; EmDash: capability-based sandbox for plugins via Dynamic Workers, and built-in x402 monetisation). The clean-room property is what unlocks both the licensing freedom and the architectural freedom.
- Policy + architecture framing posts that reshape the industry vocabulary — rather than announcing a new feature, Cloudflare periodically publishes a framing piece that names a failing abstraction and proposes its replacement. The 2026-04-21 "Moving past bots vs. humans" argues the binary bot-vs-human classification is no longer load-bearing and introduces the rate-limit trilemma (decentralized, anonymous, accountable — pick two) as the governing lens. Replaces "infer identity from fingerprints" with "ask the client to prove an attribute" via anonymous attribute proof atop Privacy Pass / ARC / ACT. The post splits web-protection into the identity branch (Web Bot Auth, for clients that value attribution) and the anonymous branch (Privacy Pass successors, for clients that value anonymity), with the Web PKI as the server-side "anonymous + accountable" precedent. The patterns/open-issuer-ecosystem governance guardrail — "anyone should be able to build their own device / browser / OS and access the Web" — is the non-regression test for any proposed scheme.
- Public post-mortems that name the missing discipline, not just the specific bug — when something breaks, the blog's instinct is to publish a structural RCA that says "this class of bug is possible because our legacy system X doesn't support progressive deployment; we are deprecating X", not "one engineer made a typo". 2025-07-16 1.1.1.1 outage post-mortem is the canonical wiki instance (see the 1.1.1.1 Resolver (systems/cloudflare-1-1-1-1-resolver) + the progressive-config-rollout pattern it's an absence-of-pattern instance of). Extended by the 2025-11-18 and 2025-12-05 outages (three-week-apart structurally-related incidents): 11-18 is the canonical wiki instance of patterns/harden-ingestion-of-internal-config + patterns/global-feature-killswitch as stated-remediation patterns; 12-05's post-mortem explicitly names 11-18's projects as still-incomplete — showing the blog's discipline of treating the remediation backlog as public record and evaluating follow-on incidents against it. Closed by the 2026-05-01 Code Orange: Fail Small is complete post, which is the shipped-remediation companion to the named-discipline posture: six months after the incidents, Cloudflare publishes the catalog of shipped systems (Snapstone, Codex) + primitives (concepts/fail-stale, concepts/traffic-cohort-segmentation) + organisational practices (2026-04-07 engineering-wide drill, dedicated comms team, 18-service backup authorisation) that the incidents motivated. The evaluation sentence — "we have now completed the work that would have avoided the November 18, 2025 and December 5, 2025 global outages" — is the backlog-against-originating-incidents discipline made explicit.
- Benchmark-response engineering + upstream-the-fix posture — when a public third-party benchmark shows Cloudflare slower than a competitor, Cloudflare responds with a long public post that (a) profiles the benchmark on Cloudflare hardware, (b) enumerates every cause from platform tuning to adapter-layer inefficiencies to methodology bias, (c) ships fixes globally to customer-facing platform knobs, and (d) opens upstream PRs to shared ecosystem projects (V8, Node.js, OpenNext) that benefit the whole ecosystem — even when the upstream fix provides zero benefit to Cloudflare customers ("a bug is a bug and we like making everything faster"). 2025-10-14 cf-vs-vercel-benchresponse is the canonical wiki instance — see Workers / V8 young-gen retuning / upstream-the-fix pattern.
- Periodic first-party network-performance update with published methodology — every few months Cloudflare publishes a post reporting "what fraction of the top-N-networks-in-the-world is Cloudflare the fastest in?" alongside the full measurement recipe: APNIC-top-1,000-by-population denominator, RUM probe on Cloudflare-branded error pages, trimean of connection time against a fixed top-5 CDN set (CF + CloudFront + Google + Fastly + Akamai). The post's shape is always the same — headline "X % of networks fastest", average ms gap to next-fastest, a small number of new-PoP deployment anecdotes ("Wroclaw 19 → 12 ms" style), and a software-efficiency narrative (this post: HTTP/3 + congestion-window tuning + CPU/memory wins in SSL/TLS/proxy) — which lets a third-party reviewer run the same recipe and reach compatible rankings. Strategic framing: make the methodology defensible and public, then stand behind the numbers. 2026-04-17 Agents Week update (40 % → 60 % fastest, Sept → Dec 2025) is the canonical wiki instance — see comparative-RUM-benchmarking pattern + PoP-densification pattern as the two orthogonal axes the post frames gains onto.
- Structural critique of shared web standards by implementers, not committee members — when years of implementation experience across multiple runtimes (Workers + Node.js + Deno + Bun) reveal that a W3C / WHATWG API has usability / performance issues that cannot be fixed incrementally, Cloudflare's posture is to publish a long-form post by a hands-on implementer critiquing the design at the level of "the 2014-2016 design choices that made sense pre-ES2018 no longer make sense" — and ship a proof-of-concept alternative so the discussion is concrete. Explicitly framed as a conversation-starter, not a ship-it standard. 2026-02-27 James Snell's Web streams critique + new-streams POC is the canonical wiki instance.
- Default-on security upgrade at no additional cost — ship transformative security capability as a universal platform behaviour, default-enabled for all customers including the free tier, priced into the base platform. Consistent arc across 2014 Universal SSL (free TLS for every proxied site) → 2022 PQ-for-all (PQ hybrid KEM on by default, now >65 % of human traffic to Cloudflare per Radar) → 2029 full PQ security (encryption + authentication) target. Strategic framing: "making post-quantum security the default is the only way to protect the Internet at scale." The posture systematically reframes security as baseline platform behaviour rather than premium tier, pressuring competitors to follow. 2026-04-07 PQ roadmap is the canonical wiki instance of the pattern being named and articulated across all three milestones — see patterns/default-on-security-upgrade.
- Q-Day-driven roadmap publication with industry-wide priority-flip guidance — when independent advances compress a threat's timeline, Cloudflare publishes not only its own accelerated roadmap but also a threat-model rewrite for the industry: naming what changed (authentication over HNDL), what's newly required (disable legacy, rotate secrets, third-party assessment), and what the posture should be by audience (businesses / governments / customers). 2026-04-07 2029 PQ-security roadmap is the canonical wiki instance — 2029 target + Mid-2026 ML-DSA Cloudflare→origin + Mid-2027 Merkle Tree Certificates visitor→Cloudflare + Early 2028 Cloudflare One SASE + Q-Day pulled forward by the Google / Oratomic / neutral-atom trio + Scott Aaronson's "public progress will now go dark" framing adopted explicitly.
- Score-driven standard-adoption tooling — when a cluster of emerging web standards needs adoption and documentation alone won't do it, Cloudflare ships a Lighthouse-style public scoring tool pairing a per-site scanner with per-check actionable prompts, plus population-scale measurement on Cloudflare Radar, plus programmatic access via URL Scanner. The strategic move: make the scorecard free, public, one-click, and embedded in existing Cloudflare surfaces so adoption becomes visible, competitive, and trackable. 2026-04-17 isitagentready.com is the canonical wiki instance — four-dimension rubric over six well-known endpoint standards (API Catalog, MCP Server Card, Agent Skills, OAuth Protected Resource Metadata, Web Bot Auth directory, robots.txt + Content Signals) plus non-scoring agentic-commerce checks — paired with Radar's "Adoption of AI agent standards" dataset (200 k-domain weekly scan). Sibling posture to default-on security upgrade (adoption by being default, no extra cost) at the ecosystem-uplift layer rather than the Cloudflare-customer-tier layer. See patterns/score-driven-standard-adoption.
- Dogfood the platform at the documentation layer — related to the broader "dogfood the platform as infra/product" shape but applied specifically to Cloudflare's own developer documentation. Refine docs against an agent-audience rubric (here: Agent Readiness Score), publish a comparative benchmark vs. competitor docs, and treat the "our docs are the reference implementation" framing as the product-marketing posture. 2026-04-17 developers.cloudflare.com refinement (split llms.txt, dynamic /index.md via two Transform Rules, hidden agent directives, Redirects for AI Training on deprecated docs) measured 31 % fewer tokens + 66 % faster to correct answer vs the average non-refined technical documentation site on a Kimi-k2.5 / OpenCode benchmark.
- Absorbing new-protocol coordination cost at the edge — when an IETF standard or web-platform primitive is technically implementable at origin but carries enough coordination cost (per-request state, cache-variant explosion, mid-deploy client-population splits, graceful fallback, lifecycle management) that most zones won't ship it, Cloudflare takes the implementation on. Edge-managed protocol complexity becomes an explicit launch posture ("this is a coordination problem that belongs at the edge"), rolled out as passthrough → managed → auto in three phases so advanced customers can participate early and the long tail is served by the auto-phase. 2026-04-17 Shared Dictionaries launch for RFC 9842 is the canonical wiki instance (open beta 2026-04-30) — the RUM beacon provides the validation loop that makes the auto-phase (auto-infer which URLs are versioned + generate dictionaries without customer config) safe to deploy at scale. Sibling shape to the "autonomous edge defence", "reviving dormant HTTP primitives", and "default-on security upgrade" shapes above — all of which reduce to the same underlying posture: ship the complex thing at the edge so the customer doesn't have to.
- Systems-level performance engineering on proprietary inference infra + open-source the kernels — when the LLM serving stack on Workers AI runs into a fundamental hardware limit (here: H100 tensor cores ~600× faster than HBM so LLM decode is memory-bandwidth-bound), Cloudflare builds a production solution inside the proprietary Infire engine and ships the novel GPU kernels + a technical paper publicly. 2026-04-17 Unweight is the canonical wiki instance: Huffman coding on BF16 exponent bytes + fused reconstructive matmul feeding tensor cores directly from SMEM; ~22 % model-size reduction (~3 GB VRAM saved per instance) on Llama-3.1-8B, bit-exact lossless by construction; kernels open-sourced at github.com/cloudflareresearch/unweight-kernels. Sibling shape to the 2024-09-10 trie-hard "hot-path performance engineering" and the 2025-10-14 V8 / OpenNext / Node.js "upstream-the-fix" arcs — all three are Cloudflare publishing micro-optimization methodology + kernels / patches / crates for the ecosystem, often with no direct Cloudflare benefit.
- CI-native AI-agent orchestration as internal engineering resilience — put LLMs in the critical path of developer workflow (code review before merge) but do it with a plugin architecture that fans out to specialised sub-agents, coordinated by a judge pass, backed by circuit-breakers + failback chains + a break-glass escape hatch. Stated as part of Code Orange: Fail Small. 2026-04-20 AI Code Review is the canonical wiki instance — 131,246 review runs across 5,169 repos in first 30 days; median cost $0.98 / review; 85.7% prompt-cache hit rate; break-glass invoked 0.6% of MRs. Introduces the coordinator / sub-reviewer orchestration pattern, the AI-review-risk-tiering pattern, the specialised-reviewer pattern, the remote-config model routing pattern, the JSONL-streaming-child-process pattern, the incremental-AI-re-review pattern, and the "What NOT to flag" / break-glass / AI-thinking-heartbeat / prompt-boundary-sanitization concept cluster. Sibling of the 2026-04-17 Agent Memory coordinator + sub-agent arrangement and the 2026-04-20 internal-AI-engineering-stack dogfood-the-platform shape — same architectural family, applied to the review workflow.
- Scale-amplified toolchain bug hunting + upstream-the-fix at the codegen layer — Cloudflare's 84 M HTTP req/s across 330 cities means "even the rarest of bugs can show up frequently". The 2025-10-08 Go-arm64-compiler bug retrospective is the canonical wiki instance: ~30 daily fatal panics on arm64 across <10 % of data centers led through weeks of debugging (monitoring → panic/recover mitigation → Go upstream issue pattern-match → production coredump → dlv disass between two specific ADD opcodes → ~35-line stdlib-only isolated reproducer) to a one-instruction race condition in Go's arm64 code generator: split-ADD SP adjustment (concepts/split-instruction-race-window) + async preemption + stack-unwinder invariant violation = crashes at GC time on frames > 4 KiB. Fixed upstream in go1.23.12 / go1.24.6 / go1.25.0 via preemption-safe compiler emit — scratch-register + indivisible register-form ADD. Sibling of the 2025-10-14 V8 / Node.js / OpenNext upstream contributions and the 2026-02-27 Web Streams critique — same fix-it-in-the-ecosystem-primitive posture, applied to the language toolchain rather than the runtime / standard library / framework. Canonical pairing with concepts/compiler-generated-race-condition: by definition, compiler-generated races must be fixed at the toolchain level; no user-code workaround is correct.
- Absorbing upstream TLD-registry failure via community-coordinated mitigation — when a TLD-level failure makes every child zone unresolvable (signing-pipeline misfire at the TLD, nameservers unreachable, etc.), Cloudflare's 1.1.1.1 (systems/cloudflare-1-1-1-1-resolver) posture is: (a) fail-closed on spec — a DNSSEC-broken zone must return SERVFAIL on fresh queries, (b) cushion via serve-stale (RFC 8767) which keeps NOERROR rates stable for hours while cached records age out, (c) coordinate via DNS-OARC with the other major resolver operators, (d) deploy a Negative Trust Anchor-equivalent override (RFC 7646) on the affected TLD once the break is widespread + publicly confirmed — deliberately trading DNSSEC validation for availability, (e) apply the same mitigation to the internal origin-resolver for CDN customers, and (f) publish a long-form writeup that names the DNSSEC primer explicitly + self-discloses latent EDE-propagation bugs surfaced by the incident. 2026-05-06 "When DNSSEC goes wrong: how we responded to the .de TLD outage" is the canonical wiki instance — DENIC's routine .de key rollover misfired 2026-05-05 19:30 UTC; 1.1.1.1 deployed the NTA-equivalent at 22:17 UTC; peer operators independently did the same "within an hour"; Big Pineapple named publicly as the resolver software. Sibling posture to the "public post-mortems that name the missing discipline" shape above, but applied to an external failure with internal mitigation (the missing discipline here is native RFC 7646 NTA implementation, not a bug Cloudflare caused). Also sibling to the "default-on security upgrade" shape: the three-way ladder fail-stale → fail-open (NTA) → fail-closed (SERVFAIL) is the DNS-altitude version of the config-deployment three-way ladder in the 2026-05-01 Code Orange: Fail Small is complete post.
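The network-performance shape above summarises connection times with a trimean. As a minimal sketch of that statistic (Tukey's trimean, a weighted average of the three quartiles), the Python below ranks providers by it. The provider names and latency samples are made up for illustration, and the inclusive quartile method is an assumption: the source does not say which quartile convention the published recipe uses.

```python
from statistics import quantiles


def trimean(samples):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4.

    Uses the 'inclusive' quartile convention, which is an assumption
    made for this sketch.
    """
    q1, q2, q3 = quantiles(samples, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def fastest(samples_by_provider):
    """Return the provider whose connection-time trimean is lowest."""
    return min(samples_by_provider, key=lambda p: trimean(samples_by_provider[p]))


# Hypothetical connection times in milliseconds for one network.
connect_ms = {
    "provider_a": [10, 12, 12, 13, 200],  # one slow outlier
    "provider_b": [14, 15, 15, 16, 17],
}

for name, samples in connect_ms.items():
    print(name, trimean(samples))
print("fastest:", fastest(connect_ms))
```

With these invented samples the trimean of `provider_a` is 12.25 ms while its plain mean is 49.4 ms, so a single 200 ms outlier would flip the ranking under a mean but not under a trimean. That robustness is one plausible reason to prefer it for RUM data.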
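The Unweight shape above mentions Huffman coding on BF16 exponent bytes. The toy Python below illustrates only that idea on the CPU: it extracts the 8-bit exponent field from a handful of values, builds a Huffman code over those exponents, and checks that decoding is lossless. The sample weights are invented, BF16 conversion is modelled as plain truncation of float32, and nothing here reflects the actual Infire kernels or the fused matmul described in the post.

```python
import heapq
import struct
from collections import Counter


def bf16_exponent(value):
    """8-bit exponent field of `value` truncated to BF16.

    BF16 keeps the top 16 bits of a float32 (1 sign, 8 exponent,
    7 mantissa bits), so the exponent field matches float32's.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return (bits >> 23) & 0xFF


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from observed symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, unique tiebreak, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical weights clustered in a narrow magnitude range.
weights = [0.013, -0.021, 0.5, 0.0078, -0.0302, 0.017,
           0.26, -0.011, 0.019, 0.0041, 0.015, -0.027]
exps = [bf16_exponent(w) for w in weights]
code = huffman_code(exps)
packed = encode(exps, code)
print(len(packed), "bits vs", 8 * len(exps), "bits uncompressed")
```

Because only a few distinct exponent values occur in the sample, the encoded stream is shorter than 8 bits per exponent, and `decode(packed, code)` reproduces `exps` exactly, which is the sense in which such a scheme is lossless.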
Key systems
- systems/cloudflare-security-insights — Security Insights scanning platform: scheduler + Kafka + Go checker microservices + Postgres. Scaled from 10 to >120 scans/sec (2026) via batch-parallel consumption, fast/slow lane consumer split, active-passive API failover, and adaptive rate-limited scheduling.
- systems/cloudflare-vulnerability-validation-system — Vulnerability Validation System (VVS): 3-stage triage engine (Dedup → Judgment → Fixing) downstream of the VDH. Runs on a different model from the discovery harness for adversarial cross-checking. Holds 13,841 findings across 145 repos; produced 7,245 actionable findings. Inverted-index deduplication keeps reasoning at O(N); Fixer requires fail→pass flip + human merge gate. Disclosed 2026-06-18.
- systems/cloudflare-gen12-server — Cloudflare Gen12 core-server fleet: ~2,000 bare-metal units running Cloudflare's centralised control plane, billing, and analytics in core data centres (distinct from the globally distributed edge POPs). Boots through a UEFI → PXE → iPXE sequence, with some hardware also supporting native UEFI HTTPS boot. Canonical wiki surface for Cloudflare's bare-metal / firmware-upgrade / OpenBMC team altitude (sibling to the edge fleet and the Developer Platform fleet). Disclosed 2026-06-01 in the "core unit boot time from hours to minutes" post — the fleet's network-boot-interface linear-search regression and the declared-boot-order + state-validation auto-reapply fix that recovered firmware-upgrade automation from 4 hours to 3 minutes.
- systems/uefi — Unified Extensible Firmware Interface (the modern firmware standard initialising hardware and handing off to the OS); HII / IFR forms surface for configuration; supports PXE and native UEFI HTTPS boot. The substrate Cloudflare's fleet automation operates against; also the substrate where the persistence-loss-on-firmware-upgrade and lazy-loaded HII data structure failure modes both manifest. Canonical wiki page added 2026-06-01.
- systems/ipxe — Open-source network boot firmware (ipxe.org) supporting modern protocols (HTTP/HTTPS/iSCSI/AoE) and turning network boot into a "programmable workflow" with scripting. Substrate for Cloudflare's CfHIIConfig_App tool integration; reads UEFI variables as hex (motivating the hex-comparison flag pattern); endpoint state of the 2026-06-01 fix is "the entire workflow operates from iPXE." Canonical wiki page added 2026-06-01.
- systems/cloudflare-town-lake — Cloudflare's unified data platform. A data lakehouse with "a single SQL interface to everything Cloudflare knows." Apache Trino as the query engine, federating Postgres + ClickHouse + BigQuery + Iceberg on R2 in one plan; R2 Data Catalog (managed Iceberg) as the cold/warm tier; DataHub as the metadata catalog; Lifeguard as the access-control service; Skimmer as the PII detection scanner; Transformer as the ELT engine. Disclosed 2026-05-28; >1B events/sec input scale; 53% of queries are billing-related; 91,760 queries from 324 distinct employees in a recent measurement period. Built end-to-end on Cloudflare's customer Developer Platform primitives (R2 / Workers / Access / Workflows / Durable Objects / D1 / KV / Workers AI). Most comprehensive dogfood-as-reference-architecture instance in the corpus.
- systems/cloudflare-skipper — AI data agent on top of Town Lake. Conversational interface that turns natural-language questions into validated SQL against the lakehouse. Five layers of grounded context (concepts/layered-grounded-context-for-data-agent): schema metadata → human annotations → code-derived knowledge → curated data-model pages → runtime introspection. MCP surface via Code Mode (two tools: search, execute, with the model writing JS that calls the full toolset programmatically). Embedded dashboards via single-tag iframe + CSP frame-ancestors + Cloudflare Access + view-time permission re-check (patterns/embedded-dashboard-with-zero-trust-iframe). Security model: runs as the calling user, no privilege escalation; permissions checked at view time, not save time (concepts/security-model-as-data-model).
- systems/claude-managed-agents — Anthropic's managed agent platform with brain↔hands decoupling, first publicly-named external execution substrate disclosed 2026-05-19. The brain (agent loop, model inference, prompt cache, compaction) runs on Anthropic; the hands (sandbox, file ops, command exec, code execution, browsing) run on Cloudflare's Developer Platform. Cloudflare ships a forkable default template combining a Workers-based control plane, two interchangeable sandbox tiers (Containers microVM tier vs Dynamic Workers + Codemode V8-isolate tier — see concepts/isolate-vs-microvm-for-agent-sandbox), an outbound Worker egress proxy (patterns/outbound-worker-proxy-for-agent-egress) for zero-trust credential injection (concepts/outbound-proxy-credential-injection), Cloudflare Mesh / Workers VPC for private-service connectivity, and a pre-bundled tool surface (browser, email, image generation, custom Zod-typed defineTool blocks via patterns/pluggable-zod-tool-definition).
- systems/cloudflare-private-origins — Application Services for Private Origins (closed beta June 2026, Enterprise; GA target Q4 2026). Extends the full L7 application-services stack (WAF, bot management, rate limiting, caching, Workers) to origins reachable only via private networks. Integrates the private networking layer directly into the proxy stack; the use_private_routing flag on a DNS record is the sole configuration change. The same routing layer is now shared across DNS-proxied HTTP, Spectrum L4 proxy, and Workers VPC — a unified connectivity layer. No connector software required for WAN/Mesh customers. Canonicalised 2026-06-10.
- systems/cloudflare-mesh — Private network overlay for connecting Workers (and Cloudflare Developer Platform compute) to internal services on cloud-provider VPCs (AWS, Azure, GCP) or on-premises using post-quantum-encrypted networking "without a VPN or bastion host." First wiki canonicalisation in the 2026-05-19 Claude Managed Agents launch.
- systems/cloudflare-workers-vpc — Direct Workers-to-VPC connectivity for reaching private cloud / on-prem resources from a Worker, post-quantum-encrypted, no VPN / bastion required. Sister product to Cloudflare Mesh. First wiki canonicalisation in the 2026-05-19 launch — the pre-bundled call_service agent tool is the canonical high-level entry point.
- systems/cloudflare-ready-analytics — Cloudflare's internal multi-tenant ClickHouse-based analytics platform built in early 2022 to simplify onboarding for hundreds of internal teams (including billing). One massive 2 PiB table with the standard schema (20 floats, 20 strings, timestamp, indexID), tenants disambiguated by a namespace column, primary key (namespace, indexID, timestamp). Canonical wiki instance of per-tenant retention via partitioning-key extension — migrated PARTITION BY (day) → PARTITION BY (namespace, day) in January 2025 to enable per-tenant retention policies (some tenants need years for legal / contractual obligations, others need only days). Runs at 90 % target disk utilisation via max-min-fairness across namespaces. The migration's hidden cost — partition-count growth from thousands to 30k → 160k parts/replica — surfaced as the canonical wiki instance of concepts/lock-contention-in-query-planning when the billing pipeline missed its hard daily deadline two months after rollout. First publicly canonicalised on the wiki via the 2026-05-14 query-plan-contention retrospective. Three-patch upstream-the-fix mitigation stack landed: shared lock + deferred-copy snapshot + binary search on the sorted namespace prefix — Opts 1 + 2 ship as ClickHouse PR #85535 in ClickHouse 25.11. Open second-order problem flagged: "100 GiB ZooKeeper cluster" tracking per-part metadata.
- systems/quiche — Cloudflare's open-source Rust implementation of QUIC and HTTP/3 (github.com/cloudflare/quiche). Production-deployed across Cloudflare's edge "in the critical path for a significant share of the traffic we serve"; hosts CUBIC (default), Reno, and an increasingly-deployed BBRv3 implementation via a modular pluggable CCA interface. Canonical wiki instance of concepts/user-space-congestion-control at the Cloudflare edge altitude. Emits qlog events — the IETF-standardised JSON event-log format that makes CCA state-transition bugs diagnosable. First publicly canonicalised on the wiki via the 2026-05-12 CUBIC death-spiral-fix post, which surfaced a six-year-latent minimum-cwnd death spiral caused by a 2020 port of a 2017 Linux-kernel CUBIC optimisation missing its 1-week-later follow-up commit — canonical patterns/userspace-port-of-kernel-primitive-risk instance.
- systems/cubic-congestion-control — RFC 9438 loss-based CCA; default in Linux TCP and in quiche. Canonical wiki instance via the 2026-05-12 death-spiral post covers the full epoch-state-variable semantics, the 2017 "after idle" Linux kernel fix + its 1-week-later follow-up correction, the 2020 quiche port's inherited-bug gap, and the 2026-05-12 three-line fix adding last_ack_time as the secondary idle-delta anchor.
- systems/bbrv3 — Google's third-generation model-based CCA, now running on quiche's modular pluggable CCA interface "for a growing percentage of our QUIC deployments". Structural alternative to loss-based CCAs (CUBIC, Reno) whose bandwidth-delay-product model avoids both bufferbloat and the corner-case loss-recovery dynamics that produce bugs like the 2026-05-12 minimum-cwnd death spiral.
- systems/cloudflare-bpf-lsm — eBPF-LSM framework for runtime kernel-CVE mitigation without reboots (publicly canonicalised on the wiki via the 2026-05-07 Copy Fail response). Attaches an eBPF program to the relevant LSM hook and denies the hook for every caller except an explicit allow-list of legitimate binaries (patterns/bpf-lsm-allowlist-hook-denial). For CVE-2026-31431 (Copy Fail): `socket_bind` hook + AF_ALG address family + one legitimate internal-service allow-list; exploit attempts see `EPERM`. Closes the LTS-backport-latency gap at runtime. Rolled out via patterns/visibility-before-enforcement-rollout — visibility via `ebpf_exporter` in Phase 1 empirically validates the allow-list, then the bpf-lsm program ships in Phase 2 behind a separate gate. Pre-existing public write-up: Live-patch security vulnerabilities with eBPF LSM.
- systems/cloudflare-edge-reboot-release — ERR pipeline: scheduled kernel-patch rollout via coordinated server reboots on a 4-week cycle (first publicly named on the wiki via the 2026-05-07 Copy Fail post). Three-tempo structure: upstream community LTS → Cloudflare weekly internal build → ERR 4-week edge-reboot cycle. Net effective cadence "biweekly Linux patch updates" once the pipeline flows smoothly. Control plane runs the newest kernel with workload-aware reboot scheduling. For the Copy Fail CVE, the decision at 2026-04-30 ~17:00 UTC was to "ship a patched build of the previous LTS line through reboot automation; do not accelerate the new LTS; lean on bpf-lsm in the meantime" — canonical articulation of the accept-the-backport-gap-but-cover-with-runtime-mitigation posture.
- systems/prometheus-ebpf-exporter — Cloudflare's open-source Prometheus metrics exporter for eBPF-derived telemetry at github.com/cloudflare/ebpf_exporter. Not a Copy Fail-specific artifact; used across the fleet as the general eBPF → Prometheus substrate. Canonical wiki first-class page created via the 2026-05-07 Copy Fail ingest because the tool plays the measurement half of patterns/visibility-before-enforcement-rollout — hooking `socket()` to track per-binary AF_ALG usage across hundreds of thousands of servers within hours, no kernel changes, salt-gated config push. Validated the allow-list that the bpf-lsm enforcement program depended on.
- systems/big-pineapple — The resolver software powering [[systems/cloudflare-1-1-1-1-resolver|1.1.1.1]], 1.1.1.1 for Families, Gateway DNS, and DNS Firewall. Named publicly in the 2026-05-06 DNSSEC `.de` outage post. Implements serve-stale (RFC 8767), which cushioned user impact for ~3 hours during the 2026-05-05 DENIC break. Disclosed two gaps worth remediation: no native RFC 7646 Negative Trust Anchor mechanism (a generic insecure-zone override was used instead), and an EDE-propagation bug that surfaced DNSSEC-Bogus errors as EDE 22 ("No Reachable Authority") instead of EDE 6 ("DNSSEC Bogus").
- systems/snapstone — Configuration-deployment system (introduced publicly 2026-05-01 as part of Code Orange: Fail Small is complete). Bundles configuration changes into packages + applies health-mediated progressive rollout + automated rollback by default. Teams "create these configuration units on demand" so any configuration pattern identified as dangerous (risk review, operational experience) can be brought into Snapstone and inherit safe deployment. Directly addresses the 2025-11-18 Bot Management feature-file channel and the 2025-12-05 internal WAF testing-tool disable-flag channel — both detonated on the global configuration system's rapid-fleet-wide-push. Snapstone is the strategic-system complement: rapid threat-response channel still available, but health-mediated deployment is now the default. Canonical wiki instance of patterns/config-deployment-as-code-deployment at the system tier.
- systems/cloudflare-codex — Living repository of engineering standards (introduced publicly 2026-05-01). Rules authored by domain experts through the RFC process in the format "If you need X, use Y" with link back to the RFC; enforced via AI code review on every MR across the entire codebase without exception. Two named rules at launch: "Do not use `.unwrap()` outside of tests and `build.rs`" (addresses concepts/unhandled-rust-panic, would have rejected the 2025-11-18 FL2 MR) and "Services MUST validate that upstream dependencies are in an expected state before processing" (addresses concepts/internally-generated-untrusted-input, would have rejected both 2025-11-18 and 2025-12-05 MRs). The flywheel: "expertise becomes standards, standards become enforcement, enforcement raises the floor for everyone." Canonical wiki instance of concepts/rfc-as-codified-engineering-rule and patterns/codex-enforced-via-ai-code-review; institutional-memory-backed shift-left from "global outage" to "rejected merge request."
- systems/cloudflare-agent-memory — Managed agent memory service (private beta 2026-04-17). Extracts information from agent conversations at compaction time, stores as classified memories (facts / events / instructions / tasks) outside the context window, retrieves via a five-channel parallel fusion pipeline (RRF over FTS-Porter + fact-key-exact + raw-message-FTS + direct-vector + HyDE-vector). Per-profile Durable Object (SQLite + FTS + supersession chains) + per-profile Vectorize index + Workers AI models (Llama 4 Scout for extraction / verification / classification / query analysis, Nemotron 3 for synthesis). Six-operation deliberately narrow API (`getProfile` / `ingest` / `remember` / `recall` / `forget` / `list`) — the model never burns context on storage strategy. Content-addressed message IDs (`SHA-256(sessionId + role + content)[:128 bits]` + `INSERT OR IGNORE`) make re-ingestion idempotent; facts + instructions are keyed + superseded via forward pointer. Extracted memories are vectorised asynchronously with 3-5 classifier-generated search queries prepended to the embedding text, bridging declarative writes / interrogative reads. Session-affinity header routes to memory-profile name for prompt-caching benefit. Positioned as distinct from — and designed to work together with — AI Search: AI Search's substrate is files, Agent Memory's substrate is conversations. Internal dogfood on three workloads: coding-agent memory via OpenCode plugin, agentic code reviewer ("arguably the most useful thing it learned to do was stay quiet"), message-history chat bot. Every memory exportable — vendor-lock-in reduction explicitly named as trust-earning strategic posture. Fifth substrate in the 2026-04 Cloudflare agent-memory stack alongside Project Think Persistent Sessions (episodic), AI Search per-customer instances (semantic), Artifacts per-session Git repos (filesystem + session-history), Email Service thread-plus-DO-state. Canonical wiki instance of the following new patterns: patterns/constrained-memory-api, patterns/multi-stage-extraction-pipeline, patterns/parallel-retrieval-fusion, patterns/agent-driven-benchmark-loop, plus memory-tier instance of patterns/agent-first-storage-primitive / patterns/tool-surface-minimization / patterns/central-proxy-choke-point. (Source: sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory)
- systems/cloudflare-fl1-proxy / systems/cloudflare-fl2-proxy / systems/cloudflare-rulesets-engine / systems/cloudflare-managed-ruleset — Edge proxy + rulesets cluster at the heart of both the 2025-11-18 and 2025-12-05 outages. FL1 is the Lua-on-nginx/OpenResty legacy edge proxy; FL2 is the Rust-based successor. The rulesets engine (filter + action rules with an `execute` action that triggers sub-ruleset evaluation) runs in both generations — Lua in FL1, Rust in FL2. The killswitch subsystem inside the rulesets engine disables misbehaving rules rapidly via the global config system. On 2025-12-05, a seven-year-old dormant Lua nil-index bug (concepts/nil-index-lua-bug) in FL1's killswitch post-processing path detonated the first time the killswitch was applied to an `action=execute` rule — HTTP 500 for ~28% of Cloudflare traffic for 25 minutes; FL2 customers were unaffected because Rust's type system structurally prevents the bug class (canonical wiki instance). The Cloudflare Managed Ruleset is the customer-facing deployed configuration that, combined with FL1 traffic, defined the ~28% affected intersection. Running two proxy generations side-by-side is Cloudflare's long-shape dual-system migration; legacy FL1 carries the dormant hazards while FL2 has moved past them, and the stated remediation is to accelerate the migration rather than harden the legacy surface.
- systems/cloudflare-bot-management / systems/perplexity-ai /
systems/perplexitybot / systems/perplexity-user /
systems/chatgpt-user — AI-crawler enforcement cluster
documented in the 2025-08-04 Perplexity stealth-crawler post.
Cloudflare's Bot Management scores every request; the
Verified Bots directory is the known-good allowlist.
Cloudflare de-listed Perplexity from Verified Bots
(2025-08-04) after confirming via a
brand-new-domain
experiment that Perplexity runs an undeclared
stealth crawler with generic
Chrome UA, unlisted IPs, and rotating ASNs — bypassing
customer blocks of its declared crawlers. ML + network-
signal fingerprinting produces a stealth signature that
survives rotation; block signatures shipped to the managed
AI-bots ruleset for all customers including free tier.
ChatGPT-User is the positive
control in the same experiment: fetched `robots.txt`, honored `Disallow`, stopped, no alternate-UA follow-up; ChatGPT Agent signs via Web Bot Auth. Canonical instance of the two-sided verified-bots program: published-identity requirement (patterns/signed-bot-request) + origin-directive-honoring requirement (concepts/robots-txt-compliance), with patterns/verified-bot-delisting as the enforcement lever.
- systems/privacy-pass / systems/arc-anonymous-rate-limit-credentials / systems/act-anonymous-credit-tokens — IETF-standardized anonymous-credential suite (RFC 9576 architecture; RFC 9578 issuance; ARC + ACT IETF drafts as successors). Cloudflare has deployed base Privacy Pass since 2019 and runs it at billions of tokens per day, primarily via iCloud Private Relay. The 2026-04-21 "Moving past bots vs. humans" post positions the suite as the cryptographic substrate for web protection that is anonymous + accountable + decentralized — the third corner of the rate-limit trilemma — preserving the open Web against a drift toward required-login walls and device-attestation-only access. Cloudflare is co-developing both ARC (rate-limit extension) and ACT (credit-token / multi-presentation extension) in the IETF Privacy Pass WG. Complementary to Web Bot Auth as the anonymous branch of the post-bot-vs-human architecture.
- systems/cloudflare-ai-code-review / systems/opencode — Cloudflare's CI-native AI code-review orchestration system (2026-04-20). GitLab CI component that spawns OpenCode as a child-process coordinator which fans out to up to seven specialised sub-reviewers (security, performance, code quality, documentation, release, AGENTS.md, engineering-codex). Plugin architecture — `@opencode-reviewer/gitlab` + `cloudflare` + `codex` + `braintrust` + `agents-md` + `reviewer-config` + `telemetry` + `local`. Coordinator runs on Opus 4.7 / GPT-5.4 (reserved for judge pass); sub-reviewers on Sonnet 4.6 / GPT-5.3 Codex for heavy-lifting domains and Kimi K2.5 on Workers AI for text-heavy domains. Risk tiering (trivial / lite / full) scales agent count to diff size + security-path sensitivity; trivial avg $0.20, full avg $1.68. Hystrix-style circuit breaker per model tier + failback chains (opus-4-7 → opus-4-6; sonnet-4-6 → sonnet-4-5) + coordinator-level failback via hot-swap of `opencode.json`. KV-backed Worker for live re-routing in 5 seconds. Prompt-boundary sanitization strips `<mr_body>` / `<mr_details>` / etc. from user-controlled MR content. Shared-context file + per-file patches on disk → 85.7% prompt-cache hit rate. "What NOT to Flag" prompts named as the primary signal-to-noise lever. "Model is thinking..." heartbeat every 30s. `break glass` human override tracked in telemetry (288 / 0.6% of MRs in first 30 days). Incremental re-review with awareness of prior DiffNotes + user replies + `won't fix` / `acknowledged` / `I disagree` state-transition rules. Production scale (first 30 days): 131,246 review runs, 48,095 MRs, 5,169 repos, median 3m 39s, median cost $0.98, ~120 B tokens, 159,103 findings. 45+ upstream OpenCode PRs. Canonical wiki instance of the full patterns/coordinator-sub-reviewer-orchestration + patterns/specialized-reviewer-agents + patterns/ai-review-risk-tiering + patterns/jsonl-streaming-child-process + patterns/incremental-ai-rereview cluster.
- systems/unweight / systems/unweight-kernels — Cloudflare's lossless LLM weight compression for H100 inference (2026-04-17). Huffman coding on the redundant BF16 exponent byte + a custom reconstructive matmul kernel that loads compressed bytes from HBM, reconstructs BF16 in SMEM, and feeds tensor cores directly — the uncompressed representation never touches HBM. Four execution pipelines (full decode / exponent-only / palette transcode / direct palette) autotuned per `(weight matrix, batch size)` via patterns/autotuned-execution-pipeline-selection. Llama-3.1-8B: ~22 % model-size reduction (distribution), ~13 % (inference, gate + up only), ~3 GB VRAM saved per instance. Throughput overhead 30-40 % at current optimization level, narrows at larger batches. Open-source kernels at github.com/cloudflareresearch/unweight-kernels; technical paper. Complement to Infire's activation-memory discipline — Unweight on weights, Infire on activations, savings additive into KV-cache headroom. Canonical wiki instance of fused decompression + matmul + lossless weight compression + SM-partitioned producer / consumer kernel split.
- systems/cloudflare-shared-dictionaries — RFC 9842 shared-dictionary compression on the edge: the previously cached version of a resource becomes the compression dictionary for the next version. Three-phase rollout — Phase 1 passthrough (open beta 2026-04-30; forward `Use-As-Dictionary` + `Available-Dictionary` headers + `dcb` / `dcz` encodings, extend cache keys to vary on `Available-Dictionary` + `Accept-Encoding`), Phase 2 managed (customer names which assets are dictionaries; CF stores bytes + compresses + serves variants), Phase 3 automatic (CF auto-detects versioned-URL patterns, generates dictionaries, validates via RUM beacon before serving). Lab numbers: 272 KB JS bundle → 2.6 KB DCZ (97 % over gzip); canicompress.com demo: 94 KB SPA → ~159 bytes DCZ (99.5 %). Canonical wiki instance of edge-managed protocol complexity + [[patterns/phased-cdn-rollout-passthrough-managed-auto|passthrough→managed→auto CDN rollout]]. Successor to Google's 2008-2017 SDCH, which was un-shipped from Chrome in 2017 after CRIME/BREACH + SOP violations; RFC 9842 closes the gaps with a same-origin dictionary-scope constraint.
- systems/cloudflare-email-service — bidirectional email primitive on the developer platform: Email Routing (free, inbound, long-available) + Email Sending (public beta from 2026-04-16, outbound). Native Workers `EMAIL` binding (`env.EMAIL.send({...})`) alongside REST API + TS/Python/Go SDKs, `wrangler email send` CLI, and a new Email MCP server on the Cloudflare MCP surface. SPF / DKIM / DMARC auto-configured on domain attach. Integrates with the Agents SDK via `onEmail` + `sendEmail` + `routeAgentEmail` + `createAddressBasedEmailResolver` — address-based routing maps `support@domain` / `support+ticket-123@domain` to distinct DO instances (concepts/address-based-agent-routing, patterns/sub-addressed-agent-instance); HMAC-SHA256-signed `inReplyTo` headers route replies back to the exact originating DO instance (patterns/signed-reply-routing-header). Canonical wiki instance of concepts/email-as-agent-interface + concepts/asynchronous-reply-email + patterns/inbound-classify-persist-reply-pipeline.
- systems/agentic-inbox — open-source reference email client-plus-agent (github.com/cloudflare/agentic-inbox) stitching Email Routing inbound + Email Sending outbound + Workers AI classification + R2 attachments + Agents SDK stateful logic; built-in MCP server lets external agents draft emails for human review before send (email-tier analog of Agent Lee's elicitation gate). Fifth instance of the 2026-04 "dogfood the platform as a customer-facing product" shape after Agent Lee, Project Think, AI Search, and Artifacts.
- systems/cloudflare-artifacts — distributed versioned filesystem that speaks Git (private beta 2026-04-16, public beta early May 2026). One Durable Object per repo hosts a ~100 KB pure-Zig Wasm Git server + embedded SQLite (Git objects chunked across 2 MB rows) + raw-delta-alongside-resolved-object storage; pack-file snapshots in R2; auth tokens in KV; Worker as stateless front-end. Canonical wiki instance of patterns/git-protocol-as-api, patterns/do-backed-git-server, concepts/agent-first-storage-primitive, concepts/wasm-git-server, concepts/repo-per-agent-session (Cloudflare uses Artifacts internally to persist per-session filesystem + prompt history). $0.15/1 k ops + $0.50/GB-month, 10 k ops + 1 GB free.
- systems/artifact-fs — sibling open-source FS driver (github.com/cloudflare/artifact-fs) that mounts any Git remote with blobless-clone + agent-aware-priority background hydration. "Git clone but async." Works with any Git remote, not Artifacts-exclusive. Canonical wiki instance of concepts/async-clone-hydration / patterns/blobless-clone-lazy-hydrate.
- systems/cloudflare-ai-search — managed hybrid (BM25 + vector) search primitive for AI agents, rebranded from AutoRAG (2026-04-16). Runtime-provisioned `ai_search_namespaces` binding: `create()` / `delete()` / `list()` / `search()` at the namespace level, one instance per customer / agent / language, built-in R2 storage + Vectorize index. Canonical wiki instance of patterns/runtime-provisioned-per-tenant-search-index, patterns/cross-index-unified-retrieval, patterns/metadata-boost-at-query-time, and patterns/upload-then-poll-indexing. Cloudflare's own blog search dogfoods it.
- systems/cloudflare-agents-sdk — SDK + `AIChatAgent` / `Agent` base classes + `routeAgentRequest` helper for building agents on Workers + Durable Objects. Positioned as the predecessor of Project Think ("the next generation of the Agents SDK") and the host of the 2026-04-16 AI Search support-agent worked example.
- systems/workers-ai — managed LLM + embedding + reranker platform on the `@cf/…` namespace. Hosts Kimi K2.5, BGE reranker base, embedding models. Billed separately from AI Search during AI Search's open beta. Serving architecture (2026-04-16 deep dive): PD-disaggregated prefill + decode tiers with a custom token-aware load balancer that rewrites SSE responses across the stage boundary; client-signalled `x-session-affinity` header for cross-cluster warm-cache routing (peak input-cache hit 60% → 80%); cluster-wide KV sharing over RDMA (Mooncake Transfer Engine + Mooncake Store + LMCache / SGLang HiCache); speculative decoding with NVIDIA EAGLE-3 drafter on Kimi K2.5; Infire proprietary Rust inference engine with multi-GPU tensor + pipeline + expert parallelism. Measured: p90 TTFT dropped, p90 intertoken latency ~100 ms → 20-30 ms (3×), same GPU count with higher request volume.
- systems/infire — Cloudflare's proprietary Rust inference engine (Birthday Week 2025 launch, 2026-04 multi-GPU extension). Runs on both PD tiers of Workers AI. Lower activation-memory overhead than vLLM — fits Llama 4 Scout on 2× H200 with >56 GiB KV room (~1.2M tokens); fits Kimi K2.5 on 8× H100 (not H200) with >30 GiB KV room. Sub-20s cold boot, +20% tokens/sec vs baseline on unconstrained systems. "In both cases you would have trouble even booting vLLM in the first place."
- systems/mooncake-transfer-engine — Moonshot AI's (kvcache-ai/Mooncake) RDMA KV-transfer framework consumed externally by Workers AI for intra-node (NVLink) + inter-node (NVMe-oF) KV block movement; companion to systems/mooncake-store (NVMe cache tier). Same origin as Kimi K2.5 — Moonshot's serving infrastructure ships with the model.
- systems/mooncake-store — Moonshot's NVMe cold tier for KV cache; extends session residency beyond HBM + DRAM so `x-session-affinity` hits survive long idle windows.
- systems/lmcache — open-source cluster-wide KV-cache software layer paired with Mooncake; exposes shared cache to the serving engine so different replicas can reuse the same pre-filled prefix.
- systems/eagle-3 — NVIDIA's speculative-decoding drafter model, target-model-specific; Workers AI uses `nvidia/Kimi-K2.5-Thinking-Eagle3`. Shines on agentic tool-call + JSON-structured-output workloads.
- systems/cloudflare-vectorize — managed vector database. Pre-AI-Search a customer-visible primitive; for the AI Search workflow, an implementation detail inside every instance.
- systems/kimi-k2-5 — Moonshot AI's open-weight LLM served as `@cf/moonshotai/kimi-k2.5`; chat model in the 2026-04-16 AI Search support-agent example.
- systems/replicate-cog — Replicate's open-source ML-model containerisation format (cog.run) — `cog.yaml` + `predict.py:Predictor`, `cog build`. As of 2026-04-16, the Workers AI BYO-model substrate: customers package a fine-tuned or custom model as a Cog container, push it to Workers AI, and it surfaces in the AI Gateway unified catalog alongside `@cf/…` + third-party models. Currently Enterprise-design-partner access; roadmap includes customer-facing push APIs, `wrangler` commands, and GPU-snapshotting-based faster cold starts. The Replicate team has officially joined the Cloudflare AI Platform team. Canonical wiki instance of patterns/byo-model-via-container.
- systems/project-think — "Next generation of the Agents SDK" (`@cloudflare/think`, 2026-04-15 preview). Six primitives (durable fibers, sub-agents via Facets, tree-structured Sessions with FTS5 search, sandboxed code execution, execution ladder Tiers 0-4, self-authored extensions) + the opinionated `Think` base class that wires them together. Built on the same customer-visible primitives (DO + Workers + R2 + Browser Rendering + Sandbox SDK). Internal production use: "we're already using it internally to build our own background agent infrastructure." Canonical wiki instance of concepts/one-to-one-agent-instance, concepts/durable-execution, concepts/capability-based-sandbox, concepts/execution-ladder, and concepts/self-authored-extension.
- systems/dynamic-workers — per-request V8-isolate sandbox that starts in milliseconds with a few MB of memory — ~100× faster, up to 100× more memory-efficient than a container. Capability-model posture by default: "almost no ambient authority (`globalOutbound: null`); developer grants capabilities explicitly through bindings." The substrate at Tiers 1-3 of Project Think's execution ladder; the execution tier for every Code Mode generated program.
- systems/emdash — Open-source TypeScript CMS (v0.1.0 preview, 2026-04-01), positioned as "the spiritual successor to WordPress." Each plugin runs as a Dynamic Worker with a capability manifest declaring exactly which hooks and capabilities it needs — applies the same capability-based sandbox primitive Project Think uses for LLM-generated code to third-party CMS plugins. Plugins can have any license; plugin code can be "trusted without being seen" (patterns/license-independent-sandboxed-plugins). Breaks the marketplace-lock-in dynamic that WordPress's 800+-plugin-review-queue exemplifies. Astro-themed; workerd scale-to-zero hosting (or any Node.js server); built-in x402 / HTTP 402 for per-request content monetisation; built-in MCP server + in-tree Agent Skills + passkey-default auth. First wiki instance of MCP as a CMS primitive and Agent Skills shipped inside a product. MIT licensed; clean reimplementation of WordPress functionality ("no WordPress code was used"), sibling shape to vinext's Next.js clean reimplementation.
- systems/wordpress — the structural counter-example. 40%+ of the web; 24-year-old architecture predating modern sandboxing + serverless primitives; canonical instance of the plugin-marketplace-lock-in dynamic Cloudflare's EmDash argues against.
- systems/workerd — Cloudflare's open-source V8-isolate Workers runtime (Apache 2.0; C++ + V8 + Cap'n Proto). Portable runtime behind Cloudflare Workers; the scale-to-zero isolate substrate EmDash is built to run on anywhere ("EmDash is different: it's built to run on serverless platforms, and make the most out of the v8 isolate architecture of Cloudflare's open source runtime workerd").
- systems/x402-protocol — Open Internet-native payment standard on top of HTTP 402, governed by the x402 Foundation (Cloudflare + Coinbase co-founded). Built into every EmDash site out of the box. Evolves the pay-per-crawl 402 primitive from single-vendor header set → open standard → CMS primitive.
- systems/agent-lee — first-party in-dashboard AI assistant (beta, 2026-04-15); ~18K DAU / ~250K tool calls/day at launch. Built on Agents SDK + Workers AI + Durable Objects + MCP. Canonical wiki instance of Code Mode as an agent consumption layer + credentialed proxy + elicitation gate + dynamic UI generation on an adaptive grid.
- systems/code-mode — Cloudflare's pattern of exposing MCP tools as a TypeScript API and asking the model to write code against it, not pick tools. Fits ~3,000 Cloudflare API operations into <1,000 context tokens; the core consumption layer behind Agent Lee and Cloudflare's internal MCP Server Portal.
- systems/pingora — Cloudflare's in-house Rust-based HTTP proxy framework and the heart of their Rust proxy services; open-sourced. Backbone of pingora-origin and other edge services.
- systems/cf-cli — next-generation Wrangler CLI (Technical Preview, 2026-04-13): `npx cf` / `npm install -g cf`. Single CLI for all Cloudflare products (~100 products, ~3,000 HTTP API operations), generated from a unified TypeScript schema together with Workers bindings, SDKs, Terraform provider, MCP Code Mode server, Agent Skills, and `wrangler.jsonc` configuration. Convention rules (`get` not `info`, `--json` everywhere) enforced at the schema layer.
- systems/wrangler-cli — the existing Workers CLI being evolved into `cf`; hosts `wrangler dev` local loop, deploy, bindings management, `wrangler.jsonc` configuration.
- systems/cloudflare-local-explorer — 2026-04-13 open-beta introspection GUI + local mirror of the Cloudflare API at `/cdn-cgi/explorer/api` on any Wrangler or Vite-plugin-powered app; unifies local / remote resource management for agents. See patterns/local-mirror-of-remote-api.
- systems/miniflare — Cloudflare's local-dev emulator running the same V8 Workers runtime as production + local SQLite / on-disk state for every binding (KV, R2, D1, Durable Objects, Workflows). Local Explorer is the introspection layer on top of Miniflare's state.
- systems/cloudflare-d1 / systems/cloudflare-kv / systems/cloudflare-durable-objects / systems/cloudflare-workflows — developer-platform storage and orchestration primitives that bind to Workers and are now locally introspectable through Local Explorer.
- systems/cloudflare-vite-plugin — the companion to Wrangler for Vite-powered projects; also hosts Local Explorer's endpoint.
- systems/pingora-origin — the pingora-based service that performs the last outbound hop to customer origins: strips internal headers, enforces routing / measurement / optimization, then forwards. Runs at 35 M req/sec on ~40,000 saturated CPU cores globally (2024).
- systems/trie-hard — open-source Rust crate (2024-09-10) implementing a trie optimized for high-throughput read-heavy membership checks over a static key set: node relationships packed into the bits of unsigned integers, whole tree held in a contiguous memory chunk. Production-deployed since July 2024 to strip Cloudflare internal headers in pingora-origin.
- systems/criterion-rust — the Rust statistical microbenchmark crate Cloudflare uses for per-function perf measurement; nanosecond resolution, regression tracking.
- systems/cloudflare-ai-gateway — Cloudflare's LLM routing / caching / logging gateway; core of the internal AI stack + the provider-swap point in Moltworker.
- systems/workers-ai — Cloudflare's same-network model-inference platform (runs alongside Workers / DO / KV / D1).
- systems/mcp-server-portal — Cloudflare's MCP server aggregation layer with Code Mode tool collapsing.
- systems/backstage — self-hosted OSS service catalog exposed via MCP to internal agents; Cloudflare runs 2,055 services, 228 APIs, 1,302 DBs, 544 systems across 45 domains in it.
- systems/opencode — terminal-based agentic coding client with `.well-known`-driven config bootstrap.
- systems/agents-sdk — Cloudflare's Agents SDK (McpAgent etc.).
- systems/cloudflare-sandbox-sdk — higher-level SDK for driving Cloudflare Containers from Workers; Moltworker + Dynamic Workers sandboxed agent-generated code exec.
- systems/cloudflare-containers — the ephemeral-by-design Docker-container runtime Sandbox SDK wraps. DO-enabled Containers (open beta 2025) gained Browser Run as a Customer-Zero workload via its 2026-05-13 migration off shared BISO infra, which surfaced + drove fixes for "light on documentation, light on observability, light on colleagues in an overlapping timezone" platform-interface gaps and canonicalised DO-Container cross-region placement decoupling + regional pre-warmed DO+Container-pair pools as a substrate-level pattern.
- systems/cloudflare-browser-rendering — managed headless-Chromium tier with Puppeteer / Playwright / Stagehand / MCP adapters. 2026-04-16 rebranded from "Browser Rendering" to "Browser Run". 2026-05-13 migrated off shared BISO infrastructure onto Cloudflare Containers — 60 browsers/min via Workers binding, 120 concurrent (4× previous), >50% Quick Action latency reduction, WebGL + WebMCP unblocked.
- systems/cloudflare-browser-isolation — 2026-05-13 first canonical wiki entry for Cloudflare's remote-browser Zero Trust product (BISO). The page documents BISO primarily as the prior shared substrate that Browser Run migrated off, with the three workload-shape mismatches (image size / POP distribution / long-steady-vs-short-spiky session shape) canonicalised as a "shared infra is cheaper until your workload shape diverges" failure mode.
- systems/cloudflare-queues — 2026-05-13 first canonical wiki entry for Cloudflare's Workers-platform messaging primitive. Browser Run's container-state-tracking pipeline is the canonical instance: per-container 5-second state-update writes go through a per-region Queue (`production-core-containers-queue-weur` etc.) at `max_batch_size: 100, max_batch_timeout: 1` to amortise D1's per-row write cost — D1 batch-write P95 0.1 ms lifts the per-location ceiling from 5,000 → 500,000 containers (100× headroom) at <2-second steady-state queue lag. Canonical Cloudflare-platform substrate for patterns/queue-batching-amortizes-db-write-throughput + patterns/region-fallback-on-queue-backlog.
- systems/cloudflare-r2 — S3-compatible object storage with egress-free pricing; `sandbox.mountBucket()` presents R2 as a filesystem partition to ephemeral containers.
- systems/cloudflare-zero-trust-access — identity-aware application proxy gating internal + Developer-Platform applications; JWT-signed requests to origin Workers.
- systems/moltbot — open-source self-hosted personal AI agent (formerly Clawdbot, renamed OpenClaw 2026-01-30); the reference workload ported to Cloudflare via Moltworker.
- systems/moltworker — Cloudflare's open-source proof-of-concept port of Moltbot onto the Developer Platform (Workers + Sandbox SDK + Browser Rendering + AI Gateway + R2 + Zero Trust Access).
- systems/chrome-devtools-protocol — the browser-automation wire protocol Browser Rendering exposes; the proxy point in Moltworker's headless-browser integration.
- systems/cloudflare-workflows — Cloudflare Workflows (durable execution). 2026-05-01 disclosed Workflows V2 capacity: 50k concurrent instances / 300 new-per-second per account.
- systems/dynamic-workers — Cloudflare's per-request isolate-spawning Workers tier for agent-generated code.
- systems/cloudflare-dynamic-workflows — 2026-05-01 library
that bridges Workflows with Dynamic Workers to let each tenant
ship their own
run(event, step)class dispatched on demand. Canonical wiki instance of patterns/dynamic-binding-over-static-binding at the durable-execution layer and concepts/per-tenant-dynamic-code-dispatch. - systems/durable-object-facets — 2026-04 dynamic-binding counterpart to Durable Objects (per-tenant SQLite database spun up on demand with the platform as supervisor). Storage- tier sibling to Dynamic Workers (compute) and Dynamic Workflows (durable execution).
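
The batching arithmetic in the Queues entry above can be sketched in a few lines. This is a minimal illustration, not Cloudflare's consumer code: the `Batcher` class, its `flush` callback, and the explicit `now` timestamps are all hypothetical names chosen here. Only `max_batch_size: 100`, `max_batch_timeout: 1`, and the 5,000 → 500,000 figures come from the entry.

```python
from typing import Callable, List, Optional


class Batcher:
    """Collects messages and hands them to `flush` as one batch write.

    Hypothetical helper for illustration; it mirrors the max_batch_size /
    max_batch_timeout consumer settings quoted in the entry above.
    """

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_batch_size: int = 100, max_batch_timeout: float = 1.0):
        self.flush = flush
        self.max_batch_size = max_batch_size
        self.max_batch_timeout = max_batch_timeout
        self.pending: List[dict] = []
        self.oldest_at: Optional[float] = None

    def add(self, msg: dict, now: float) -> None:
        """Queue one message; flush immediately when the batch is full."""
        if not self.pending:
            self.oldest_at = now
        self.pending.append(msg)
        if len(self.pending) >= self.max_batch_size:
            self._flush()

    def tick(self, now: float) -> None:
        """Flush a partial batch once the oldest message has waited long enough."""
        if self.pending and now - self.oldest_at >= self.max_batch_timeout:
            self._flush()

    def _flush(self) -> None:
        batch, self.pending = self.pending, []
        self.oldest_at = None
        self.flush(batch)


# Each call to `flush` stands in for one batched database write.
batch_sizes: List[int] = []
batcher = Batcher(flush=lambda batch: batch_sizes.append(len(batch)))

for i in range(250):                      # 250 state updates arrive at t=0
    batcher.add({"container": i}, now=0.0)
batcher.tick(now=1.0)                     # the timeout drains the remainder

# If the per-row path tops out at 5,000 containers, writing 100 rows per
# batch lifts the same write budget to 5,000 * 100 containers.
ceiling_with_batching = 5_000 * batcher.max_batch_size
```

Here 250 updates cost three writes instead of 250, which is the amortisation the entry describes; the timeout bounds how long a partial batch can sit in the queue.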
Internal data / dashboard infrastructure (2025-11-18 disclosure)¶
- systems/clickhouse — Cloudflare's internal ClickHouse cluster hosts Bot Management telemetry + feature data. Sharded across nodes; `default`-database distributed tables fan out to `r0.*` shard tables. The 2025-11-18 incident disclosed the cluster's role and the per-user-distributed-query migration (explicit `r0.*` grants so distributed subqueries run under the initiating user's account) that silently doubled `system.columns` query results for the Bot Management feature-file generator. Canonical wiki instance of concepts/clickhouse-distributed-query and concepts/database-permission-migration-risk.
- systems/turnstile — Cloudflare's privacy-preserving challenge widget (CAPTCHA replacement). Embedded on customer login flows + Cloudflare's own Dashboard login. 2025-11-18 disclosure: Turnstile depends on the core proxy; when the proxy returned 5xx, Turnstile couldn't load, and every dashboard login that required a new auth failed.
- systems/cloudflare-dashboard — the web control plane. 2025-11-18 disclosure: two impact windows (11:30–13:10 via Workers KV dependency; 14:40–15:30 via login-retry backlog after Turnstile recovery). Existing sessions unaffected.
Managed relational storage (partner-integrated)¶
- systems/hyperdrive — Cloudflare's database connectivity service for Workers talking to central SQL databases. Manages connection pools + query caching in front of any Postgres / MySQL origin; exposed to customer code as a first-class Workers binding (`hyperdrive: [{ binding: "DATABASE", id }]`) so a standard `pg` or MySQL client works unchanged over `env.DATABASE.connectionString`. Primary beneficiary: SQL-heavy request paths from the edge to a central DB, where naive per-invocation connections would exhaust the DB's connection budget and pay TLS + handshake cost every request.
- systems/planetscale — partner-integrated managed relational database offering both Postgres and MySQL (Vitess) engines, announced as a Cloudflare partnership in September 2025. As of 2026-04-16 customers can provision PlanetScale databases directly from the Cloudflare dashboard / API and (from "next month") be billed through their Cloudflare account — including redemption of Cloudflare credits (startup programme, committed spend) against PlanetScale usage. Full PlanetScale feature + SKU + pricing surface preserved (query insights, AI-agent tooling, database branching, $5/month starter node). Canonical wiki instance of patterns/partner-managed-service-as-native-binding + storage-tier instance of patterns/unified-billing-across-providers. Worker co-location with the DB region via placement hints (`placement: { region: "aws:us-east-1" }`) resolves concepts/edge-to-origin-database-latency — with auto-placement named as forward-looking ("single digit milliseconds" target).
Security / identity (post-BastionZero)¶
- systems/opkssh — open-source SSH tool that uses SSO (OpenID Connect) to issue ephemeral 24h SSH keys. Originally built by BastionZero; Cloudflare acquired the company in 2024 and donated the code to the OpenPubkey Linux Foundation project (Apache 2.0, 2025-03-25). No protocol or binary changes required to OpenSSH.
- systems/openpubkey — the underlying protocol (Linux Foundation, 2023) adding public keys to OIDC ID Tokens, producing PK Tokens. OPKSSH is the production-complete application of OpenPubkey to SSH.
- systems/bastionzero — the zero-trust infrastructure-access company Cloudflare acquired in 2024; historical origin of OPKSSH.
Network defence / DDoS mitigation¶
- systems/magic-transit — Cloudflare's IP-network-level DDoS protection product: customer prefixes advertised via anycast from the full POP fleet; L3 scrubbing on every server; GRE tunnel to origin. Defended the 7.3 Tbps / 4.8 Bpps / 37.4 TB in 45 seconds attack of May 2025 — largest DDoS attack ever reported at the time — fully autonomously across 477 data centres / 293 locations.
- systems/dosd — Cloudflare's per-server user-space denial-of-service daemon: reads packet samples from XDP/eBPF, enumerates fingerprint permutations, picks the most-selective match via a data-streaming algorithm, compiles the fingerprint back to an eBPF program that drops matching packets in XDP; top fingerprints gossiped within-POP and globally. Exposed to customers as the DDoS Protection Managed Rulesets.
- systems/ebpf — not Cloudflare-owned, but XDP + eBPF is the kernel drop plane for this stack; the 2025-06-20 writeup is the first wiki instance of XDP as a DDoS data plane.
Public DNS & anycast reliability¶
- systems/cloudflare-1-1-1-1-resolver — Cloudflare's free public DNS Resolver (2018); global anycast of prefixes `1.1.1.0/24`, `1.0.0.0/24`, `2606:4700:4700::/48` and related over UDP / TCP / DoT. DoH via `cloudflare-dns.com` uses a different IP set — a load-bearing availability-architecture detail exposed by the 2025-07-14 incident, where DoH-via-hostname traffic stayed mostly up while bare-IP UDP/TCP/DoT traffic to 1.1.1.1 was globally unreachable for 62 minutes.
- systems/cloudflare-data-localization-suite (DLS) — the Cloudflare product line that constrains services to customer-selected regions via a per-service service topology (explicit list of POPs the service is allowed to run in). 2025-07-14: a 38-day dormant-misconfig on a not-yet-live DLS service topology was the surface on which the 1.1.1.1 outage detonated.
BGP / routing forensics (Venezuela route-leak post)¶
- systems/aspa — Autonomous System Provider Authorization (IETF SIDROPS draft). RPKI-signed objects declaring each AS's authorized upstream providers; enables path validation to detect and reject route leaks at any BGP vantage point. Tier-1 networks use the `AS0` member to declare "no upstream providers." Cloudflare's forward-looking mitigation pick in the 2026-01-08 Venezuela post; would have prevented the 01-02 Type-1 hairpin leak.
- systems/peerlock / Peerlock-lite — operator-side BGP-filter recipes (Junos + NANOG) that reject customer-learned routes containing known Tier-1 ASNs in the path. Near-term leak mitigation before full ASPA adoption.
- systems/cantv-as8048 (Venezuela, state ISP) / systems/sparkle-as6762 (Italy, Tier-1) / systems/vtal-globenet-as52320 (Colombia) / systems/dayco-telecom-as21980 (Venezuela) — the four ASes in the 2026-01-02 leak event Cloudflare dissects. AS8048 is the leaker (hygiene failure); AS6762 is the Tier-1 upstream the routes came from; AS52320 is the other provider they were leaked to; AS21980 is the customer whose prefixes were the subject matter.
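
The Peerlock-lite rule described above (reject customer-learned routes whose path contains a known Tier-1 ASN) reduces to a single membership check. The sketch below is illustrative only: `accept_customer_route` and `TIER1_ASNS` are names invented here, the Tier-1 set contains only the one ASN this page names, and the two AS paths are assumed shapes for the example rather than the paths recorded in the incident.

```python
from typing import FrozenSet, List

# Tier-1 ASNs that should never appear in a path learned from a customer.
# Only AS6762 (Sparkle) is taken from the text; a real filter list is longer.
TIER1_ASNS: FrozenSet[int] = frozenset({6762})


def accept_customer_route(as_path: List[int],
                          tier1: FrozenSet[int] = TIER1_ASNS) -> bool:
    """Peerlock-lite check applied to routes received from a customer.

    A customer has no business transiting a Tier-1 network on a provider's
    behalf, so any Tier-1 ASN in the path marks the route as a likely leak.
    """
    return not any(asn in tier1 for asn in as_path)


# Assumed view from AS52320, which receives routes from its customer AS8048.
leaked_path = [8048, 6762, 21980]   # AS8048 re-exporting a route learned from AS6762
normal_path = [8048, 21980]         # AS8048 announcing its own customer's prefix

leak_accepted = accept_customer_route(leaked_path)
normal_accepted = accept_customer_route(normal_path)
```

The filter needs no RPKI data, which is why it works as a near-term measure; ASPA generalises the same idea by letting every AS publish its authorized providers instead of relying on a hand-kept Tier-1 list.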
Internet-disruption observability (Radar Q1 2026 review)¶
- systems/cloudflare-radar-outage-center — the curated, human-attributed catalogue of confirmed Internet disruptions that sits on top of the raw per-country / per-AS / per-region traffic anomaly feed. Published at radar.cloudflare.com/outage-center. Substrate for the quarterly Internet-disruption review blog-post cadence.
- systems/cloudflare-cloud-observatory — per-cloud-provider per-region connection failure rate published at radar.cloudflare.com/cloud-observatory. Third-party external-callers' uptime signal for hyperscaler regions, complementary to the providers' own status dashboards. Canonical use: live forensics of the March 2026 drone strikes on systems/amazon-me-central-1 + systems/amazon-me-south-1.
- systems/uganda-internet-exchange-point — national peering exchange whose domestic-throughput collapse from ~72 Gbps to ~1 Gbps during the January 2026 pre-election shutdown is the canonical wiki instance of IXP throughput as shutdown-scale signal.
- systems/asiatech-as43754 / systems/rasana-as31549 — the two largest IPv6-announcing Iranian ASes; between them lost ~18% of Iran's IPv6 address space (9.4% + 8.8%) in the hours before the January 8 2026 shutdown — canonical instance of concepts/ipv6-withdrawal-as-shutdown-signal. As of late May 2026 (per the 2026-05-27 follow-up), the IPv6 withdrawal remains effectively in place ~5 months later — IPv4 traffic recovery and IPv6 posture have decoupled, establishing the persistent IPv6 withdrawal pattern.
- systems/tci-iran / systems/irancell / systems/rightel / systems/mcci — the four named Iranian operators whose simultaneous traffic increases on May 26 2026 confirmed the country-wide partial restoration of Iran's February 28 nationwide shutdown (87 days into the disruption). Per-ASN attribution is the standard Cloudflare Radar primitive for distinguishing state-coordinated from per-carrier policy events.
- systems/wacs-cable — West Africa Cable System; the 2026-01-02 cable incident produced 82% below expected traffic from the Republic of Congo for ~63 hours — canonical instance of concepts/submarine-cable-incident.
- systems/aws-health-dashboard — AWS's first-party service-health disclosure. The 2026-03 me-central-1 + me-south-1 event post provided the explicit "structural damage, disrupted power delivery... fire suppression... water damage" framing + the "back up their data or migrate to other AWS regions" customer advisory that anchors patterns/cloud-region-migration-during-conflict.
Workers / V8 / Node.js performance stack¶
- systems/cloudflare-workers — V8-isolate-based serverless compute; 330+ POPs; CPU-time billing (not wall-clock); 128 MB default isolate memory. In 2025-10 Cloudflare re-tuned two knobs: concepts/warm-isolate-routing (biases CPU-bound bursts toward new isolates while keeping I/O-bound coalescing) and concepts/v8-young-generation (removed a stale 2017-era young-gen cap → ~25 % benchmark improvement globally, small memory increase).
- systems/opennext — OSS Next.js portability adapter layer (Workers, AWS Lambda, Netlify, …). 2025-10 profiling surfaced allocation / copy / buffering inefficiencies (50× unused `Buffer` per `pipeThrough`, per-chunk output copy, `Buffer.concat(chunks).length` discard pattern, Node↔Web-streams double-buffer, value-oriented `ReadableStream` with `highWaterMark: 1`); Cloudflare shipped upstream PRs.
- systems/v8-javascript-engine — Cloudflare employs multiple V8 core contributors; the company ships ecosystem fixes as V8 patches that compound across Node.js / Chrome / Deno / Workers. Canonical 2025-10 instance: ~33 % `JSON.parse(text, reviver)` speedup (Chromium CL 7027411, ships V8 14.3 / Chrome 143).
- systems/nextjs — target framework for OpenNext; profiled in the 2025-10 post (reviver-heavy paths, rendering-pipeline allocation patterns).
- systems/nodejs — sibling V8-embedder; governance-adjacent via Snell's TSC seat; the runtime whose Web-streams performance gap (Vercel's 12× measurement vs Node's own `pipeline()`) motivated the 2026-02-27 critique.
- systems/web-streams-api — the WHATWG Streams Standard; Workers-preferred streaming API. Canonical wiki instance of the 2026-02-27 structural critique (readers+locks, advisory-only backpressure, push-based TransformStream, promise-allocation cost, BYOB-without-payoff).
- systems/new-streams — Snell's 2026-02-27 proof-of-concept alternative streaming API (async-iterable + pull + explicit backpressure policies + batched chunks + sync fast paths); 2×-120× faster than Web streams in every tested runtime. Not a ship-it standard, explicitly a conversation-starter.
- systems/react — first-party consumer inside Next.js; value-oriented `ReadableStream`s + heavy `JSON.parse(text, reviver)` usage; default-dev-mode when `NODE_ENV` unset was one of the 2025-10 methodology biases.
Post-quantum cryptography stack¶
- systems/cloudflare-universal-ssl — 2014 free-TLS-for-all program; the historical precedent and rhetorical anchor for Cloudflare's default-on / no-extra-cost posture that the 2026 PQ roadmap invokes explicitly ("Free TLS helped encrypt the web. Free post-quantum cryptography will help secure it for what comes next.").
- systems/cloudflare-magic-wan-ipsec — 2026-04-30 GA of hybrid ML-KEM for Cloudflare IPsec per draft-ietf-ipsecme-ikev2-mlkem. The third canonical default-on-security-upgrade instance after 2014 Universal SSL + 2022 TLS PQ-for-all; extends PQ-KEM coverage from web-TLS into enterprise site-to-site networking. Interop verified with Cisco 8000 Series 26.1.1+ + Fortinet FortiOS 7.6.6+ branch connectors; reference implementation via strongswan. Palo Alto Networks' RFC 9370-based early implementation does not yet interoperate — canonical ciphersuite-bloat case study.
- systems/cloudflare-one-sase — the SASE / zero-trust suite flagged as the Early-2028 milestone: fully PQ-authenticated Cloudflare One completes the enterprise-network side of the 2029 target. Already PQ-encrypted (2022 hybrid KEM rollout on tunnel traffic + 2026-04-30 IPsec PQ-KEM GA).
- systems/ikev2-protocol — the IPsec handshake layer receiving hybrid ML-KEM; IKE_INTERMEDIATE (RFC 9242) as the extension point carrying the PQ half of the hybrid.
- systems/ml-kem — NIST FIPS 203 PQ KEM algorithm (Kyber lineage). Already running in TLS hybrid `X25519MLKEM768` at 65 % of human traffic; now also running in Cloudflare IPsec's IKEv2 hybrid per draft-ietf-ipsecme-ikev2-mlkem.
- systems/strongswan — open-source IPsec / IKEv2 reference implementation; test harness for Cloudflare's pre-GA closed-beta interoperability testing of hybrid ML-KEM.
- systems/cisco-8000-series-secure-routers — enterprise branch-connector hardware supporting the PQ IPsec draft from version 26.1.1.
- systems/fortinet-fortios — enterprise branch-connector OS supporting the PQ IPsec draft from version 7.6.6.
- systems/palo-alto-networks-ipsec — the canonical non-interoperable case: shipped under RFC 9370 before the named standard existed.
- systems/ml-dsa-signature — NIST FIPS 204 PQ signature algorithm (Dilithium lineage). Named as the Mid-2026 Cloudflare→origin authentication primitive; general-purpose lattice-based signature suited for TLS cert chains + code signing + federation.
- systems/merkle-tree-certificates — Mid-2027 milestone mechanism for visitor→Cloudflare PQ authentication. Batches PQ-signed cert commitments under a Merkle root signed once and distributed out-of-band; per-handshake cost reduced from a full PQ signature (15-50 kB cert chain) to logarithmic inclusion-proof material.
- systems/pq-hsts — browser-side downgrade-protection mechanism being worked on in Chromium; named as one of the two feasible-for-federated-systems answers to "attackers with CRQCs forge transcript signatures so negotiation alone doesn't defeat downgrade."
- systems/certificate-transparency — sibling downgrade-protection surface; same Merkle-tree-backed public-audit infrastructure as existing CT repurposed as evidence-that-PQ-certs-exist for downgrade detection; Bas Westerbaan (Cloudflare Research) authored the 2026 RWPQC framing.
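
The "logarithmic inclusion-proof material" in the Merkle Tree Certificates entry above can be made concrete with a generic Merkle tree. This sketch is not the Merkle Tree Certificates wire format: the function names are invented here, the `0x00` / `0x01` hash prefixes borrow the leaf/node domain separation used by Certificate Transparency (RFC 6962), and duplicating the last node on odd-sized levels is a simplification chosen for brevity.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, leaf hashes first and the root level last."""
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                       # odd level: pair last node with itself
            level = level + [level[-1]]
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect one sibling hash per level; the flag is True for a left sibling."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        proof.append((sibling, sib < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from the leaf and its proof, then compare."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"\x01" + pair)
    return node == root


# 1,024 stand-in certificate commitments under one root.
certs = [f"cert-{i}".encode() for i in range(1024)]
levels = build_levels(certs)
root = levels[-1][0]

proof = inclusion_proof(levels, 700)
proof_bytes = sum(len(sibling) for sibling, _ in proof)   # 10 hashes of 32 bytes
ok = verify(certs[700], proof, root)
forged = verify(b"cert-forged", proof, root)
```

With 1,024 leaves the proof is ten 32-byte hashes, and it grows by one hash each time the batch doubles. That is the sense in which the per-handshake cost becomes logarithmic once the root itself is signed once and distributed out-of-band.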
Content monetization / verified bots¶
- systems/pay-per-crawl — 2025-07-01 private-beta feature letting publishers monetize AI-crawler access per request: Allow / Charge (flat per-request USD, delivered as HTTP 402 + `crawler-price`) / Block. Rules engine runs after existing WAF + bot management — monetization layered on top, never overriding a security decision. Cloudflare is the Merchant of Record aggregating N×M billing into N+M enrollments.
- systems/web-bot-auth — Cloudflare's cryptographic bot-identity proposal: Ed25519 keypair + hosted JWK directory + per-request RFC 9421 HTTP Message Signatures (`signature-agent` + `signature-input` + `signature` headers, tag `web-bot-auth`). Replaces IP allowlists / reverse-DNS as bot verification. Identity substrate pay-per-crawl requires before it can bill.
- systems/cloudflare-waf — existing edge L7 firewall; relevant here only as the pipeline stage that runs before pay-per-crawl.
Agent-readiness tooling (2026-04-17)¶
- systems/isitagentready — Cloudflare's public Lighthouse-style agent-readiness scanner; grades any URL on the four-dimension rubric (Agent Discovery / Content for LLMs / Access Rules / Agent Actions) plus non-scoring Agentic Commerce. Self-referential — publishes its own MCP server at `/.well-known/mcp.json` + Agent Skills index at `/.well-known/agent-skills/index.json`. Canonical wiki instance of patterns/score-driven-standard-adoption.
- systems/cloudflare-url-scanner — existing public URL-analysis product; 2026-04-17 gained an Agent Readiness tab + `options: {"agentReadiness": true}` API option embedding the same four-dimension rubric.
- systems/cloudflare-developer-documentation — developers.cloudflare.com as the reference implementation of agent-ready documentation. Five refinements: dynamic `/index.md` fallback via two Transform Rules; split `llms.txt` per top-level product directory; hidden agent directives in HTML; Redirects for AI Training on deprecated docs (Wrangler v1, Workers Sites); afdocs-compliant sidebar with "LLM Resources" entries. Benchmark: 31 % fewer tokens + 66 % faster to correct answer vs average non-refined technical docs site (Kimi-k2.5 via OpenCode) — patterns/comparative-documentation-benchmark canonical instance.
- systems/cloudflare-transform-rules — URL Rewrite Rule + Request Header Transform Rule composition backing the dynamic `/index.md` fallback without static content duplication.
- systems/redirects-for-ai-training — classify AI-training crawlers + redirect them away from deprecated content; fixes future LLM answers at training-data time rather than inference time.
- systems/agent-skills — Cloudflare-authored RFC for an `/.well-known/agent-skills/index.json` standard listing agent-consumable skill documents.
- systems/mcp-server-card — draft MCP #1649 standard for pre-connect MCP-server discovery at `/.well-known/mcp/server-card.json`.
- systems/api-catalog-rfc-9727 — IETF RFC 9727 standard for `/.well-known/api-catalog`; scored check under Agent Actions.
- systems/cloudflare-access — Cloudflare's zero-trust access product; 2026 Agents Week added full support for RFC 9728 OAuth Protected Resource Metadata so agents can drive humans through proper OAuth consent flows rather than reusing a logged-in browser session.
- systems/capnweb — Cloudflare's pure-TypeScript object-capability RPC library (2025-09-22, open-sourced MIT, <10 kB minify+gzip, zero dependencies, written by Cap'n Proto author Kenton Varda). Spiritual sibling of Cap'n Proto that keeps the object-capability model and promise pipelining while dropping the `.capnp` schema language in favour of TypeScript-interface-as-schema + a JSON-based human-readable wire format. Symmetric / bidirectional (either peer can call the other), four message types (push/pull/resolve/reject), predictable positive export-table IDs enable pipelining-before-first-response. Three transports built in: HTTP batch, WebSocket, `postMessage()`. Runs in every major browser + Workers + Node.js. Novel record-replay DSL for `.map()` over promised arrays — the "DSL" turns out to be the RPC protocol itself. First production consumer: Wrangler remote bindings (GA 2025-09-16) so local `workerd` test instances can speak RPC to production services. Canonical wiki instance of concepts/object-capability-rpc + concepts/promise-pipelining + concepts/bidirectional-rpc + patterns/capability-returning-authenticate + patterns/record-replay-dsl; positioned as a GraphQL alternative for the waterfall problem at the RPC layer. 643 HN points.
Feature-flag infrastructure (2026-04-17 Agents Week)¶
- systems/cloudflare-flagship — native feature-flag service, private beta launched 2026-04-17, built on OpenFeature and on Cloudflare's own developer-platform primitives (Workers, Durable Objects, Workers KV). Architecture: per-app DO = source of truth (SQLite-backed, single-writer, holds flag config + changelog) → Workers KV = edge-replicated read tier (sync "within seconds") → evaluation engine runs in-isolate in the same V8 isolate already serving the request, reading flag config from edge-local KV — "both the data and the logic live at the edge — nothing is sent elsewhere to be evaluated." Worker binding on `env.FLAGS` via `wrangler.jsonc`: `{ flagship: [{ binding, app_id }] }`; typed accessors for boolean / string / number / full JSON object variations + `*Details()` returning value + variant + reason. OpenFeature SDK works across Workers, Node.js, Bun, Deno, browser. Rules compose via AND/OR nested up to 5 levels deep + priority (first-match wins). Percentage rollouts use consistent hashing on a chosen context attribute (`userId` canonical) so ramps are monotonic — stable bucketing across the 5% → 10% → 50% → 100% sequence (concepts/percentage-rollout). Full audit trail with field-level diffs. Launch thesis (the post's lede, not the architecture): flags are the bounded-blast-radius primitive for agent-controlled deployment — "the human doesn't need to be in the loop for every step — they set the boundaries, and the flag controls the blast radius." Canonical wiki instance of patterns/do-plus-kv-edge-config-distribution and patterns/in-isolate-rule-evaluation. Source code public at `github.com/cloudflare/flagship`; SDK `npm i @cloudflare/flagship`.
- systems/openfeature — the CNCF open standard for feature-flag evaluation that Flagship implements as a Provider. Positioned as "the same relationship OpenTelemetry has to observability" — vendor-neutral API so "you write your evaluation code once against the standard, and swap providers by changing a single line of configuration." Canonical wiki instance of concepts/vendor-neutral-evaluation-api.
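
The monotonic-ramp property of hashed percentage rollouts described in the Flagship entry above can be sketched as follows. This is an assumption-laden illustration, not Flagship's evaluation engine: the entry does not say which hash is used, so SHA-256 is a stand-in, and `bucket`, `in_rollout`, and the flag key `new-checkout` are hypothetical names.

```python
import hashlib

BUCKETS = 10_000  # resolution of 0.01 %


def bucket(flag_key: str, attribute_value: str) -> int:
    """Map (flag, context attribute) to a stable bucket in [0, BUCKETS).

    SHA-256 is an assumed stand-in for whatever hash the real service uses.
    """
    digest = hashlib.sha256(f"{flag_key}:{attribute_value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_rollout(flag_key: str, user_id: str, percentage: int) -> bool:
    """A user is enabled when their fixed bucket falls below the ramp threshold."""
    return bucket(flag_key, user_id) < percentage * BUCKETS // 100


users = [f"user-{i}" for i in range(1000)]
ramp = [5, 10, 50, 100]

# The set of enabled users at each step of the ramp.
enabled = [
    {u for u in users if in_rollout("new-checkout", u, pct)}
    for pct in ramp
]
```

Because each user's bucket never changes and only the threshold moves, everyone enabled at 5% is still enabled at 10%, 50%, and 100%. Salting the hash with the flag key keeps the same users from always landing in the first cohort of every flag.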
As third-party cosigner + audit-log operator for other companies (2026-05-01 Meta E2EE-backup post)¶
- Not a Cloudflare system but a Cloudflare role — Cloudflare serves as the third-party signer + public audit-log operator for Meta's HSM-based Backup Key Vault fleet-key distribution to Messenger clients. Fleet public keys for new HSM deployments arrive over the air inside a validation bundle signed by Cloudflare and counter-signed by Meta; Cloudflare also maintains an audit log of every validation bundle it has signed, making any undisclosed signing event externally observable. Canonical wiki instance of the new patterns/third-party-countersignature-for-trust-anchor pattern and the new concepts/audit-log-as-transparency-artifact concept. Also the canonical wiki instance of Cloudflare acting as the second trust-root in another company's trust chain rather than as a primary service operator — structurally different from the Cloudflare-as-first-party-substrate posture visible elsewhere on this page (Workers, AI Gateway, etc.). See sources/2026-05-01-meta-strengthening-end-to-end-encrypted-backups and systems/messenger for the full deployment shape. Meta's framing: "signed by Cloudflare and counter-signed by Meta, providing independent cryptographic proof of their authenticity."
Key patterns / concepts¶
- patterns/flip-lookup-direction — when two sets meet at a membership check, iterate the smaller set and probe the other; pingora-origin header-stripping canonical instance (100 internal vs 10-30 request headers, 2.39× speedup from the direction flip alone).
- patterns/custom-data-structure-for-hot-path — when crates.io / stdlib containers are tuned for a different workload, writing a purpose-built structure is legitimate engineering at CDN-scale CPU budgets; trie-hard canonical instance (radix_trie a full µs slower than HashMap before the custom write).
- patterns/measurement-driven-micro-optimization — start with a flame graph / stack-trace sampling, pick the function by CPU share, benchmark against the real workload; trie-hard canonical instance (1.71 %-CPU target → 0.43 % via criterion + production sampling closing the loop).
- concepts/trie-data-structure — tree-prefix structure with `O(log L)` miss behaviour, `O(L)` hit; the right shape for mostly-miss membership checks on short keys.
- concepts/big-o-over-key-length — the hidden cost that `HashMap<String, V>` reads are `O(L)` because every byte of the key goes into the hash; asymptotic-over-table-size intuition misleads at the µs scale.
- concepts/stack-trace-sampling-profiling — production profiling by sampling the stack periodically and estimating per-function CPU share as sample-containment percentage; turned a one-line helper into a 680-core optimization target.
- concepts/hot-path — any code that runs per-request at tens of millions of requests per second — the unit of optimization attention at CDN scale.
- patterns/central-proxy-choke-point — AI Gateway proxy Worker pattern: all LLM requests flow through a single choke point that validates auth, strips client keys, tags metadata; mirrors pingora-origin's role as the egress choke point for upstream requests.
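
The direction flip in patterns/flip-lookup-direction above is easy to see with probe counts. The sketch below is a Python illustration of the idea, not the pingora-origin code (which is Rust): both function names and the `x-internal-*` header names are hypothetical, and only the set sizes (about 100 internal names against 10-30 request headers) come from the entry.

```python
from typing import Dict, FrozenSet, Tuple

# ~100 internal header names, as in the entry; the names themselves are made up.
INTERNAL: FrozenSet[str] = frozenset(f"x-internal-{i}" for i in range(100))


def strip_by_internal(headers: Dict[str, str],
                      internal: FrozenSet[str]) -> Tuple[Dict[str, str], int]:
    """Original direction: walk every internal name and probe the request."""
    out = dict(headers)
    probes = 0
    for name in internal:
        probes += 1
        out.pop(name, None)
    return out, probes


def strip_by_request(headers: Dict[str, str],
                     internal: FrozenSet[str]) -> Tuple[Dict[str, str], int]:
    """Flipped direction: walk the request headers and probe the internal set."""
    out: Dict[str, str] = {}
    probes = 0
    for name, value in headers.items():
        probes += 1
        if name not in internal:
            out[name] = value
    return out, probes


# A 12-header request, two of which are internal and must be stripped.
request = {f"x-client-{i}": "v" for i in range(10)}
request["x-internal-3"] = "secret"
request["x-internal-42"] = "secret"

slow_result, slow_probes = strip_by_internal(request, INTERNAL)
fast_result, fast_probes = strip_by_request(request, INTERNAL)
```

Both directions produce the same stripped request, but the flipped version does 12 membership checks instead of 100. Once the loop runs over the request's own headers, the remaining cost is the per-probe lookup, which is the part a purpose-built structure such as trie-hard targets.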
Workers / V8 / Node.js performance patterns and concepts¶
- concepts/v8-young-generation — V8 GC young-space size as an embedder-tunable knob; Cloudflare's 2017 manual cap (V8-guidance for 512 MB envs, Workers defaults to 128 MB) didn't age well as V8's GC evolved. Removing the manual tune → ~25 % benchmark improvement globally.
- concepts/warm-isolate-routing — routing-heuristic primitive for V8-isolate serverless runtimes. I/O-bound vs CPU-bound workloads want opposite coalescing behaviours; the 2025-10 Cloudflare fix detects CPU sustain and biases fresh-isolate spin-up while preserving I/O-bound coalescing.
- concepts/benchmark-methodology-bias — correlated noise + client-side latency + hardware-gen lottery + multitenancy + TTFB/TTLB skew + unset `NODE_ENV`: six distinct bias classes catalogued in one of the clearest published enumerations.
- concepts/stream-adapter-overhead — Node ⇆ Web `ReadableStream` adapter double-buffering cost; value-oriented vs byte-oriented `ReadableStream` default-`highWaterMark` pitfall; canonical skip-the-adapter fix (`ReadableStream.from(chunks)`).
- concepts/backpressure — the slow-consumer-signals-fast-producer control primitive. Canonical wiki instance of the 2026-02-27 Cloudflare critique that Web streams' backpressure is advisory-only (`controller.enqueue()` always succeeds regardless of `desiredSize`; producers routinely ignore `writer.ready`); enforced four-policy alternative in new-streams.
- concepts/async-iteration — the ES2018 `for await…of` protocol the WHATWG Streams Standard predates (2014-2016) and was retrofitted with post-hoc. Canonical wiki argument that this 2-year timing mismatch is the root design flaw behind Web streams' readers+locks+controllers machinery.
- concepts/pull-vs-push-streams — evaluation-model axis. Web streams is push-oriented; `TransformStream.transform()` runs eagerly on write, cascading data through intermediate buffers before the consumer pulls. Pull-based designs (Unix pipes, async generators, systems/new-streams) have implicit backpressure and cancellation as consequences of the model.
- concepts/promise-allocation-overhead — the dominant per-request cost in Web streams hot paths. Vercel measurement: 12× gap between `ReadableStream.pipeThrough()` and Node `stream.pipeline()` attributed almost entirely to promise + object allocation overhead. Cloudflare internal fix: up to 200× fewer promises in one Workers data pipeline. Per-request GC reaching 50 %+ of CPU in streaming SSR.
- concepts/byob-reads — bring-your-own-buffer reads in Web streams. Canonical wiki instance of the 2026-02 critique that BYOB ships dual-path implementation complexity for both consumers and producers despite seeing little real-world adoption.
- patterns/upstream-the-fix — fix in the ecosystem primitive (V8 `JSON.parse(reviver)`, Node.js trig compile flag, OpenNext adapter allocations, benchmark-repo bugs, Node.js Web-streams `fast-webstreams`) not just in one's own platform. Canonical 2025-10-14 four-PR instance + 2026-02-27 Vercel-collaboration second instance (Snell-as-Node-TSC accelerating competitor-originated work).
- patterns/explicit-backpressure-policy — design pattern of requiring the producer to pick one of four bounded-buffer responses (strict / block / drop-oldest / drop-newest) at stream-creation time, with `strict` as the default. The structural response to advisory-backpressure failure modes.
- patterns/lazy-pull-pipeline — pipeline stages execute only when the consumer iterates; no eager background pumping; no intermediate-buffer cascade; stopping iteration stops the pipeline. The engineering pattern for composing pull-based transforms (async generators, `Stream.pull()` in systems/new-streams).
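
The behaviour claimed for patterns/lazy-pull-pipeline above can be observed with ordinary generators. This is a minimal sketch using synchronous Python generators as a stand-in for async generators; it does not use or imitate the new-streams API, and `source`, `double`, and the `log` list are names invented for the demonstration.

```python
from typing import Iterable, Iterator, List

log: List[str] = []   # records every unit of work any stage performs


def source(n: int) -> Iterator[int]:
    """Producer stage: emits a chunk only when the next stage asks for one."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def double(chunks: Iterable[int]) -> Iterator[int]:
    """Transform stage: pulls one chunk, transforms it, hands it on."""
    for chunk in chunks:
        log.append(f"double {chunk}")
        yield chunk * 2


# Building the pipeline performs no work at all.
pipeline = double(source(1_000_000))
work_before_iteration = list(log)

# The consumer pulls two values and stops.
received: List[int] = []
for value in pipeline:
    received.append(value)
    if len(received) == 2:
        break
pipeline.close()      # stopping iteration tears the pipeline down
```

After the consumer stops, the log holds exactly four entries: two chunks produced and two transformed, out of a nominal million. Nothing is buffered between stages, so the consumer's pace is the only flow control, which is the implicit backpressure the pull-based entries describe.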
Security / identity patterns and concepts¶
- patterns/sso-with-ephemeral-ssh-keys — OPKSSH's replacement of long-lived SSH keys with 24h keys minted from an OIDC login; server authorization flips from key-fingerprint ACL to email ACL.
- patterns/ssh-certificate-extension-smuggling — how OPKSSH transports the PK Token through a standard SSH handshake: pack it into the SSH certificate extension field, leaving the protocol and binaries untouched.
- patterns/identity-to-key-binding — the load-bearing verifier invariant that prevents PK Token replay against an attacker's keypair.
- concepts/pk-token — OIDC ID Token committed to a user's public key; the primitive enabling OIDC SSO over SSH / E2E messaging / signed commits.
- concepts/sso-authentication — the SSO / OIDC trust anchor.
- concepts/ephemeral-credentials — the credential-lifetime discipline OPKSSH applies to SSH keys.
Network defence patterns and concepts¶
- patterns/autonomous-distributed-mitigation — detection + mitigation on every server of the fleet with no central scrubbing tier and no human in the loop; 7.3 Tbps attack across 477 data centres canonical instance; pairs with concepts/anycast for delivery and patterns/gossip-fingerprint-propagation for cross-POP intelligence sharing.
- patterns/gossip-fingerprint-propagation — each server's top attack fingerprints multicast/gossiped within-POP and globally; peer servers get threat-intel without a central control-plane service; avoids creating a DDoS target out of the threat-sharing mechanism.
- concepts/anycast — the same IP advertised from every POP; BGP best-path selection routes each source's packets to the nearest POP. Canonical DDoS-defence primitive: attacker geographic distribution becomes per-POP-capacity advantage.
- concepts/udp-reflection-amplification — three-property exploit (UDP spoofable + responsive reflector + query<reply amplification) used for QOTD / Echo / NTP / Portmap / RIPv1 / Memcached / DNS / SSDP / CLDAP vectors. 2025-06-20 attack used 6 of these alongside direct UDP flood + Mirai.
Content monetization / verified-bots patterns and concepts¶
- concepts/http-402-payment-required — the long-dormant HTTP status code Cloudflare resurrected as the per-request paid-content negotiation primitive. Defined in HTTP/1.1, effectively unused until pay-per-crawl.
- concepts/http-message-signatures — RFC 9421; per-request cryptographic signatures over canonical HTTP covered-fields. Carries Web Bot Auth's bot identity.
- concepts/verified-bots — general problem domain (distinguish legitimate automated clients from imposters); Cloudflare's cryptographic answer is signed-request-per-request rather than IP / reverse-DNS / API key.
- concepts/agentic-paywall — the stated end-state: AI agents programmatically hitting 402s with a spending budget, paying per-resource without human-in-the-loop. Pay-per-crawl is the substrate.
- patterns/price-header-negotiation — reactive (discover price via 402, retry with `crawler-exact-price`) and preemptive (`crawler-max-price` up front, single round trip) HTTP-header flows.
- patterns/signed-bot-request — Ed25519 keypair + JWK directory + per-request RFC 9421 signature; the canonical recipe for cryptographic bot identity.
- patterns/merchant-of-record-aggregation — collapse N×M bilateral billing contracts into N+M enrollments via a trusted intermediary that's the legal counter-party for all transactions.
- concepts/stealth-crawler / concepts/declared-crawler / concepts/undeclared-crawler — the three-way crawler-identity taxonomy the 2025-08-04 Perplexity post establishes. Declared + compliant is the cooperative baseline (ChatGPT); undeclared + ignoring directives is the failure mode (Perplexity); undeclared ⊂ stealth when the evasion is deliberate.
- concepts/user-agent-rotation / concepts/asn-rotation / concepts/ip-rotation-as-evasion — stealth-crawler evasion tactics and why none of them survive ML bot fingerprinting over content-independent request features.
- concepts/robots-txt-compliance — operator-side discipline of fetching and honoring robots.txt; the 2025-08-04 post binary-splits ChatGPT (full compliance) vs Perplexity stealth crawler (non-compliance at the operator level, even if individual declared crawlers comply).
- concepts/ml-bot-fingerprinting — ML classifier over TLS + HTTP/2 + timing features producing a bot score that survives IP + ASN + UA rotation; the structural answer to stealth-crawler evasion.
- patterns/brand-new-domain-experiment — methodology pattern for zero-confounder crawler-attribution claims. Cloudflare purchases never-registered domains, publishes a blanket robots.txt Disallow, and queries the suspected crawler's product about the content — any retrieval is proof of direct fetch in violation. Canonical 2025-08-04 Perplexity instance.
- patterns/stealth-crawler-detection-fingerprint — ML + network-signal fingerprinting pattern that produces a stealth signature durable against identity rotation; ships as customer-deployable managed-ruleset block signatures.
- patterns/stealth-on-block-fallback — the documented anti-pattern of running a stealth crawler as a fallback when the declared one is blocked. Escalation signature (reactive rotation; answer-specificity degradation under successful stealth block) and why it fails against ML fingerprinting + verified-bot delisting.
- patterns/verified-bot-delisting — enforcement lever that moves a violating operator out of the known-good allowlist; activates bot-management scoring + withdraws downstream-feature privileges (pay-per-crawl, default-allow); paired with public attribution as the reputational side of enforcement. Canonical 2025-08-04 Perplexity instance.
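The two flows in patterns/price-header-negotiation can be sketched as a toy in-process simulation. The request header names crawler-exact-price and crawler-max-price come from the entry above. Everything else is an assumption made for illustration, not Cloudflare's documented wire format: the `origin` function, the `crawler-price` and `crawler-charged` response headers, the price value, and returning 200 on success.

```python
from decimal import Decimal

PRICE = Decimal("0.01")  # hypothetical per-request price set by the publisher


def origin(headers):
    """Toy origin: returns (status, response_headers) for one request."""
    exact = headers.get("crawler-exact-price")
    ceiling = headers.get("crawler-max-price")
    if exact is not None and Decimal(exact) == PRICE:
        return 200, {"crawler-charged": str(PRICE)}
    if ceiling is not None and Decimal(ceiling) >= PRICE:
        return 200, {"crawler-charged": str(PRICE)}
    # No acceptable offer: quote the price in a 402 response.
    return 402, {"crawler-price": str(PRICE)}


def reactive_fetch():
    """Discover the price via 402, then retry with the exact price."""
    status, resp = origin({})
    round_trips = 1
    if status == 402:
        status, resp = origin({"crawler-exact-price": resp["crawler-price"]})
        round_trips += 1
    return status, round_trips


def preemptive_fetch(max_price):
    """State a price ceiling up front: one round trip whether or not it is accepted."""
    status, _ = origin({"crawler-max-price": str(max_price)})
    return status, 1
```

In this sketch the reactive flow costs two round trips, while the preemptive flow costs one but commits the crawler to a ceiling before it knows the price.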
Reliability / incident post-mortems¶
- concepts/feature-file-size-limit — preallocated fixed-size cap on internally-generated config files; load-bearing invariant that fails-closed on violation without a fail-open path. Canonical wiki instance: the FL2 bots module's 200-feature cap breached on 2025-11-18 by a doubled ClickHouse metadata query result → .unwrap() panic → ~3 hours core-traffic outage.
- concepts/preallocated-memory-budget — the hot-path optimization (no runtime alloc, no GC, cache locality) that creates the size cap. Legitimate performance choice; becomes a hazard when the inbound payload can grow without matching ingest validation.
- concepts/internally-generated-untrusted-input — the trust-boundary-confusion failure class. "We generate it, so it's trusted" is the wrong model when the generator's output depends transitively on upstream systems (database grants, schema migrations, library updates). Cloudflare's 2025-11-18 stated #1 remediation names this discipline explicitly.
- concepts/database-permission-migration-risk — the transitive-dependency failure class: a correct, defensive database permission change (fine-grained grants for per-user distributed-query accounting) silently changes the row count returned by a downstream consumer's metadata query. Neither team catches it at code-review time because the bug surfaces at the intersection.
- concepts/clickhouse-distributed-query — the upstream primitive whose migration was the 2025-11-18 trigger.
- concepts/unhandled-rust-panic — the Rust-side crash shape. Rust's type system prevents nil-index-class bugs but not .unwrap() on an error path the programmer didn't handle. Canonical wiki instance on FL2.
- concepts/nil-index-lua-bug — the Lua-side crash shape. Canonical wiki instance on FL1 (2025-12-05).
- concepts/intermittent-failure-signal-confusion — why 2025-11-18's failure shape (5-minute oscillation driven by gradual ClickHouse-grant rollout) mimicked an external DDoS attack. The status-page coincidence reinforced the misdirection. Canonical wiki instance.
- concepts/global-configuration-system — Cloudflare's name for the fleet-wide config-delivery channel that propagates every edit in seconds with no canary. Named explicitly on 2025-12-05 as "under review following the outage we experienced on November 18" — review incomplete when the same system delivered the 12-05 trigger.
- concepts/fail-open-vs-fail-closed — the design-choice discipline. Named as a stated remediation on both 2025-11-18 and 2025-12-05. Two outages, same underlying absence.
- concepts/cdn-as-critical-internet-infrastructure — framing for why Cloudflare's public post-mortems are longer, deeper, and published same-day — CDN outages are front-page news and the remediation backlog is public record.
- patterns/harden-ingestion-of-internal-config — stated #1 remediation on 2025-11-18. Treat internally-generated config as untrusted input; validate size / shape / invariants before loading into the hot-path buffer; fall back to known-good on violation.
- patterns/global-feature-killswitch — stated #2 remediation on 2025-11-18. An orthogonal fast-off lever that disables a feature consuming bad config in seconds, independent of the config-cleanup path. Compositional companion to progressive-rollout.
- patterns/global-configuration-push — the antipattern framing of the rapid-global-delivery channel itself. Not wrong to have; dangerous to have as the only option.
- patterns/rust-replacement-of-dynamic-language-hot-path — the Rust rewrite pattern. Prevents one bug class (nil-index / type-coercion) but not the fail-open / ingest-validation classes. 2025-12-05 is the canonical pattern-works instance (FL2 structurally immune to the Lua bug); 2025-11-18 is the canonical pattern-has-limits instance (FL2 still panicked).
- concepts/bgp-route-withdrawal — the single-action global-outage primitive for anycast services: because the same IP is advertised from every POP, unadvertising it anywhere-and-everywhere is an Internet-speed global event. Two distinct causes produce identical wire behaviour — external BGP hijack (2024-06-27 incident) and self-inflicted withdrawal (2025-07-14 incident) — and incident responders have to disambiguate them.
- concepts/service-topology — the configuration abstraction deciding where each service's IP is allowed to be advertised from; the edit surface where the 06-06 latent misconfig lived and the 07-14 trigger fired.
- concepts/latent-misconfiguration — dormant bug pattern: referentially-wrong config gated by a precondition that happens not to hold, activated by an unrelated change to the same surface. Canonical wiki instance is the 38-day dormant 1.1.1.1-prefix-linked-to-DLS-topology bug that detonated on 07-14.
- patterns/progressive-configuration-rollout — control-plane canary / staged deployment applied to config rather than code; the discipline the legacy Cloudflare addressing system lacks and the stated remediation is built around. Sibling of patterns/staged-rollout for the config plane.
- patterns/dual-system-sync-during-migration — legacy-and-strategic-in-parallel posture during long migrations; the structural context in which the 06-06 bug was introduced (on the legacy surface that lacks progressive deployment), and for which the stated remediation is to shorten the window by accelerating legacy deprecation.
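patterns/harden-ingestion-of-internal-config can be sketched as a validate-then-swap step. This is a minimal illustration, not Cloudflare's code. The function name, the duplicate check, and the list-of-strings file shape are assumptions. Only the 200-feature cap comes from the entries above.

```python
MAX_FEATURES = 200  # preallocated cap; the figure is from the feature-file-size-limit entry


def load_features(candidate, known_good):
    """Validate an internally generated feature list before it reaches the hot path.

    Returns (features_to_use, problems). On any violation the previous
    known-good list is kept instead of raising, so the caller keeps serving.
    """
    problems = []
    if len(candidate) > MAX_FEATURES:
        problems.append(f"{len(candidate)} features exceeds cap {MAX_FEATURES}")
    if len(set(candidate)) != len(candidate):
        problems.append("duplicate feature names")
    if problems:
        return known_good, problems
    return list(candidate), problems
```

The design choice is that the generator's output is treated like any other untrusted input: a doubled file is reported and rejected, and the module keeps running on the last file that passed validation.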
Post-quantum patterns and concepts¶
- concepts/q-day — the operational-threshold concept for a CRQC breaking deployed asymmetric cryptography. Canonical wiki instance is Cloudflare's 2026-04 reassessment: Q-Day pulled forward from "2035+" to "as early as 2029" by the three-front compounding (Google's undisclosed P-256 speed-up + Oratomic's 10,000-qubit neutral-atom estimate + 3-4 physical-to-logical-qubit ratio). Scott Aaronson's "public estimates are now going dark" framing explicitly adopted.
- concepts/cryptographically-relevant-quantum-computer — the capability Q-Day names the threshold of; three independent engineering fronts (hardware + error correction + software) whose progress compounds.
- concepts/harvest-now-decrypt-later — the historical primary PQ threat; addressed by Cloudflare's 2022 PQ-KEM-by-default rollout; >65 % of human traffic now PQ-encrypted. Extended into site-to-site / IPsec layer via the 2026-04-30 Cloudflare IPsec PQ GA.
- concepts/post-quantum-authentication — the newly-urgent 2026 threat axis. Impersonation / signature forgery rather than retroactive decryption; Q-Day-near-term implies this is where migration urgency concentrates; Cloudflare canonical articulation of the priority flip. Also the flagged remaining gap for IPsec standards post 2026-04-30 IPsec PQ-KEM GA.
- concepts/downgrade-attack — why enabling PQ is not sufficient. Active CRQC attacker forges transcript signatures, forcing selection of classical primitives; needs disable-legacy-before-rotate + PQ HSTS / CT downgrade protection.
- concepts/long-lived-key-risk — root CAs + code-signing certs + federation trust anchors + API auth keys first in the migration priority order. Persistent-access keys are the natural first target under scarce/expensive early-CRQC conditions and remain the largest exposure surface under scalable-CRQC covert-attack scenarios (Sophie Schmieg's Enigma analogy).
- concepts/hybrid-key-encapsulation — the transition-era deployment shape combining PQ KEM with a classical KEM. Canonical wiki instances: TLS 1.3 X25519MLKEM768 (2022), SSH sntrup761x25519-sha512 (2025), IPsec hybrid ML-KEM per draft-ietf-ipsecme-ikev2-mlkem (2026-04-30 Cloudflare IPsec GA).
- concepts/ciphersuite-bloat — NIST SP 800-52r2 warning canonicalised via the RFC 9370 experience. Generic extension mechanism without named canonical ciphersuite → vendor-specific implementations → cross-vendor non-interop. Palo Alto Networks' RFC 9370-based early PQ IPsec implementation is the canonical case study.
- concepts/internet-scale-interoperability — the design bar PQC clears and QKD does not; the property ciphersuite bloat destroys even after a standard exists.
- concepts/quantum-key-distribution-qkd — the hardware-based alternative Cloudflare explicitly rejects as "will not operate at Internet scale". Cited as a distraction that delayed IPsec PQ rollout by ~four years vs TLS.
- patterns/standards-convergence-over-vendor-extension — the discipline RFC 9370 violated and draft-ietf-ipsecme-ikev2-mlkem fixes. Name the canonical ciphersuite at standards-publication time; don't let vendors fill the gap with incompatible choices.
- patterns/tls-first-pqc-rollout-as-blueprint — the meta-pattern of rolling out PQ to sibling protocols by reusing TLS's construction, negotiation mechanism, and ciphersuite convergence. IPsec's 2026-04-30 GA is the canonical sibling instance; SSH (2025 GitHub deployment) is the other.
- concepts/zero-knowledge-proof — the novel 2026 disclosure shape: Google proved possession of its P-256 quantum-attack speed-up without publishing the algorithm. New epistemic position for the industry — trust-in-a-proof-of-a-secret drives timeline compression.
- patterns/disable-legacy-before-rotate — the three-step PQ migration discipline: enable PQ → disable classical → rotate every secret that ever crossed a classical session. Cloudflare canonical articulation that rotation takes years, not months.
- patterns/third-party-dependency-quantum-assessment — scope expansion: direct cryptographic counterparties and indirect critical-business dependencies (financial services, utilities) must be included in procurement + migration planning.
- patterns/default-on-security-upgrade — the three-instance recurring Cloudflare posture (2014 Universal SSL → 2022 PQ-for-all → 2029 full PQ-security) where security is priced into the platform across all customer tiers.
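The combining step behind concepts/hybrid-key-encapsulation can be sketched on its own, assuming both key exchanges have already produced their shared secrets. The function name and inputs are hypothetical, and hashing the length-prefixed concatenation with SHA-256 is a stand-in for the protocol's real key schedule (TLS 1.3, SSH, and IKEv2 each define their own). It is not any of those constructions.

```python
import hashlib


def combine_shared_secrets(pq_secret: bytes, classical_secret: bytes) -> bytes:
    """Derive one session secret from both KEM outputs.

    Each input is length-prefixed so that different splits of the same
    bytes cannot collide on the same hash input.
    """
    material = b"".join(
        len(part).to_bytes(2, "big") + part
        for part in (pq_secret, classical_secret)
    )
    return hashlib.sha256(material).digest()
```

Because the output depends on both inputs, recovering only one of the two secrets is not enough to recompute the session secret, which is the property that makes the hybrid shape a safe transition-era default.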
Posture¶
- Rust-first on performance-critical services — Pingora, trie-hard, proxy stack. Cloudflare was an early production voice on "Rust plus microsecond-level profiling works."
- Cares about single-digit-percent CPU. Most orgs don't profile a single helper that costs 1.71 % of CPU. Across 40,000 cores that is roughly 680 cores (0.0171 × 40,000 ≈ 684), which justifies writing a new crate and publishing the result.
- Measurement → design → measurement. Criterion microbench numbers are cross-checked against production stack-trace sampling in both published posts; neither number is accepted on its own.
- Open-source reflex. Trie-hard, Pingora, AI Gateway bits, workers-oauth-provider, Agents SDK — the blog is also an advertising channel for the OSS ecosystem around the products.
- Cross-platform agent-autonomy protocol design. Rather than build a one-off integration with a single partner, Cloudflare co-designs a generic protocol with a peer platform (Stripe) for agent-driven provisioning + payment + identity attestation across providers — then explicitly frames any signed-in-user platform as a drop-in orchestrator candidate, and plans a formal specification to follow. 2026-04-30 agents can now create Cloudflare accounts, buy domains, and deploy is the canonical wiki instance — the three-component discovery + authorization + payment pattern (patterns/agent-provisioning-protocol) and the orchestrator-provider-agent trust triangle it assumes; composes with Code Mode on the API-call altitude and Agent Skills on the task altitude.
- Coordinated CVE response with runtime bpf-lsm mitigation + scheduled LTS patching in parallel — when an unprivileged Linux-kernel vulnerability drops, Cloudflare's instinct is not a single workstream; it's five in parallel: blast-radius mapping, detection-coverage validation, retroactive threat-hunting of 48h of fleet-wide logs (patterns/fleet-wide-retroactive-threat-hunt), runtime-mitigation engineering via bpf-lsm while the patched LTS kernel is still in backport, and scheduled patched-kernel rollout via the ERR pipeline once backport lands. The posture is: behavioural detection provides vulnerability-agnostic coverage before any CVE-specific rule is written; bpf-lsm provides surgical runtime mitigation that covers the LTS-backport-latency gap; ERR closes the window permanently. Two-phase bpf-lsm rollout — visibility via ebpf_exporter first to validate the allow-list, then enforcement via bpf-lsm behind a separate gate. First-attempt mitigation failures are expected and designed-for: Cloudflare's staging layer is explicitly the fault-domain boundary for "attempt a mitigation you haven't fully characterised yet." Entire posture runs under assume-compromise until proven otherwise. 2026-05-07 Copy Fail response is the canonical wiki instance — CVE-2026-31431, 5-day disclosure-to-fully-mitigated window, 48-hour retroactive threat hunt (clean), first-attempt modprobe blacklist algif_aead broke in staging, surgical socket_bind bpf-lsm allow-list denial shipped fleet-wide within 30 hours. Named follow-ups: "better visibility into kernel-API dependencies", "better runtime mitigation", "reduce attack surface of Linux Kernel" (at build time, canonical concepts/kernel-attack-surface lever).
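The two-phase rollout in the CVE-response entry above (visibility first, enforcement behind a separate gate) can be illustrated with a user-space toy. This is not bpf-lsm code and does not reflect Cloudflare's hook implementation. The `BindGate` class, its fields, and the (family, protocol) keys are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BindGate:
    allowed: frozenset                 # hypothetical allow-list of (family, protocol) pairs
    enforce: bool = False              # phase 1: observe only; phase 2: deny
    violations: list = field(default_factory=list)

    def check(self, family: str, protocol: str) -> bool:
        """Return True if the bind may proceed."""
        key = (family, protocol)
        if key in self.allowed:
            return True
        self.violations.append(key)    # recorded in both phases for visibility
        return not self.enforce        # only denied once enforcement is switched on
```

Running with `enforce=False` first lets the recorded violations show whether the allow-list would break legitimate callers. Flipping the flag is then a separate, reversible decision.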
Recent articles¶
- sources/2026-06-18-cloudflare-build-your-own-vulnerability-harness — 2026-06-18 Build your own vulnerability harness. Practical operational guide to Cloudflare's fleet-wide model-agnostic vulnerability scanning architecture. Discloses the Vulnerability Validation System (VVS) as the second stage after VDH, with detailed numbers: 128 repos, 20,799 raw candidates → 7,245 actionable findings, 50-200 concurrent workers, per-repo budget caps. Introduces systems/cloudflare-vulnerability-validation-system, concepts/inverted-index-deduplication, concepts/model-agnostic-orchestration, concepts/context-exhaustion, concepts/stateless-agent-compute, concepts/coverage-cell, concepts/shallow-run-detection, patterns/sqlite-keyed-stage-persistence, patterns/model-as-interchangeable-component, patterns/per-repo-budget-cap, patterns/sibling-fork-for-scope-deviation, patterns/fail-pass-flip-gate. Extends systems/cloudflare-vulnerability-discovery-harness with full operational data. Companion to the 2026-05-18 Project Glasswing post.
- sources/2026-06-17-cloudflare-bringing-more-agent-harnesses-and-frameworks-to-cloudflare — 2026-06-17 Bringing more agent harnesses and frameworks to Cloudflare. Articulates the three-layer agent platform stack (framework → harness → runtime). Introduces Flue as first third-party framework on Agents SDK; discloses @cloudflare/shell (virtual filesystem), @cloudflare/workspace (DO-to-container sync). Code Mode quantified at <10 ms cold start, $0.002/load.
- sources/2026-06-12-cloudflare-scaling-security-insights — 2026-06-12 Scaling Security Insights: 10x throughput. How Cloudflare scaled their security scanning system from 10 to 120 scans/sec via batch-parallel Kafka consumption, fast/slow lane consumer splits, active-passive API failover, hybrid bulk inserts, and adaptive rate-limited scheduling — all without adding Kafka partitions.
- sources/2026-06-03-cloudflare-enforcing-the-first-as-in-bgp-as-paths — 2026-06-03 Enforcing the First AS in BGP AS_PATHs. Cloudflare investigates Spamhaus-reported route hijacks using forged AS_PATHs, demonstrates that RPKI+ASPA alone cannot stop forged-origin attacks, and stress-tests all Tier 1 networks on concepts/first-as-enforcement. Half of Tier 1 networks fail, primarily due to Juniper's non-enforcing default. Introduces patterns/safe-by-default as a vendor security posture principle.
- sources/2026-06-01-cloudflare-how-we-reduced-core-unit-boot-time-from-hours-to-minutes — 2026-06-01 How we reduced core unit boot time from hours to minutes. Cloudflare OpenBMC team retrospective on a fleet-wide regression in core-server boot time. The Gen12 core fleet (~2,000 bare-metal units running centralised control plane / billing / analytics) began taking 4 hours to boot after a routine firmware update, vs minutes before — "new nodes faced the full timeout gauntlet on their very first boot. Maintenance windows ballooned. Engineering teams had to babysit upgrades that should have run unattended." Root cause was not a firmware regression — POST and hardware initialisation completed normally — but linear-search timeout amplification across an undeclared network-boot-interface list: the firmware probed IPv4 HTTPS → IPv4 iPXE → IPv6 HTTPS → IPv6 iPXE in order, ~5-min timeout each, with the actually-working IPv6 HTTPS interface arriving at position 3+. Per-boot ~20 min × multi-reboot firmware upgrades ≈ 4 hours. The bug was structurally present before the firmware update — the update merely shifted which interface won the linear-search race. Fix: the declare-boot-interface-order-upfront pattern — declare the correct interface in the pre-boot PXE stage per hardware/use-case, eliminating the guesswork. Four ancillary obstacles required real engineering: (1) older UEFI versions don't support boot ordering + UEFI configuration is often reset on firmware upgrade → resolved with the state-validation auto-reapply pattern; (2) vendor obstacles — an OEM-locked Force Priority Httpv4 Httpv6 Pxev4 Pxev6 immutable token plus a lazy-loaded EFI_IFR_REF3 "Network Boot Interface" HII data structure ("not instantiated until it is explicitly accessed via a GUI callback") → resolved with new BIOS firmware from the vendor + tokens in a "Boot Order Module" that force discovery during the boot sequence; (3) NIC vendor string drift — same logical interface emits different strings (UEFI: HTTPS IPv4 Ethernet Network Adapter XXX-XXX-Y for OCP 3.0 P1 vs UEFI: HTTPS IPv4 Network Adapter - 50:00:E6:8F:4F:32 P1) → resolved with the wildcard match .*HTTP.*IPv4.*P1 in CfHIIConfig_App; durable fix is in-flight vendor coordination to standardise strings (drop MAC + product details, keep protocol + transfer-type + port + slot); (4) iPXE reads UEFI variables as hex, so a uefi-same-hex boolean flag short-circuits a show + parse + conditional-set into a single set (hex-comparison flag pattern). Headline operational outcomes: firmware-upgrade automation 4 hours → 3 minutes; subsequent single-boot ~20 minutes → < 1 minute. Endpoint state: "a single BIOS firmware image serves all SKUs, configuration updates deploy at scale through our existing release pipeline, and the entire workflow operates from iPXE" — configuration-as-code generalised to firmware altitude. Created (13 new pages): 1 source + 3 systems (systems/uefi, systems/ipxe, systems/cloudflare-gen12-server) + 5 concepts (concepts/network-boot-interface, concepts/linear-search-timeout-amplification, concepts/firmware-config-persistence-loss, concepts/vendor-string-mismatch-on-fleet-config, concepts/lazy-loaded-bios-data-structure) + 4 patterns (patterns/declare-boot-interface-order-upfront, patterns/state-validation-with-auto-reapply-and-reboot, patterns/wildcard-config-match-for-vendor-string-drift, patterns/hex-comparison-flag-for-ipxe-config-check).
First major hardware/firmware-altitude wiki disclosure for Cloudflare — distinct from edge-network / Workers / data-platform altitudes; sibling to AWS-EBS-hot-swap-retrofit and Fly.io-fleet-drain instances of fleet-upgrade discipline.
- sources/2026-05-28-cloudflare-how-we-built-cloudflares-data-platform-and-an-ai-agent-on-top-of-it — 2026-05-28 How we built Cloudflare's data platform and an AI agent on top of it. Cloudflare's most comprehensive dogfood-as-reference-architecture post to date — every customer Developer Platform primitive (R2, Workers, Cloudflare Access, Workflows, Durable Objects, D1, KV, Workers AI) threading through one workload at billion-events/second scale, presented as the "if we were going to make a major investment in our data infrastructure, it was going to be built on the same products we sell to customers" recipe. Two systems debut: Town Lake (the unified data platform — "a single SQL interface to everything Cloudflare knows") and Skipper (the AI data agent — natural-language → validated SQL via Code Mode over MCP). Five named subsystems sit between engine and storage: R2 Data Catalog (managed Iceberg on R2 with recency-tiered recompaction — per-minute → hourly → daily as data ages), DataHub (metadata catalog), Lifeguard (access control: rules in D1 + dynamic IAM group-membership pull → JSON policy served to Trino over HTTP), Skimmer (PII detection — fast per-column classifier + agentic full-table-context second pass), Transformer (YAML-frontmatter SQL DAG ELT engine on Workflows; DO state, R2 definitions, D1 history; per-node .meta.json emitted to DataHub on every successful run — the substrate of code as context for data agents). Query engine is Apache Trino: a single SQL plan can join a Postgres table, a ClickHouse table, and an Iceberg table on R2 — predicate pushdown into source, joins in memory, no intermediate materialisation; canonicalised at patterns/single-sql-interface-over-heterogeneous-sources. Default-closed governance: tables are inaccessible until reviewed; schema discovery is decoupled from data access so new columns don't break existing dashboards; PII is opt-in per session with audit logging on every flip; the security model is the data model (Skipper runs as the calling user, no privilege escalation; saved query → shared dashboard permissions checked at view time, not save time because group membership changes). Skipper layers five tiers of grounded context (concepts/layered-grounded-context-for-data-agent): schema metadata → human annotations → code-derived knowledge (Transformer .meta.json) → curated data-model pages (MCP resources) → runtime introspection (DESCRIBE / DISTINCT / COUNT to Trino as the safety net). Code Mode as the MCP surface (patterns/code-mode-mcp-for-data-agent): two tools (search, execute) instead of 30 individual ones; the model writes JavaScript that calls Skipper's full toolset programmatically, executed in a sandboxed Dynamic Worker via WorkerLoader — "faster, cheaper, and the workflows it produces are auditable as code." Embedded dashboards: single-tag iframe contract + CSP frame-ancestors corporate-domain restriction + Cloudflare Access gate on iframe contents + view-time permission re-check against underlying tables — canonicalised at patterns/embedded-dashboard-with-zero-trust-iframe. Self-serve permission requests replace permission denied: query an unallowed table → "this table needs review, click here to request one" with Skipper suggesting the right RBAC group (patterns/error-message-as-self-serve-permission-request). Operational disclosures: >1B events/sec on Cloudflare's network; 53% of Town Lake queries are billing-related; 91,760 queries from 324 distinct employees in a recent measurement period; legacy 200–300-line revenue rollup SQL → 5 lines in Skipper; "top 100 customers by revenue" / "top 100 domains signed up today" — ~3 seconds in Skipper. Four named design lessons (canonicalised as concepts): less prompting is more (concepts/less-prompting-is-more-for-analytical-agents — "the model is good at reasoning about analytical workflows; it doesn't need to be micromanaged"); tool overlap is poison (concepts/tool-overlap-poisons-agent-accuracy — three fetch_results_* tools collapsed to one with a mode parameter); code, not metadata, captures meaning (concepts/code-as-context-for-data-agents — customer_type = paygo defaults when Salesforce data is missing — visible in the SQL, never in the column description); memory matters more than expected (sibling at Grafana Assistant). Closing thesis: "the boring infrastructure is the hard part — Trino + Iceberg is not new technology. The hard work is in the boring stuff: per-row access control, default-closed table allowlisting, query auditing, time-bound credentials, PII detection, idempotent ingestion, schema evolution" — "the things that make a data platform safe to actually use." Future direction: deeper integration with internal chat + ticketing ("ask the data" as the debugging-first-move); R2 SQL named as the planned successor query engine for parts of Town Lake's workflow as it matures.
Six new wiki systems created: cloudflare-town-lake, cloudflare-skipper, cloudflare-r2-data-catalog, cloudflare-lifeguard, cloudflare-skimmer, cloudflare-transformer-elt. Eight new concepts: default-closed-table-allowlist, schema-discovery-vs-data-access-separation, opt-in-pii-redaction-per-session, layered-grounded-context-for-data-agent, code-as-context-for-data-agents, tool-overlap-poisons-agent-accuracy, less-prompting-is-more-for-analytical-agents, security-model-as-data-model. Seven new patterns: single-sql-interface-over-heterogeneous-sources, default-closed-allowlist-with-automated-pii-scan, two-pass-pii-classifier-with-agentic-second-pass, code-mode-mcp-for-data-agent, error-message-as-self-serve-permission-request, embedded-dashboard-with-zero-trust-iframe, elt-on-workflows-with-do-state. Sibling to the existing internal AI engineering stack post: same dogfood-as-reference-architecture shape, applied to a different workload (data analytics, not internal AI tooling). Sibling to Ready-Analytics ClickHouse contention post: both are Cloudflare data-platform observability writeups, but Ready-Analytics is a billing-pipeline-substrate post about a ClickHouse-internals bottleneck, while Town Lake is the user-facing federated-SQL platform that sits at a higher tier.
- sources/2026-05-19-cloudflare-announcing-claude-managed-agents-on-cloudflare — 2026-05-19 Announcing Claude Managed Agents on Cloudflare. Joint launch with Anthropic for Claude Managed Agents — Anthropic's managed agent platform that runs the agent loop on Anthropic but lets the execution sandbox (file reads, command exec, code execution, web browsing) run on operator-chosen infrastructure. Anthropic frames this as "decoupling the brain from the hands." The wiki canonicalises the architectural framing at concepts/agent-brain-hands-decoupling. Cloudflare ships a forkable default deployment template that provides: (a) a Workers-based control plane that spins one sandbox per session and "automatically persists state across session sleeps"; (b) two interchangeable sandbox tiers — Containers (microVM) for "agents acting as a developer, building full applications and running Linux-based tools" vs Dynamic Workers + Codemode (V8 isolate) for "a faster, cheaper, and more scalable alternative" — picked at agent setup; the trade-off canonicalised at concepts/isolate-vs-microvm-for-agent-sandbox (boot: seconds vs milliseconds; memory: hundreds of MB vs a few MB; scale envelope: thousands vs "tens of thousands of concurrent agents"); (c) an outbound Worker proxy in the egress path for every byte the sandbox emits — operator-defined per-tenant / per-agent / per-metadata policy, server-side credential injection (the agent never sees secrets), allowlists, custom middleware (canonicalised at patterns/outbound-worker-proxy-for-agent-egress / concepts/outbound-proxy-credential-injection); (d) two new private-connectivity primitives — Cloudflare Mesh and Cloudflare Workers VPC — for connecting the egress proxy to private internal services "using post-quantum encrypted networking without a VPN or bastion host"; (e) a pre-bundled agent tool surface — browser_search, browser_execute, screenshot, browse, fetch_to_markdown, web_fetch (via Browser Run); send_email, email_read, email_list (via Email Service — each agent gets its own address; inbound mail kicks off sessions); call_service (Mesh / Workers VPC tunnels); image_generate (via Workers AI) — "This pairs well with Claude providing text-based inference"; (f) custom tools added via Zod-typed defineTool blocks in custom-tools.js, demonstrated with an r2_host_file example uploading to R2 and returning a public URL (canonicalised at patterns/pluggable-zod-tool-definition). Browser sessions ship with full audit trails and session-recording — "makes it easy to audit every action your agent's browser is taking on the web, apply allowlists and denylist to browser sessions, and save recordings of browser sessions for future debugging" — generalising the BISO human-user audit substrate to agent-driven browsing. Composes with the brain–hands decoupling thesis: Anthropic stays the model platform, Cloudflare becomes the first publicly-named external hands substrate for Claude Managed Agents; the brain↔hands contract is the wire format Claude phones home over to the operator's control plane on session start. Sibling to patterns/credentialed-proxy-sandbox (Agent Lee, 2026-04-15) and patterns/agent-sandbox-with-gateway-only-egress (Redpanda ADP, 2026-04-14) — these three independently-canonicalised instances now form the wiki's converged egress-as-only-boundary canon.
Three new system pages (systems/claude-managed-agents, systems/cloudflare-mesh, systems/cloudflare-workers-vpc); three new concept pages (concepts/agent-brain-hands-decoupling, concepts/isolate-vs-microvm-for-agent-sandbox, concepts/outbound-proxy-credential-injection); two new pattern pages (patterns/outbound-worker-proxy-for-agent-egress, patterns/pluggable-zod-tool-definition); nine extended system pages (systems/cloudflare-containers / systems/dynamic-workers / systems/cloudflare-workers / systems/cloudflare-sandbox-sdk / systems/cloudflare-agents-sdk / systems/cloudflare-browser-rendering / systems/cloudflare-email-service / systems/workers-ai / systems/code-mode).
- sources/2026-05-18-cloudflare-project-glasswing-what-mythos-showed-us — 2026-05-18 Project Glasswing: what Mythos showed us. Cloudflare engineering retrospective on several months of running Mythos Preview — Anthropic's preview cyber frontier model delivered through Project Glasswing "without the additional safeguards present in generally available models like Opus 4.7 or GPT-5.5" — against "more than fifty" Cloudflare repos spanning runtime, edge data path, protocol stack, control plane, and OSS deps. Two capability deltas defined the model-class jump: exploit chain construction (turning UAF → arbitrary R/W → control-flow hijack → ROP into a working proof) and proof generation (write-compile-run-read-failure-adjust loop in a per-task scratch directory). Names a structural alignment caveat — organic refusal inconsistency: the model exhibits emergent pushback on legitimate research that is "not consistent enough to serve as a complete safety boundary"; "the same task, framed differently or presented in a different context, could produce completely different outcomes". The load-bearing architectural argument: pointing a generic coding agent at a 100k-LoC repo fails by context-window compaction ("maybe a tenth of a percent of the surface in a useful way before the model's context window fills up"); the answer is the vulnerability discovery harness — an 8-stage multi-agent pipeline (Recon → Hunt → Validate → Gapfill → Dedupe → Trace → Feedback → Report; patterns/multi-stage-vulnerability-discovery-harness) running ~50 hunters concurrently with per-task scratch dirs for compile-and-run PoCs, an adversarial validator with no ability to emit findings (vuln-research instance of patterns/adversarial-review-subagent), a cross-repo tracer fanning out one tracer per consumer repo for reachability, and a schema-self-validating report agent feeding a queryable ingest API.
Four lessons crystallised: narrow scope, adversarial review, split bug and reachability, and parallel narrow over exhaustive. Closing argument inverts the default "two-hour SLA from CVE to patch" response — Cloudflare watched "a few" model-authored patches "fix the original bug while quietly breaking something else the code depended on" — toward architectural defense rather than faster patching (front-of-app defense + isolation + centralised fleet rollout). Memory-unsafe languages produce "consistently more false positives" — a new AI-vuln-triage tax axis on top of the well-known exploit-surface tax for C/C++ codebases. Cloudflare also disclosed that Mythos Preview built parts of the harness it then ran inside — "we used Mythos Preview to build on, tailor, and improve our original harnesses to suit its strengths" — dogfood-as-model-self-improvement. New canonical wiki pages: systems/mythos-preview, systems/anthropic-project-glasswing, systems/cloudflare-vulnerability-discovery-harness; concepts concepts/cyber-frontier-model, concepts/exploit-chain-construction, concepts/proof-of-exploitability, concepts/model-organic-refusal-inconsistency, concepts/signal-to-noise-in-ai-vulnerability-triage, concepts/model-bias-toward-finding-something, concepts/single-agent-coverage-failure-on-large-repos, concepts/architectural-defense-vs-faster-patching; patterns patterns/multi-stage-vulnerability-discovery-harness, patterns/narrow-scoped-agent-task, patterns/parallel-narrow-agents-over-exhaustive, patterns/split-bug-and-reachability-questions, patterns/proof-by-compile-and-run, patterns/cross-repo-tracer-fan-out, patterns/gapfill-requeue-for-coverage, patterns/report-agent-self-validates-schema. Sibling Cloudflare AI pipeline: AI Code Review — same coordinator/sub-reviewer shape applied to MR-time review rather than vulnerability discovery. 
Strengthens the memory-safety thread with a chain-aware-AI-triage data point and the adversarial review persona thread with a no-emit-validator sharpening of the constraint.
-
sources/2026-05-14-cloudflare-clickhouse-query-plan-contention — 2026-05-14 Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse. A year-long Cloudflare engineering retrospective on a query-planner-side bottleneck in ClickHouse that surfaced after Cloudflare extended the partitioning key on its 2 PiB Ready-Analytics table from
(day) to (namespace, day) to enable per-tenant retention. The migration's load-bearing assumption "per-query parts read shouldn't change → query duration shouldn't change" was correct about per-query data scanned but silently amplified per-cluster control-plane work scaling with total part count. Two months in, billing-pipeline jobs missed their hard daily deadline. Per-query metrics looked fine; CPU flame graphs pointed at filterPartsByPartition (45 % CPU) and a heuristic-reorder patch delivered only 5 %. Switching to Real (wall-clock) flame graphs surfaced the actual bottleneck: ">50 % of leaf SELECT duration spent waiting on a single mutex" protecting MergeTreeData's parts list — every planner had been taking an exclusive lock, copying the whole parts list, releasing the lock, and only then filtering down. Three-patch fix landed: (1) shared lock for read-only planners — "It had no business using an exclusive lock" — "a massive, immediate drop in query duration"; (2) deferred-copy cached parts-list snapshot regenerated by modifying ops, eliminating the per-query vector copy (Opts 1 + 2 ship upstream as ClickHouse PR #85535 in ClickHouse 25.11); (3) binary search on the sorted namespace prefix — exploiting the fact that the parts list is already sorted by the partitioning key — "breaks correlation of query durations with the number of parts". At 30k parts/replica the billing pipeline was failing; at 160k a year later durations are stable. Canonical wiki instance of per-tenant retention via partitioning key (the architectural primitive), lock contention in query planning (the failure class), CPU vs Real flame graphs (the diagnostic flip), system.trace_log (the investigation substrate), ClickHouse data parts (the unit whose count drives the cost), and max-min-fairness storage fair-share (the 90 % utilisation policy). New canonical Cloudflare-instance of patterns/upstream-the-fix in the OLAP-database substrate. 
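The idea behind the third patch (binary search over a parts list that is already sorted by the partitioning key) can be sketched in a few lines. This is a minimal Python illustration, not ClickHouse's C++ implementation: the `Part` tuple and the `first_index` / `parts_for_namespace` helpers are hypothetical names, and the only assumption taken from the post is that the list is sorted by `(namespace, day)`.

```python
from typing import Callable, List, NamedTuple


class Part(NamedTuple):
    # Hypothetical stand-in for a data part; field order matches the
    # (namespace, day) partitioning key so tuple sorting gives key order.
    namespace: str
    day: str
    name: str


def first_index(parts: List[Part], pred: Callable[[Part], bool]) -> int:
    """Smallest index i where pred(parts[i]) is true.

    Requires pred to be false for a prefix of the list and true for the rest,
    which holds for namespace comparisons on a namespace-sorted list.
    """
    lo, hi = 0, len(parts)
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(parts[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def parts_for_namespace(sorted_parts: List[Part], namespace: str) -> List[Part]:
    """Return the contiguous run of parts for one namespace in O(log n) probes."""
    start = first_index(sorted_parts, lambda p: p.namespace >= namespace)
    end = first_index(sorted_parts, lambda p: p.namespace > namespace)
    return sorted_parts[start:end]
```

A linear filter touches every part on every query, so its cost grows with the total part count; the two binary searches probe only a logarithmic number of entries, which is consistent with the post's claim that the fix "breaks correlation of query durations with the number of parts".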
Open second-order problem flagged: "100 GiB ZooKeeper cluster" tracking per-part metadata; future-post candidate. Closing question left explicitly open: "Was this partitioning scheme the right long-term choice?" -
sources/2026-05-13-cloudflare-browser-run-now-running-on-cloudflare-containers-its-faster — 2026-05-13 Browser Run: now running on Cloudflare Containers, it's faster and more scalable. Cloudflare engineering narrative of Browser Run's migration off shared Browser Isolation (BISO) infra onto its own Cloudflare Containers image. Three architectural layers disclosed: (1) Regional pre-warmed pools of DO+Container pairs — DO-enabled Containers create a Durable Object near the request, but the connected Container "may spin up on the other side of the world" — fine for one-shot RPCs, but the per-message round-trip on a screenshot's multi-message WebSocket exchange is dominated by the cross-region distance. Solution: the unit of selection is a colocated DO+Container pair within a region, drawn from a pre-warmed pool. Canonical wiki instance of concepts/do-to-container-cross-region-rtt + patterns/regional-pre-warmed-do-container-pair-pool. (2) Allocation state migrated from Workers KV to D1 + Cloudflare Queues. KV's eventual consistency (recently reduced 60s → 30s minimum cache TTL, "still too high") caused race-condition overallocation under demand spikes from AI agent builders: "You might check KV, see a container as 'available,' but by the time you route to it (30 seconds later), it's already claimed." Canonical wiki instance of concepts/eventual-consistency-too-slow-for-allocation. D1's transactional SQLite gives atomic check-mark-return claim via
UPDATE...WHERE...IN (SELECT...ORDER BY RANDOM() LIMIT ?5) RETURNING data (concepts/sqlite-transaction-for-atomic-resource-claim), with the migration shape canonicalised as patterns/transactional-db-over-eventually-consistent-kv-for-claim. Per-container 5-second state-update writes are batched through Cloudflare Queues (max_batch_size: 100, max_batch_timeout: 1) to amortise D1's per-row cost against batch-write P95 0.1 ms, lifting the per-location ceiling from 5,000 → 500,000 containers (100× headroom) at <2-second steady-state queue lag — canonical wiki instance of patterns/queue-batching-amortizes-db-write-throughput; paired with patterns/region-fallback-on-queue-backlog for staleness excursions. (3) Quick-action protocol coalesced from chatty WebSocket multi-message exchanges (open page → navigate → wait → screenshot, each requiring a round-trip) into a single HTTP request — "the entire flow executes internally without any back-and-forth between the worker and browser." Canonical wiki instance of patterns/single-http-request-over-chatty-websocket. Headline outcomes: 60 browsers/min via Workers binding, 120 concurrent (4× previous), >50% Quick Action latency reduction, WebGL + WebMCP unblocked because Browser Run now controls its own Chromium image cadence. Migration ran as a gradual ramp via in-path Worker (Quick Actions → free-tier binding → PAYG → contract customers) with no customer-facing change. Framed as Customer Zero for DO-enabled Containers — Browser Run's pain on the "novel, unstable early-stage Containers platform interface that was light on documentation, light on observability, and light on colleagues in an overlapping timezone" drove the platform upgrades that benefit external Containers customers next. 
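The atomic check-mark-return claim can be tried locally against plain SQLite. The sketch below is illustrative only: the `containers` table, its `state` and `data` columns, and the helper names are assumptions rather than Browser Run's schema, and it uses Python's bundled `sqlite3` module rather than D1. `RETURNING` needs SQLite 3.35 or newer, so the sketch falls back to a two-statement claim on older builds.

```python
import sqlite3

# Hypothetical schema: one row per pre-warmed container.
SCHEMA = """
CREATE TABLE containers (
    id    INTEGER PRIMARY KEY,
    state TEXT NOT NULL,
    data  TEXT NOT NULL
)
"""

# Single statement: pick up to N available rows at random, mark them claimed,
# and return their payload. No other writer can observe them as available
# between the check and the mark.
CLAIM_SQL = """
UPDATE containers
SET state = 'claimed'
WHERE id IN (
    SELECT id FROM containers
    WHERE state = 'available'
    ORDER BY RANDOM()
    LIMIT ?
)
RETURNING data
"""


def make_pool(size: int) -> sqlite3.Connection:
    """Create an in-memory pool with `size` available containers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO containers (id, state, data) VALUES (?, 'available', ?)",
        [(i, f"container-{i}") for i in range(size)],
    )
    conn.commit()
    return conn


def claim(conn: sqlite3.Connection, count: int) -> list:
    """Claim up to `count` containers and return their data values."""
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        rows = conn.execute(CLAIM_SQL, (count,)).fetchall()
    else:
        # Fallback for old SQLite without RETURNING: select then update.
        # This is NOT the single-statement claim; it is only safe here
        # because the demo uses one connection.
        picked = conn.execute(
            "SELECT id, data FROM containers WHERE state = 'available' "
            "ORDER BY RANDOM() LIMIT ?",
            (count,),
        ).fetchall()
        conn.executemany(
            "UPDATE containers SET state = 'claimed' WHERE id = ?",
            [(row[0],) for row in picked],
        )
        rows = [(row[1],) for row in picked]
    conn.commit()
    return [row[0] for row in rows]
```

Two successive calls never return the same container, and a request for more containers than remain simply returns what is left; that is the property the KV-based check-then-route flow could not give during its cache-TTL window.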
Three workload-shape mismatches that motivated the off-BISO migration in the first place — image size (slowed startup + dev), POP distribution (BISO's footprint ≠ Browser Run's optimal), session shape (BISO long-steady ≠ Browser Run short-spiky) — canonicalised on systems/cloudflare-browser-isolation as an instance of the "shared infra is cheaper until your workload shape diverges from the shared shape" failure mode. 9 wiki pages new (1 source + 4 systems-extended/created including new systems/cloudflare-queues + systems/cloudflare-browser-isolation + 3 new concepts + 5 new patterns); 5 wiki pages extended (systems/cloudflare-browser-rendering + systems/cloudflare-containers + systems/cloudflare-d1 + systems/cloudflare-kv + systems/cloudflare-durable-objects + concepts/eventual-consistency). -
sources/2026-05-12-cloudflare-when-idle-isnt-idle-how-a-linux-kernel-optimization-became-a-quic-bug — 2026-05-12 When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug. Cloudflare engineering narrative of a subtle CUBIC bug in quiche — Cloudflare's open-source Rust QUIC / HTTP/3 library — where CUBIC's
cwnd got permanently pinned at the two-packet minimum and never recovered from congestion collapse. Test fixture (canonical wiki instance of patterns/adversarial-corner-case-test-for-recovery): 10 MB HTTP/3 download, RTT = 10 ms, 30% random packet loss for the first 2 seconds, 10 s timeout (vs expected 4–5 s completion). ~60% failure rate across 100-run batches with CUBIC; 100% pass rate with Reno as the control. Diagnosis via qlog (canonical instance): cwnd locked at 2,700 bytes (2 MSS), 999 state transitions between congestion_avoidance and recovery in 6.7 s — one transition every ~14 ms tracking the 10 ms configured ACK clock. Root cause: the 2020 port of Linux TCP CUBIC's 2017 "after idle" optimisation (commit 30927520dbae) into quiche's on_packet_sent() inherited the primary fix but not the 1-week-later follow-up (commit c2e7204d180f) titled "tcp_cubic: do not set epoch_start in the future". Because QUIC runs in user space, there is no CA_EVENT_TX_START ACK-processing callback; the port had to approximate it from on_packet_sent() using bytes_in_flight == 0 as the idle predicate and now - last_sent_time as the delta. At minimum cwnd, bytes_in_flight drops to zero on every ACK cycle and last_sent_time is the start of the previous RTT cycle, so the "idle" delta is ~RTT (14 ms) not ~0 ms — canonical false idle detection instance. The recovery boundary gets pushed into the future on every send, in_congestion_recovery() returns true on every ACK, cwnd growth is skipped, the pipe drains, and the cycle repeats — the minimum-cwnd death spiral. Fix: three lines of logic adding last_ack_time as a secondary anchor; use max(last_ack_time, last_sent_time) as the idle-delta anchor — canonical patterns/measure-idle-from-last-ack-not-last-send instance. Restores 100% test pass rate. Contributed back to cloudflare/quiche. Lessons named: "'Idle' is harder to define than it sounds", "minimum-cwnd dynamics are a unique corner case", "the fix was surprisingly small compared to the complexity of the behavior". 
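The false-idle arithmetic is easy to reproduce in a toy model. The sketch below is a deliberately simplified Python illustration, not quiche's Rust code: `idle_delta_ms` and `simulate_recovery_shift` are hypothetical names, time is in integer milliseconds, and the sender is assumed to be perfectly ACK-clocked at minimum cwnd, so every send happens the instant the previous ACK arrives with nothing in flight.

```python
from typing import Optional


def idle_delta_ms(now: int, last_sent_time: int,
                  last_ack_time: Optional[int] = None) -> int:
    """How long the connection is considered to have been idle.

    Without last_ack_time this models the buggy anchor (last send only);
    with it, the anchor is max(last_ack_time, last_sent_time) as in the fix.
    """
    if last_ack_time is None:
        anchor = last_sent_time
    else:
        anchor = max(last_ack_time, last_sent_time)
    return max(0, now - anchor)


def simulate_recovery_shift(rtt_ms: int, cycles: int, use_ack_anchor: bool) -> int:
    """Total amount the recovery boundary is pushed forward over `cycles` sends."""
    last_sent = 0
    total_shift = 0
    for _ in range(cycles):
        ack_time = last_sent + rtt_ms  # ACK for the previous flight arrives one RTT later
        now = ack_time                 # next packet goes out immediately on that ACK
        total_shift += idle_delta_ms(
            now, last_sent, ack_time if use_ack_anchor else None
        )
        last_sent = now
    return total_shift
```

With a 14 ms cycle, the send-only anchor reports 14 ms of "idle" on every send, so over 100 cycles the boundary drifts forward by 1,400 ms even though the connection never stopped. Anchoring on the later of the last ACK and the last send reports 0 ms for the same trace.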
Closes with the forward-looking statement that Cloudflare also runs a model-based BBRv3 implementation via quiche's modular CCA design, "now enabled for a growing percentage of our QUIC deployments" — sibling shape to CUBIC in the quiche host, positioned as the structural alternative to loss-based-CCA corner-case dynamics like this one. Canonical wiki instance of patterns/userspace-port-of-kernel-primitive-risk: follow-up kernel fix invisible to the port-reviewer years later is the structural reason the bug lived six years before a CI integration test surfaced it. -
sources/2026-05-07-cloudflare-copy-fail-linux-vulnerability-response — 2026-05-07 How Cloudflare responded to the "Copy Fail" Linux vulnerability. CVE-2026-31431 "Copy Fail" was an unprivileged Linux-kernel local-privilege-escalation disclosed by Xint Code on 2026-04-29 16:00 UTC — an out-of-bounds 4-byte write in the kernel crypto API's
authencesn wrapper, reachable via the AF_ALG socket family + algif_aead module, taints the page cache of a setuid-root binary (/usr/bin/su canonical), and gives attacker-controlled 4-byte chunks root. The upstream fix (commit a664bf3d603d) reverts a 2017 in-place-crypto optimisation; it hadn't been backported to Cloudflare's primary LTS line (6.12) at disclosure. No customer impact. 5 days to fully mitigated fleet (2026-04-29 → 2026-05-04). Five parallel workstreams: (1) blast-radius mapping (kernel versions vulnerable); (2) detection coverage validated within minutes — Cloudflare's pre-existing behavioural detection flagged the full exploit chain (script interpreter → kernel crypto → setuid binary) "without a signature update, without a rule change, and without human intervention" — canonical wiki first-class articulation of behavioural detection as coverage-that-exists-before-the-CVE-specific-rule; (3) 48-hour retroactive threat hunt across fleet-wide kernel logs + access logs + binary-integrity hashes + persistence-mechanism audit + network-connection audit, all five pillars clean (concepts/assume-compromise-posture + patterns/fleet-wide-retroactive-threat-hunt); (4) runtime-mitigation engineering — first attempt (the researchers' recommended modprobe blacklist algif_aead) pushed to staging 2026-04-29 evening, surfaced a dependency conflict, rolled back safely — canonical wiki patterns/staging-caught-mitigation-failure; overnight redrafted as a surgical bpf-lsm program on the socket_bind LSM hook with an allow-list of legitimate AF_ALG callers (patterns/bpf-lsm-allowlist-hook-denial); (5) scheduled patched-kernel rollout via ERR once backport landed. 
Two-phase bpf-lsm rollout — Phase 1 2026-04-30 afternoon: ebpf_exporter config via salt, hook socket() syscall, emit per-binary AF_ALG metric across hundreds of thousands of servers within hours, empirically validate the allow-list; Phase 2 same evening: bpf-lsm program pushed behind a separate enforcement gate (patterns/visibility-before-enforcement-rollout). End-to-end verification on a previously-vulnerable test node confirmed the exploit no longer works. Decision 2026-04-30 ~17:00 UTC: "ship a patched build of the previous LTS line through reboot automation; do not accelerate the new LTS; lean on bpf-lsm in the meantime" — canonical wiki articulation of the strategic-choice between runtime mitigation and LTS acceleration. 2026-05-04 morning: ERR resumed at normal pace with the patched kernel. Honest self-assessment on the structural gap: "despite our practice of deploying Linux patch updates every two weeks, we remained vulnerable because a month-old mainline fix had yet to be backported to our primary kernel line" — canonical wiki articulation of concepts/lts-kernel-backport-latency-gap. Named follow-ups: faster bpf-lsm deployments, better playbooks, better kernel-API-dependency visibility ("remove unused modules from the build entirely" = canonical attack-surface reduction at build time). Opens three new Key systems on the wiki: systems/cloudflare-bpf-lsm (first-class page for the runtime-kernel-mitigation framework), systems/cloudflare-edge-reboot-release (first public naming of the 4-week-cycle ERR pipeline + three-tempo structure), systems/prometheus-ebpf-exporter (first canonical page for Cloudflare's OSS Prometheus exporter used as measurement substrate for visibility-before-enforcement). Second canonical wiki instance of patterns/autonomous-distributed-mitigation at a different altitude from the 2025-06-20 7.3 Tbps DDoS post — same every-node-runs-the-loop shape applied to kernel-CVE mitigation instead of packet-drop. -
sources/2026-05-06-cloudflare-when-dnssec-goes-wrong-de-tld-outage — 2026-05-06 When DNSSEC goes wrong: how we responded to the
.de TLD outage. 2026-05-05 ~19:30 UTC DENIC (the .de TLD registry) began publishing non-validatable DNSSEC signatures during a routine scheduled key rollover; every validating resolver on the Internet — including 1.1.1.1 — was spec-required to return SERVFAIL for every .de query that missed cache. User impact curve: SERVFAIL rate climbed steadily over 3 hours as cached records aged out, but NOERROR rate stayed stable — the canonical wiki production instance of serve-stale (RFC 8767) absorbing an upstream DNSSEC break at the recursive-resolver altitude. Cloudflare ended impact at 22:17 UTC (≈2h 47m) via a Negative Trust Anchor-equivalent override on .de + applied the same mitigation to its internal origin-resolver service for CDN-customer .de origin names. First canonical wiki naming of Big Pineapple as the resolver software powering 1.1.1.1. Two self-disclosed gaps with committed remediation: (1) no native RFC 7646 NTA mechanism in Big Pineapple (a generic insecure-zone override was used for .de — functionally equivalent but not RFC-7646-defined); (2) an Extended DNS Errors (RFC 8914) propagation bug where DNSSEC-Bogus errors surfaced as EDE 22 ("No Reachable Authority") instead of EDE 6 ("DNSSEC Bogus"), caused by the trust-chain verifier producing the right code but the outer response layer dropping it. Canonical wiki instance of cross-operator DNS-incident coordination via DNS-OARC Mattermost — "resolver operators across the Internet independently applied Negative Trust Anchors within an hour." Canonical articulation of TLD-level failure blast radius as a structural property of the DNS hierarchy (not a DNSSEC-specific vulnerability). Canonical DNS-altitude fail-stale (serve-stale) ↔ fail-open (NTA) ↔ fail-closed (SERVFAIL) three-way ladder — sibling to the config-deployment-altitude three-way ladder in the 2026-05-01 Code Orange post. 
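The fail-stale / fail-open / fail-closed ladder can be written down as a small decision function. This is a hedged Python sketch, not Big Pineapple's logic: the `Answer` type, the `respond` function, and in particular the order in which the override and the stale cache are consulted are assumptions made for illustration. The EDE code points are the ones defined in RFC 8914.

```python
from dataclasses import dataclass
from typing import Optional, Set

EDE_STALE_ANSWER = 3   # RFC 8914 "Stale Answer"
EDE_DNSSEC_BOGUS = 6   # RFC 8914 "DNSSEC Bogus"


@dataclass
class Answer:
    rcode: str
    data: Optional[str]
    ede: Optional[int] = None


def respond(zone: str,
            validation_ok: bool,
            upstream_data: Optional[str],
            stale_data: Optional[str],
            insecure_overrides: Set[str]) -> Answer:
    """Pick a response for a query whose upstream answer may fail validation."""
    if validation_ok:
        return Answer("NOERROR", upstream_data)
    if zone in insecure_overrides:
        # Fail-open: an operator override (NTA-style) treats the zone as
        # insecure, so the unvalidated upstream answer is returned.
        return Answer("NOERROR", upstream_data)
    if stale_data is not None:
        # Fail-stale: serve the expired cached record (RFC 8767 serve-stale).
        return Answer("NOERROR", stale_data, EDE_STALE_ANSWER)
    # Fail-closed: nothing trustworthy to serve.
    return Answer("SERVFAIL", None, EDE_DNSSEC_BOGUS)
```

Before the override is installed, names still in cache keep resolving from stale data while cache misses get SERVFAIL, which matches the post's curve of a stable NOERROR rate alongside a climbing SERVFAIL rate. Adding the zone to `insecure_overrides` moves every query to the fail-open rung.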
Sibling to the 2026-01-08 CNAME-ordering 1.1.1.1 incident: both implicate Big Pineapple, but the 2026-01-08 event was an internal code regression while 2026-05-05 was an external TLD-registry failure absorbed correctly. -
sources/2026-05-01-meta-strengthening-end-to-end-encrypted-backups — 2026-05-01 (Meta Engineering; canonical author Meta, Cloudflare plays a load-bearing role). Meta hardens the infrastructure around the HSM-based Backup Key Vault (the tamper-resistant HSM fleet backing WhatsApp + Messenger E2EE backups) with two changes: (1) Over-the-air fleet-key distribution for Messenger — the HSM fleet's public keys ship inline with the HSM response inside a validation bundle signed by Cloudflare + counter-signed by Meta, with Cloudflare maintaining an audit log of every validation bundle — canonical wiki instance of Cloudflare as third-party cosigner + audit-log operator in another company's trust chain. Canonical patterns/third-party-countersignature-for-trust-anchor + concepts/audit-log-as-transparency-artifact instance. (2) Transparent fleet deployment — Meta commits to publishing reproducible evidence of each new HSM-fleet rollover. WhatsApp continues to hardcode fleet keys in-app; Messenger uses the OTA bundle because its product constraints require new HSM fleets to deploy without app updates. The Cloudflare role here is structurally different from Cloudflare-as-first-party-substrate (Workers / AI Gateway / etc.) — see the new section As third-party cosigner + audit-log operator for other companies above. Full protocol in Meta's Security of E2EE Backups whitepaper.
-
sources/2026-05-01-cloudflare-code-orange-fail-small-complete — 2026-05-01 Code Orange: Fail Small is complete. The result is a stronger Cloudflare network. Cloudflare announces the completion of Code Orange, the ~6-month engineering-resiliency programme launched in response to the 2025-11-18 and 2025-12-05 global outages. The post is the follow-up structural RCA — rather than re-describing the bugs, it catalogs the shipped remediations and the enforcement mechanisms that will prevent the class of bug recurring. Five tracks: (1) safer configuration changes via a new internal system Snapstone that bundles config changes into packages + applies health-mediated progressive rollout + automated rollback by default; teams "create these configuration units on demand" so any identified dangerous pattern inherits safe deployment — "the same methodology we use when releasing software, for all configuration deployments"; (2) reducing the impact of failure via systematic review of runtime dependencies + fail-stale (use last-known-good configuration) / fail-open / fail-closed per module + customer-cohort-segmented service instances (patterns/customer-cohort-segmented-service-instances) on the Workers runtime with "one handling only traffic for our free customers", least-critical-first deployment ordering, and 50+ deploys in 7 days operational datum; (3) revised break-glass and incident-management procedures with backup authorisation pathways for 18 key services + a dedicated communications team + a 2026-04-07 engineering-wide drill with 200+ team members — canonicalising "Cloudflare runs on Cloudflare" as a structural hazard and drill muscle memory as the readiness discipline; (4) the internal Codex (systems/cloudflare-codex) — a living repository of engineering rules authored via the RFC process (concepts/rfc-as-codified-engineering-rule) in the format "If you need X, use Y" with link back to the RFC; enforced via AI code review on every MR across the entire codebase without 
exception via patterns/codex-enforced-via-ai-code-review; two named rules with direct incident-origin traceability — "Do not use
.unwrap() outside of tests and build.rs" (addresses concepts/unhandled-rust-panic, would have rejected the 2025-11-18 FL2 MR) and "Services MUST validate that upstream dependencies are in an expected state before processing" (addresses concepts/internally-generated-untrusted-input + concepts/nil-index-lua-bug, would have rejected both 2025-11-18 and 2025-12-05 MRs); framed as a flywheel — "expertise becomes standards, standards become enforcement, enforcement raises the floor for everyone"; (5) tightened external communication via global changelog + SLOs + a maintenance-coordination system + predictable-interval (30-60 min) customer updates during active incidents, with a dedicated comms team that drilled alongside responders. Thesis framing: "build institutional memory that enforces itself" (concepts/institutional-memory). Evaluation claim: "we have now completed the work that would have avoided the November 18, 2025 and December 5, 2025 global outages." Explicitly complete but not final: "improving resiliency will never be a 'job done'." The 2026-04-20 AI Code Review post was already framed "as part of Code Orange: Fail Small" — that system is the enforcement substrate; the 2026-05-01 post canonicalises the Codex as the rule-source tier on top of it. Recurring-shape completion: this post closes the "name the missing discipline, then ship the remediation, then publish the evaluation" arc that the 2025-07-14, 2025-11-18, and 2025-12-05 post-mortems opened. Three stated-remediation patterns (patterns/progressive-configuration-rollout, patterns/global-feature-killswitch, patterns/harden-ingestion-of-internal-config) move from stated to shipped status; the concepts/fail-open-vs-fail-closed binary is extended to a ternary with fail-stale preferred. 
Two new Key systems surfaced (systems/snapstone, systems/cloudflare-codex); 7 new concepts (concepts/health-mediated-deployment, concepts/fail-stale, concepts/traffic-cohort-segmentation, concepts/rfc-as-codified-engineering-rule, concepts/dependency-on-self, concepts/drill-muscle-memory, concepts/institutional-memory); 3 new patterns (patterns/config-deployment-as-code-deployment, patterns/customer-cohort-segmented-service-instances, patterns/codex-enforced-via-ai-code-review). - sources/2026-05-01-cloudflare-introducing-dynamic-workflows-durable-execution-that-follows-the-tenant
— 2026-05-01 Introducing Dynamic Workflows: durable execution that follows the tenant. Cloudflare ships @cloudflare/dynamic-workflows — a ~300-line MIT-licensed TypeScript library that bridges Workflows (durable execution) with Dynamic Workers (per-request V8-isolate sandboxing). Until this launch, a Workflow's class_name in wrangler.jsonc was statically bound to a single class per deploy — "one binding, one class, per deploy" — which worked for single-owner apps but broke the moment a platform wanted each tenant / repo / agent to ship their own run(event, step) function. Dynamic Workflows lifts the constraint: a single Worker Loader dispatches every create() call and every subsequent run(event, step) invocation into the right tenant's code, and the Workflows engine's durability machinery (IDs, step.sleep(), step.waitForEvent(), retries, hibernation, replay) continues to work unchanged. The library is envelope-and-unwrap glue: wrapWorkflowBinding({ tenantId }) rewrites outbound .create() payloads into { __workerLoaderMetadata: { tenantId }, params: <original> }, the engine persists the envelope with the payload through every sleep/crash/redeploy, and createDynamicWorkflowEntrypoint unwraps on run() and hands the metadata to the loadRunner callback the platform wrote — which typically loads the tenant's code from R2 / Artifacts and returns the Dynamic Worker stub. Bindings that cross the Dynamic Worker boundary must be RPC stubs — "a plain { create, get } object can't be structured-cloned, and the raw Workflow binding isn't serializable either" — so the Worker Loader must export { DynamicWorkflowBinding } for the runtime to build per-tenant specialised stubs. Canonicalises concepts/per-tenant-dynamic-code-dispatch (three-layer engine / Worker Loader / tenant-code topology), concepts/envelope-wrap-and-unwrap-metadata-routing (wire-format trick that survives persistence), and concepts/byo-workflow-per-tenant (customer-ships-the-workflow-body mental model). 
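The envelope-and-unwrap trick is independent of the runtime, so it can be illustrated outside Workers. The sketch below is a Python analogue written for this page, not the TypeScript library: `wrap_params`, `unwrap_payload`, and `run_persisted` are hypothetical names, and a JSON round trip stands in for the engine persisting the payload. Only the envelope shape (`__workerLoaderMetadata`, `tenantId`, `params`) is taken from the post.

```python
import json
from typing import Any, Callable, Dict, Tuple

METADATA_KEY = "__workerLoaderMetadata"


def wrap_params(tenant_id: str, params: Any) -> Dict[str, Any]:
    """Wrap the caller's params in an envelope carrying routing metadata."""
    return {METADATA_KEY: {"tenantId": tenant_id}, "params": params}


def unwrap_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], Any]:
    """Split a stored envelope back into (metadata, original params)."""
    return payload[METADATA_KEY], payload["params"]


def run_persisted(stored: str,
                  load_runner: Callable[[Dict[str, Any]], Callable[[Any], Any]]) -> Any:
    """Simulate a run() after persistence: decode, route by metadata, execute.

    `load_runner` plays the role of the platform-written callback that maps
    metadata to the tenant's code.
    """
    metadata, params = unwrap_payload(json.loads(stored))
    runner = load_runner(metadata)
    return runner(params)
```

Because the routing hint travels inside the persisted payload, the dispatcher needs no side table that could drift out of sync across sleeps or redeploys. It also means the tenant can read the hint back, which is why the post warns against putting secrets in it.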
Workflows V2 capacity disclosure (first wiki instance): 50,000 concurrent workflow instances per account, 300 new instances per second per account. Dynamic Worker dispatch hop: single-digit-millisecond boot, a few MB of memory, cache-by-tenant-ID, evicted-isolate reload invisible to the workflow — "dispatch overhead is essentially free". Metadata is a routing hint, not authorization: "the tenant can read it back via instance.status(). Don't put secrets in there" — isolation remains the capability-based sandbox's responsibility via Dynamic Workers' globalOutbound: null posture. Positioning: one of three dynamic-binding primitives. Cloudflare frames Dynamic Workflows as the third realised instance of a generalising pattern after Dynamic Workers (compute, open-beta 2026-04) and Durable Object Facets (storage, 2026-04); pre-announces the same shape for queues, caches, databases, object stores, AI bindings, and MCP servers. Canonicalises patterns/dynamic-binding-over-static-binding as the deliberate platform strategy Cloudflare is rolling out. Canonical CI/CD showcase — each repo ships .cloudflare/ci.ts with its own CIPipeline extends WorkflowEntrypoint class; the platform's webhook dispatcher loads that repo's pipeline as a Dynamic Worker and hands execution to Dynamic Workflows; Artifacts + ArtifactFS fork() gives each run an isolated repo copy in single-digit seconds (replacing the 10-second git clone tax); Dynamic Workers run each lightweight step (lint, test, build) in ms-boot isolates; Sandboxes handle heavy corners (docker build, integration suites with Postgres, Rust 8-core compiles). "The repo doesn't move — the compute comes to it." Canonicalised as patterns/ci-pipeline-as-customer-authored-durable-workflow. 
Agent-autonomy composition: Dynamic Workflows is "the piece that lets that plan be a first-class Cloudflare Workflow" for coding agents — a model-written run(event, step) body with every step.do() independently retryable, every step.sleep('24 hours') hibernating for free, every step.waitForEvent(...) waiting indefinitely for human approval. Extends Project Think's fiber-based durable execution model with a workflow-as-artifact primitive the model itself can emit. New wiki pages: 2 systems (systems/cloudflare-dynamic-workflows, systems/durable-object-facets), 3 concepts (concepts/per-tenant-dynamic-code-dispatch, concepts/envelope-wrap-and-unwrap-metadata-routing, concepts/byo-workflow-per-tenant), 3 patterns (patterns/dynamic-binding-over-static-binding, patterns/metadata-envelope-in-durable-payload, patterns/ci-pipeline-as-customer-authored-durable-workflow). Extends 9+ existing pages (systems/cloudflare-workflows with V2 capacity + dispatched role; systems/dynamic-workers with dispatch-substrate role; systems/cloudflare-durable-objects with Facets sibling; systems/cloudflare-workers with Worker Loader role; systems/cloudflare-artifacts, systems/artifact-fs, systems/cloudflare-sandbox-sdk with CI showcase roles; systems/project-think with workflow-as-agent-plan composition; concepts/durable-execution with fourth shape — platform-hosted engine + dispatched per-tenant code). Eighth post in Cloudflare's 2026-04 → 2026-05 durable-execution + agent-autonomy arc (Project Think + Agent Lee + AI Search + Artifacts + Email Service + AI Platform + agent-provisioning protocol + Dynamic Workflows). - sources/2026-05-27-cloudflare-irans-internet-is-partially-restored-cloudflare-radar-data-shows — 2026-05-27 Iran's Internet is partially restored, Cloudflare Radar data shows. Direct duration-update + first-recovery follow-up to the Q1 2026 disruption review. 
Iran's February 28 2026 nationwide shutdown — implemented via aggressive filtering + white SIM cards — reached 87 days before its first partial restoration on May 26 2026, qualifying as "one of the longest sustained Internet disruptions observed in recent years." Cloudflare Radar confirmed the recovery via four signal planes: bytes transferred (~15× prior-week levels), 1.1.1.1 DNS query rate, per-ASN breakdown across TCI / IranCell / RighTel / MCCI, and per-region geolocation. The recovery shape was diagnostically important — 91.6% of restored HTTP requests originated from Tehran, the rest of the country showed only minor increases. Per-region selective recovery is only producible by a filtering-based architecture (a route-withdrawal recovery is uniform across all in-country ASes within minutes), so the Tehran-localised shape confirms the original mechanism diagnosis in the recovery direction — sibling to the "routes stay up, traffic drops" signature in the shutdown direction. Establishes concepts/capital-localized-internet-restoration as the new wiki canonical concept (Tehran 91.6% as canonical instance) and extends concepts/partial-restoration-pattern with the multi-month + capital-localised recovery shape variant (distinct from Uganda 2026's 9-day phased recovery). concepts/diurnal-traffic-pattern returned within hours (decline ~21:00 UTC, return ~03:00 UTC at 06:30 local Tehran), confirming organic user activity rather than synthetic / state-only traffic; first wiki canonicalisation of the diurnal-pattern concept. IPv6 announced address space remains effectively zero ~5 months after the January 8 IPv6 withdrawal — Asiatech (~9.4% of Iran's IPv6 space) and RASANA (~8.8%) continue to carry the missing prefixes; IPv4 recovery and IPv6 posture have decoupled, establishing the persistent IPv6 withdrawal pattern in concepts/ipv6-withdrawal-as-shutdown-signal. Peak traffic reached only ~40% of 2026 maximum at restoration onset. 
Cloudflare warned the recovery could be temporary — January precedent had two reversed restorations on January 21 + 25 before durable recovery January 27. Opens 4 new minimum-viable system pages (TCI, IranCell, RighTel, MCCI) and 2 new concept pages (capital-localized-internet-restoration, diurnal-traffic-pattern); extends 5 existing concept pages (partial-restoration-pattern, government-directed-internet-shutdown, filtering-based-shutdown, whitelist-internet-access, ipv6-withdrawal-as-shutdown-signal) and 5 existing system pages (cloudflare-radar, cloudflare-radar-outage-center, cloudflare-1-1-1-1-resolver, asiatech-as43754, rasana-as31549) with the May 26 partial-restoration data point.
- sources/2026-04-28-cloudflare-q1-2026-internet-disruption-summary — 2026-04-28 Q1 2026 Internet disruption review: shutdowns, power outages, and conflict. Quarterly Cloudflare Radar summary of observed and confirmed Q1 2026 Internet disruptions, dominated by three classes of event. (1) Government-directed shutdowns inverted a one-year trend — Q1 2025 had none, Q1 2026 had three: Uganda (nationwide mobile-operator suspension from 18:00 local January 13 ahead of the January 15 election; UIXP domestic traffic ~72 Gbps → ~1 Gbps; full restoration January 26); Iran (first shutdown January 8–27, second shutdown from February 28 still largely in place at quarter-end — "one of the longest sustained Internet disruptions observed in recent years"); Republic of Congo (~60-hour blackout around the March 15 presidential election, matching 2016 + 2021 election shutdowns). The Iran second shutdown is the canonical wiki instance of filtering-based shutdown
- whitelist Internet access ("white SIM cards" restricting access to approved users/sites); the first shutdown produced the canonical IPv6-withdrawal as shutdown signal — Asiatech (AS43754) lost 4.46 M /48-equivalents (~9.4% of Iran's IPv6 space), RASANA (AS31549) lost 4.19 M (~8.8%). (2) First publicly-disclosed kinetic strikes on hyperscaler cloud infrastructure — drones hit two AWS me-central-1 (UAE) facilities March 1 and damaged an me-south-1 (Bahrain) facility March 2, with a second me-south-1 disruption March 23. Cloud Observatory data showed elevated connection failure rates for multiple days. The AWS Health Dashboard urged customers with workloads in affected regions to back up data or migrate to other AWS regions — canonical instance of patterns/cloud-region-migration-during-conflict and kinetic attack on cloud infrastructure. (3) Cuba's severely deteriorating grid produced three separate national grid collapses in a single month (March 4, March 16, March 21–22), plus Ukrainian + Moldovan power outages driven by Russian strikes on energy infrastructure (January 7–8 Dnipropetrovsk ~50% drop; January 26 Kharkiv ~50% drop; January 31 cross-border Romania/Moldova/Ukraine grid failure ~46% drop). Other events: Storm Kristin in Portugal (~70% drop in Leiria, >850k E-Redes customers without electricity); WACS (West Africa Cable System) incident in Republic of Congo (~82% drop); Verizon Wireless software issue across the U.S.; Flow Grenada, Orange Guinée, TalkTalk smaller disruptions. Redundancy asymmetry canonicalised by the U.S. Virgin Islands case — VI Powernet (AS14434) went near- zero but St. Thomas only lost ~60% of traffic and St. Croix ~40% due to other providers (patterns/redundant-isp-absorption-of-partial-outage). Canonical wiki instance of patterns/quarterly-internet-disruption-review sibling to the prior Q1 2025 / Q3 2025 / Q4 2024 Cloudflare Radar reviews.
- sources/2026-04-30-cloudflare-post-quantum-encryption-for-cloudflare-ipsec-is-ga — 2026-04-30 Post-quantum encryption for Cloudflare IPsec generally available. Cloudflare GA's hybrid ML-KEM for Cloudflare IPsec / Magic WAN per draft-ietf-ipsecme-ikev2-mlkem — classical Diffie-Hellman runs first, its derived key encrypts a second exchange running ML-KEM (NIST FIPS 203), and both outputs mix into the session keys securing IPsec ESP traffic. Interoperability verified with Cisco 8000 Series Secure Routers 26.1.1+ and Fortinet FortiOS 7.6.6+ branch- connector hardware; reference implementation via strongswan. Not yet interoperable: Palo Alto Networks shipped early under RFC 9370 with vendor-specific ciphersuites before draft-ietf-ipsecme-ikev2-mlkem existed — canonical ciphersuite-bloat case study. Motivating threat is HNDL on site-to-site WAN traffic; this rollout closes the IPsec HNDL gap four years after TLS adopted the same hybrid construction (2022). The delay is explicitly attributed to IPsec-community QKD distraction (RFC 8784, 2020) + the extension-mechanism-without-named-default failure of RFC 9370 (2023). Third canonical instance of default-on security upgrade at no additional cost (after 2014 Universal SSL + 2022 TLS PQ-for-all) and canonical sibling-protocol instance of TLS-first PQC rollout as blueprint (alongside SSH's 2025 GitHub deployment). Remaining gap: post-quantum authentication for IPsec is not yet specified in IETF drafts — Cloudflare's closing standards-community ask: "we hope the IPsec community will continue to focus on interoperable PQC implementations, rather than diverting focus to niche use cases with QKD." Opens 8 new canonical wiki pages (1 source + 3 concepts + 2 patterns + 5 systems) and extends the concepts/post-quantum-cryptography canon from TLS (2022) + SSH (2025) to IPsec (2026).
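The hybrid construction in the IPsec entry above (a classical exchange plus an ML-KEM exchange, with both outputs mixed into the session keys) can be illustrated with a minimal sketch. It assumes the two shared secrets are already available as bytes and mixes them with HKDF-SHA-256 (RFC 5869) built from Python's standard `hmac` module. The function names, the `hybrid-sketch` label, and the use of HKDF in place of the IKEv2 PRF are illustrative assumptions, not the key schedule defined by draft-ietf-ipsecme-ikev2-mlkem.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input keying material into a PRK."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the PRK into `length` output bytes."""
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def hybrid_session_key(classical_secret: bytes, pq_secret: bytes,
                       nonces: bytes, length: int = 32) -> bytes:
    """Hypothetical helper: derive one key that depends on BOTH secrets.

    An attacker must recover the classical secret and the ML-KEM secret
    to reconstruct the output; breaking only one is not enough.
    """
    prk = hkdf_extract(nonces, classical_secret + pq_secret)
    return hkdf_expand(prk, b"hybrid-sketch", length)
```

The point of the sketch is the dependency structure: the derived key changes if either input changes, which is the property that addresses harvest-now-decrypt-later against the classical exchange alone.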
- sources/2026-04-30-cloudflare-agents-can-now-create-cloudflare-accounts-buy-domains-and-deploy — 2026-04-30 Cloudflare + Stripe agent-provisioning protocol launch. Cloudflare + Stripe co-announce a new protocol letting an agent take a user from literal zero (no Cloudflare account, no domain, no API token) to a production deployment on a newly-registered domain, without the user leaving the agent session. Three co-operating components: discovery (stripe projects catalog aggregates provider-side JSON catalogs — Cloudflare exposes its ~3,000-operation API surface via this), authorization (OAuth consent for existing accounts; auto-provisioned accounts for new users via Stripe-attested identity + RFC 9728-aligned consent for existing), payment (payment tokens not raw card data; default $100/month-per-provider cap via Stripe Projects; paired with Cloudflare's Budget Alerts observability surface). Canonical example: stripe projects add cloudflare/registrar:domain registers a new domain via Cloudflare Registrar + returns an API token for agent deployment on Workers. Framing: "similar to how the OAuth standard made it possible to delegate access to your account to other platforms, the protocol uses OAuth and extends further into payments and account creation, doing so in a way that treats agents as a first-class concern." Any signed-in-user platform can play the orchestrator role Stripe plays at launch — the patterns/orchestrator-provider-agent-trust-triangle is deliberately symmetric, and Cloudflare itself already plays orchestrator for PlanetScale-Postgres provisioning (the partner-managed binding pattern is the pre-formal-protocol ancestor). Also announced: $100k in Cloudflare credits for startups incorporating via Stripe Atlas. A formal specification is forthcoming. Wiki pages created in this pass (8): 2 systems (systems/stripe-projects, systems/cloudflare-registrar), 4 concepts (concepts/agent-provisioned-account, concepts/provider-service-catalog-api, concepts/agent-payment-budget-cap, concepts/payment-token-over-credit-card-sharing), 2 patterns (patterns/agent-provisioning-protocol, patterns/orchestrator-provider-agent-trust-triangle). -
sources/2026-04-22-cloudflare-making-rust-workers-reliable-panic-and-abort-recovery-in-wasm-bindgen — 2026-04-22 Rust Workers reliability post. Cloudflare documents the two-generation arc for containing sandbox poisoning of Rust Workers (Rust compiled to WebAssembly via wasm-bindgen). Before: a Rust panic or out-of-memory abort left the Wasm instance in undefined state — "an unhandled Rust abort in a Worker affecting one request might escalate into a broader failure affecting sibling requests or even continue to affect new incoming requests." Platform-side interim (workers-rs 0.6): a Rust panic handler + JS-side Proxy-based entrypoint encapsulation + targeted wasm-bindgen binding patches to reinitialise the Wasm module. Upstream fix (workers-rs 0.8.0 via wasm-bindgen): two primitives —
panic=unwind support for wasm32-unknown-unknown implemented on top of the WebAssembly Exception Handling proposal, so Rust panics unwind cleanly across the Rust↔JS boundary with destructors running + instance remaining valid and reusable; and an abort recovery primitive (Exception.Tag for foreign exceptions + set_on_abort hook) letting embedders detect genuine aborts and reinitialise as last-resort remediation. Closures: MaybeUnwindSafe + Closure::new_aborting as the correctness-preserving fallback for captured state that can't be proven unwind-safe. Ecosystem prerequisite work: Cloudflare backported modern Wasm EH to Node.js 22 + 24 (natively in 25.0.0, 2025-10-15) so Rust's Wasm targets can move to modern EH as the default next year — the laggard in an otherwise-aligned engine matrix (V8 13.8.1, workerd v1.20250620.0, Chrome 138, Firefox 131, Safari 18.4). Also shipped experimental
--reset-state-function in wasm-bindgen to extend abort recovery to JS apps embedding wasm-bindgen libraries. Cloudflare is now part of the wasm-bindgen organization formed in 2025 — the co-maintainer role that made the upstream route viable. Wiki pages created in this pass (11): 3 systems (systems/workers-rs, systems/wasm-bindgen, systems/webassembly), 6 concepts (concepts/panic-unwind, concepts/panic-abort, concepts/webassembly-exception-handling, concepts/sandbox-poisoning, concepts/unwind-safety, concepts/abort-recovery), 2 patterns (patterns/proxy-based-entrypoint-encapsulation, patterns/reinitialize-on-unrecoverable-error), plus extension of concepts/stack-unwinding to a Rust-on-Wasm seen-in and a new seventh-instance section added to patterns/upstream-the-fix. Canonical cross-substrate pairing between native-arm64 Go unwinder bugs (the 2025-10 Cloudflare Go compiler incident) and Rust-on-Wasm unwinder semantics — same runtime discipline (concepts/stack-unwinding), different substrate. Tier 1, 2026-04-22. -
sources/2026-04-21-planetscale-faster-planetscale-postgres-connections-with-cloudflare-hyperdrive — 2026-02-19 PlanetScale × Cloudflare Hyperdrive real-time demo (external-ingested via PlanetScale blog). Simeon Griggs walks through a prediction-market demo on PlanetScale Postgres Metal + full Cloudflare stack (Workers + Hyperdrive + Durable Objects + WebSockets). Four load-bearing architectural decisions made explicit: (1) DO-not-on-write-path — "Durable Objects are single-threaded and hosted in a single location, making them a bad candidate for the write path" — canonical wiki datapoint for the DO's role split (patterns/single-region-do-fanout-from-distributed-writers); (2) Postgres authoritative, WebSockets fast — DB is source of truth, WebSocket is best-effort notify, client contract is "immediate most of the time, eventually correct all of the time" (canonical concepts/authoritative-vs-fast-notification + patterns/db-authoritative-with-websocket-notify); (3) Smart placement deliberately NOT used — "I've chosen not to use smart placement and keep the Worker closer to every user" — canonical wiki "when NOT to place" datapoint because Hyperdrive already absorbs edge-to-origin DB latency; (4) Stale-quote rejection at DB level — request carries expected price + slippage, DB rejects if stale at write time (canonical concepts/stale-quote-rejection). Hyperdrive decomposed into two physical tiers for the first time on the wiki: edge component at every Cloudflare POP pre-negotiates "the 7 round-trip steps of creating a connection"; origin-co-located pool maintains warm connections. Three new wiki entities created for this ingest: systems/cloudflare-websockets, concepts/authoritative-vs-fast-notification, concepts/stale-quote-rejection, patterns/db-authoritative-with-websocket-notify, patterns/single-region-do-fanout-from-distributed-writers. Production-hardening deferrals itemised: replay-on-reconnect, Cloudflare Queues for fan-out durability, polling reconciliation. 
Single-DO scaling ceiling flagged, named mitigation: horizontal DO sharding by key. Multi-environment pattern via wrangler.jsonc env block + per-branch PlanetScale DB + CLOUDFLARE_ENV=development. Customer-facing companion to the 2026-04-16 Cloudflare-side PlanetScale-integration launch post. -
sources/2026-04-01-cloudflare-emdash-wordpress-spiritual-successor — 2026-04-01 Introducing EmDash: a spiritual successor to WordPress that solves plugin security. Cloudflare announces EmDash (v0.1.0 preview), an open-source MIT-licensed TypeScript CMS positioned as the "spiritual successor to WordPress." Core architectural bet: run each plugin as a Dynamic Worker with a capability manifest declaring exactly what hooks and capabilities it needs. Extends the capability-based sandbox primitive (canonically introduced in Project Think for LLM-generated code) to third-party CMS plugins — same "no ambient authority + declare + grant + runtime-enforce-at-boundary" shape. Addresses WordPress's two structural problems: (1) plugin security — "WordPress plugins have direct access to the WordPress site's database and filesystem"; 96% of WordPress security issues originate in plugins, 2025 had more high-severity WordPress vulnerabilities than the prior two years combined (Patchstack 2025); (2) marketplace lock-in — WordPress.org's 800+ plugin review queue + ≥2 week wait + GPL-inheritance argument combine to force plugin authors into the marketplace. EmDash's sandbox primitive breaks both: "plugins can have any license; a plugin can be provided to an EmDash site, and trusted, without the EmDash site ever seeing the code" — canonical patterns/license-independent-sandboxed-plugins. Restaurant-food-safety analogy extracted verbatim as the argumentative backbone. Serverless by default: built for workerd V8 isolates, scales to zero, CPU-time-only billing; contrasts with WordPress's "can't avoid pre-provisioning idle compute" posture. Platform operators can host millions of EmDash sites via Cloudflare for Platforms; deploys to any Node.js server too. Built-in x402 / HTTP 402 — every EmDash site ships per-request content monetisation out of the box (agentic paywall shipped as a CMS default; third wiki deployment of HTTP 402 after pay-per-crawl and the x402 Foundation). AI-native management surfaces: each EmDash instance ships a built-in MCP server + Agent Skills (plugin-authoring + WordPress-theme-porting skills in-tree) + a programmatic CLI — first wiki instance of Agent Skills shipped inside a product, first wiki instance of MCP as a CMS primitive. Astro is the theming framework (themes cannot do database operations — contrast with WordPress's
functions.php); passkeys are the default authentication. WordPress migration: WXR export + EmDash Exporter plugin with secure Application-Password-protected endpoint. "No WordPress code was used to create EmDash" — clean reimplementation enabling MIT license, sibling shape to vinext's Next.js clean reimplementation. Wiki pages created (10)*: sources/2026-04-01-cloudflare-emdash-wordpress-spiritual-successor; 4 systems (systems/emdash, systems/wordpress, systems/workerd, systems/x402-protocol); 3 concepts (concepts/capability-manifest, concepts/plugin-marketplace-lock-in, concepts/passkey-authentication); 3 patterns (patterns/capability-manifest-plugin-isolation, patterns/license-independent-sandboxed-plugins, patterns/per-request-isolate-per-plugin). Extends systems/astro (theming role), systems/dynamic-workers (plugin-hosting use case), systems/cloudflare-workers (CMS host), systems/model-context-protocol (CMS primitive), systems/agent-skills (shipped-in-product), concepts/capability-based-sandbox (plugin tier), concepts/http-402-payment-required (CMS primitive), concepts/scale-to-zero (CMS-as-serverless-workload). -
sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory — 2026-04-17 Agents that remember: introducing Agent Memory. Cloudflare launches Agent Memory (private beta) — an opinionated managed service that extracts information from agent conversations at compaction time, stores it as classified memories outside the context window, and retrieves it via a five-channel parallel fusion pipeline. Forcing function is context rot: "even as context window sizes grow past one million (1M) tokens, context rot remains an unsolved problem. A natural tension emerges between two bad options: keep everything in context and watch quality degrade, or aggressively prune and risk losing information the agent needs later." Memory preservation at compaction is the third option. Six-operation deliberately narrow API —
getProfile/ingest(bulk harness path) /remember/recall/forget/list(model tools) — canonical instance of patterns/constrained-memory-api: "the primary agent should never burn context on storage strategy. The tool surface it sees is deliberately constrained so that memory stays out of the way of the actual task." Architecturally: a Cloudflare Worker coordinating a per-profile Durable Object (raw messages + classified memories in SQLite + FTS + supersession chains) + a per-profile Vectorize index + Workers AI models — Llama 4 Scout (17B, 16-expert MoE) for extraction / verification / classification / query analysis, Nemotron 3 (120B MoE, 12B active) for synthesis. "The synthesizer is the only stage where throwing more parameters at the problem consistently helped. For everything else, the smaller model hit a better sweet spot of cost, quality, and latency." Ingest pipeline (canonical patterns/multi-stage-extraction-pipeline): (1) content-addressed ID =SHA-256(sessionId + role + content)[:128 bits](concepts/content-addressed-id), (2) extractor runs two parallel passes — a full pass chunking at ~10K chars with 2-message overlap (4 chunks concurrent) + a detail pass on conversations ≥9 messages with overlapping windows for names / prices / version numbers, (3) verifier with 8 checks (entity / object / location / temporal / organisational / completeness / relational / supported-by-transcript), (4) classifier into 4 types — facts (atomic, stable, keyed, vector-indexed), events (timestamped, vector-indexed), instructions (procedural, keyed, vector-indexed), tasks (ephemeral, FTS-only, NOT vector-indexed), (5) storage viaINSERT OR IGNORE+ supersession chains for keyed types (old → new forward pointer), (6) return response to harness, (7) async background vectorisation that prepends the 3-5 classifier-generated search queries to the memory content before embedding, bridging declarative writes ("user prefers dark mode") with interrogative reads ("what theme does the user 
want?"). Retrieval pipeline (canonical patterns/parallel-retrieval-fusion): query analyser + raw-query embedding run concurrently; five channels in parallel — (a) FTS with Porter stemming, (b) exact fact-key lookup, (c) raw-message FTS (safety net), (d) direct vector search, (e) HyDE vector search (answer-shaped statement); RRF fuses with channel-specific weights (fact-key highest = strongest signal, raw-message lowest = safety net), ties broken by recency; top candidates synthesised by Nemotron 3 with temporal computation handled deterministically via regex + arithmetic, not LLM ("models are unreliable at things like date math, so we don't ask them to do it"). Per-profile isolation via DOgetByName()— "sensitive memories are strongly isolated from other tenants" — realises concepts/one-to-one-agent-instance at the memory tier; session-affinity header routes all AI calls to memory-profile name for prompt caching benefit (samex-session-affinitywire primitive previously seen in the XL-LLM post, now keyed on memory-profile rather than user session). Storage stratifies across DO SQLite (content + FTS + supersession), Vectorize (vectors), R2 (future snapshots -
exports) — "each primitive is purpose-built for its workload, we don't need to force everything into a single shape or database." Product iteration via patterns/agent-driven-benchmark-loop against LongMemEval, LoCoMo, BEAM — "have a human review the proposals to select strategies that generalize rather than overfit, let the agent make the changes, repeat"; stochasticity even at temperature=0 handled by multi-run averaging + trend analysis. Three internal dogfood workloads: coding-agent memory via OpenCode plugin (shared-team-memory lift: "the agent knows what other members of your team have already learned"); agentic code reviewer ("arguably the most useful thing it learned to do was stay quiet"); message-history chat bot. Delivery velocity: "first prototype in a weekend, productionised internal version in less than a month" — Cloudflare-on-Cloudflare composition credit. Memory vs search distinction made explicit: "AI Search is our primitive for finding results across unstructured and structured files; Agent Memory is for context recall. The data in Agent Memory doesn't exist as files; it's derived from sessions" — fifth substrate of agent memory in the 2026-04 Cloudflare arc alongside Project Think Persistent Sessions, AI Search per-customer instances, Artifacts per-session Git repos, Email Service thread-plus-DO-state. Exportability as first-class strategic posture: "Agent Memory is a managed service, but your data is yours. Every memory is exportable, and we're committed to making sure the knowledge your agents accumulate on Cloudflare can leave with you if your needs change. We think the right way to earn long-term trust is to make leaving easy and to keep building something good enough that you don't want to." 
Introduces a new recurring shape on Cloudflare: retrieval-based managed memory service as the superior alternative to raw-filesystem / raw-DB access for agents — "tighter ingestion and retrieval pipelines are superior to giving agents raw filesystem access. In addition to improved cost and performance, they provide a better foundation for complex reasoning tasks required in production, like temporal logic, supersession, and instruction following." Wiki pages created in this pass (5): 5 patterns — patterns/constrained-memory-api, patterns/multi-stage-extraction-pipeline, patterns/parallel-retrieval-fusion, patterns/agent-driven-benchmark-loop, patterns/agent-first-storage-primitive (pattern form of the existing concept). Wiki pages from prior partial-session work (comprehensive and pre-existing — source page + 5 concepts + 1 system): sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory, systems/cloudflare-agent-memory, concepts/memory-compaction, concepts/memory-supersession, concepts/hyde-embedding, concepts/content-addressed-id, plus reuse of concepts/agent-memory / concepts/context-rot / concepts/reciprocal-rank-fusion / concepts/hybrid-retrieval-bm25-vectors / concepts/session-affinity-prompt-caching / concepts/one-to-one-agent-instance / concepts/agent-first-storage-primitive / patterns/tool-surface-minimization. Sibling 2026-04 launches (Agents Week): Project Think, AI Search, Artifacts, Email Service, Building the foundation for XL LLMs. Scope disposition: on-scope Tier 1 — retrieval-based memory primitive with full multi-stage extraction pipeline + multi-channel RRF retrieval + per-profile DO + Vectorize substrate + concrete operational numbers (chunk sizes, concurrency, truncation bits, retrieval channel count, benchmark stack, delivery velocity). No existing-claim contradictions; extends concepts/agent-memory with the fifth substrate instance in the 2026-04 arc. 
(Source: sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory)
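The fusion step of the retrieval pipeline in the Agent Memory entry above (RRF with channel-specific weights, ties broken by recency) can be sketched as follows. This is a minimal illustration of weighted Reciprocal Rank Fusion: the channel names, the weight values, the `k = 60` constant, and the `recency` map are assumptions chosen for the example, not values disclosed in the post.

```python
from collections import defaultdict


def weighted_rrf(channel_results: dict[str, list[str]],
                 weights: dict[str, float],
                 recency: dict[str, float],
                 k: int = 60) -> list[str]:
    """Fuse ranked lists from several retrieval channels into one ranking.

    Each channel contributes weight / (k + rank) for every memory it returns,
    so a memory found by several channels, or by a heavily weighted one,
    rises to the top. Equal scores are ordered by recency (newer first).
    """
    scores: dict[str, float] = defaultdict(float)
    for channel, ranked_ids in channel_results.items():
        weight = weights.get(channel, 1.0)
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += weight / (k + rank)
    return sorted(scores, key=lambda m: (-scores[m], -recency.get(m, 0.0)))


# Hypothetical weights: exact fact-key match is the strongest signal,
# raw-message full-text search is the safety net.
EXAMPLE_WEIGHTS = {"fact_key": 3.0, "vector": 1.0, "raw_fts": 0.5}
```

Because the score uses only ranks, channels with incomparable raw scores (BM25-style text scores versus vector distances) can be combined without normalisation, which is the usual reason to choose RRF for hybrid retrieval.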
-
sources/2026-04-17-cloudflare-introducing-flagship-feature-flags-built-for-the-age-of-ai — 2026-04-17 Introducing Flagship: feature flags built for the age of AI. Cloudflare launches Flagship — native feature-flag service in private beta, built on OpenFeature (the CNCF open standard; framed as "the same relationship that OpenTelemetry has to observability") and on Cloudflare's own developer-platform primitives (Workers, Durable Objects, Workers KV). Architecture (direct from the post): control-plane write lands atomically on a per-app Durable Object — "a SQLite- backed, globally unique instance that serves as the source of truth for that app's flag configuration and changelog" — syncs "within seconds" to Workers KV which replicates globally; at request time the evaluation engine runs inside the same V8 isolate already handling the user request, reads flag config from edge-local KV, matches context against rules, resolves any percentage rollout, returns a variation — "both the data and the logic live at the edge — nothing is sent elsewhere to be evaluated". Canonical wiki instance of patterns/do-plus-kv-edge-config-distribution and patterns/in-isolate-rule-evaluation. Worker binding (
env.FLAGS) onwrangler.jsonc: { flagship: [{ binding, app_id }] }: typed accessors for all four variation types (boolean / string / number / full JSON object — the last collapses the "flag vs. config store" distinction on the edge, one distribution primitive covers both) plus*Details()variants returning value + matched variant + reason (e.g.TARGETING_MATCH). Failure semantics: evaluation errors return the caller-supplied default gracefully; type mismatches throw — "that's a bug in your code, not a transient failure" (load-bearing because silently coerced JSON flags would corrupt downstream). OpenFeature SDK works across Workers, Node.js, Bun, Deno, and the browser (client-side provider pre-fetches flags + caches with configurable TTL). Rule model: conditions (implicit-AND at top level; nested AND/OR groups up to 5 levels deep) + variation + optional percentage rollout + priority (lower = higher priority, first-match wins). Percentage rollouts distinguished from Workers gradual deployments: "Unlike gradual deployments, which split traffic between different uploaded versions of your Worker, feature flags let you roll out behavior by percentage within a single version that is serving 100% of traffic." Rollouts use consistent hashing on a context attribute (userIdcanonical) so a given attribute value always hashes to the same bucket — "you can ramp from 5% to 10% to 50% to 100% of users, so those who were already in the rollout stay in it" — monotonic bucket growth is the ramp-safety guarantee. Thesis framing: the lede positions flags as the safety net for agent-shipped code, not another feature-team tool — "AI is writing more code than ever… Tomorrow, the agent does all of that itself. The question becomes: how do you let an agent ship to production without removing every safety net? Feature flags are the answer." 
— the agent writes a code path behind a flag, ships inert, self-enrolls into a small cohort, observes metrics, ramps or reverts; the human "sets the boundaries, and the flag controls the blast radius." Canonical wiki instance of concepts/agent-controlled-deployment. Production- surface claims: evaluation across region Earth cached globally via KV; full audit trail with field-level diffs ("you know who changed what and when") positioned against hardcoded-flaggit blameanti-pattern; dashboard integration (anyone can toggle without touching code); OpenFeature compatibility — "Adopt Flagship without rewriting your evaluation code. Leave without rewriting it either." Named anti-patterns on Workers: (1) hardcoded flag logic — "one hardcoded flag becomes ten. Ten becomes fifty, owned by different teams, with no central view of what's on or off. There's no audit trail." (2) remote HTTP evaluation — "that outbound request sits on the critical path of every single user request". (3) local-evaluation SDK — breaks on Workers because "a Worker isolate can be created, serve a request, and be evicted between one request and the next. A new invocation could mean re-initializing the SDK from scratch." The post names KV as the correct "distribution primitive that's already at the edge, one where the caching is managed for you, reads are local, and you don't need a persistent connection to keep things up to date." Wiki instances introduced: 8 new entity pages — 6 concepts (concepts/feature-flag, concepts/percentage-rollout, concepts/vendor-neutral-evaluation-api, concepts/agent-controlled-deployment, concepts/consistent-hashing, concepts/audit-trail) + 2 patterns (patterns/do-plus-kv-edge-config-distribution, patterns/in-isolate-rule-evaluation); 2 pre-existing entity pages — systems/cloudflare-flagship (product page) + systems/openfeature (standard page). 
Sixth 2026-04 Cloudflare launch whose load-bearing primitive is "one DO per caller-identified unit" alongside Agent Lee, Project Think, AI Search, Artifacts, and Email Service — but the first where the DO is explicitly paired with KV for global read replication of config. Source code public:github.com/cloudflare/flagship. SDK:npm i @cloudflare/flagship. Availability: private beta, no GA pricing. Caveats: no production QPS / latency numbers / DO-→-KV propagation p99 disclosed; no flag-count / rule-size / JSON-object-size limits published; rule evaluation on hot paths is billable Worker CPU with no ceiling quoted; client-side TTL has no post-supplied default guidance; audit-trail retention not specified; "AI is writing more code" framing is the marketing lede — the architecture (DO + KV + isolate evaluation + OpenFeature provider + consistent-hash rollout) stands on its own regardless of whether the caller is an agent or a human. (Source: sources/2026-04-17-cloudflare-introducing-flagship-feature-flags-built-for-the-age-of-ai) -
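The ramp-safety property described in the Flagship entry above (a given attribute value always lands in the same bucket, so users admitted at 5% stay admitted at 10%, 50%, and 100%) can be shown with a short sketch. The hash function (SHA-256), the bucket count, the `flag_key:value` input format, and the function names are assumptions for illustration; the post does not disclose Flagship's actual hashing scheme.

```python
import hashlib

BUCKETS = 10_000  # assumed granularity: 0.01% per bucket


def bucket_for(flag_key: str, attribute_value: str) -> int:
    """Map a context attribute (e.g. a userId) to a stable bucket for this flag."""
    digest = hashlib.sha256(f"{flag_key}:{attribute_value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_rollout(flag_key: str, attribute_value: str, percentage: float) -> bool:
    """True if this attribute value falls inside the rollout percentage.

    The bucket is fixed per (flag, value); only the threshold moves, so the
    enrolled set can only grow as the percentage is raised.
    """
    return bucket_for(flag_key, attribute_value) < percentage * BUCKETS / 100
```

Including the flag key in the hash input is a design choice in this sketch: it keeps rollouts for different flags independent, so the same users are not always the first cohort for every experiment.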
sources/2026-02-24-cloudflare-how-we-rebuilt-nextjs-with-ai-in-one-week — 2026-02-24 How we rebuilt Next.js with AI in one week. Sunil Pai + Claude via OpenCode rebuild the Next.js API surface from scratch on Vite as vinext (github.com/cloudflare/vinext) in ~1 week / ~$1,100 in Claude tokens / 800+ OpenCode sessions, achieving 94 % API coverage of Next.js 16 with 1,700+ Vitest + 380 Playwright tests ported from the Next.js repo. Benchmarks on a 33-route App Router fixture: 1.6× / 4.4× faster production builds (Rollup / Rolldown vs. Next.js 16 + Turbopack) and 56 % / 57 % smaller gzipped client bundles. Deploys to Workers with
vinext deploy; ISR viaKVCacheHandlerover KV (patterns/pluggable-cache-handler). Introduces Traffic-aware Pre-Rendering (TPR,--experimental-tpr) — queries Cloudflare zone analytics at deploy time and pre-renders only the URLs covering 90 % of traffic (typically 50-200 pages out of 100 k+ by power law). Resolves build-time-scales-linearly-with-generateStaticParams()without coupling the build to the production database — canonical wiki instance of patterns/traffic-aware-prerendering as a CDN-native, non-portable-to-offline-tools capability. Structurally supersedes OpenNext's adapter-over-Turbopack-output approach (patterns/clean-reimplementation-over-adapter); "Building on top of Next.js output as a foundation has proven to be a difficult and fragile approach … a game of whack-a-mole." Project shape is the canonical wiki instance of an AI-assisted codebase rewrite — the four preconditions (well-specified target API, comprehensive test suite, solid foundation, capable model) + human steering + AI agent guardrails (Vitest / Playwright / tsgo / oxlint + AI review + browser verification via agent-browser). Migration ships as an Agent Skill —cloudflare/vinext— canonical wiki instance of patterns/migration-as-agent-skill. Articulates the thesis that some framework layers exist only as a human-cognition crutch — "AI doesn't have the same limitation. It can hold the whole system in context and just write the code." First production customer National Design Studio's CIO.gov running at publication time. Caveat-heavy: experimental, < 1 week old, static pre-rendering at build time not yet supported (post recommends Astro for 100 %-static sites). 
Introduces systems/vinext, systems/vite, systems/rolldown, systems/rollup, systems/turbopack, systems/vitejs-plugin-rsc, systems/vinext-kv-cache-handler, systems/vinext-agent-skill, systems/agent-browser, systems/vitest, systems/playwright, systems/tsgo, systems/oxlint, systems/astro; concepts/traffic-aware-prerendering, concepts/power-law-url-traffic, concepts/build-time-scales-with-page-count, concepts/ai-assisted-codebase-rewrite, concepts/well-specified-target-api, concepts/ai-agent-guardrails, concepts/layered-abstraction-as-human-crutch, concepts/incremental-static-regeneration, concepts/cloudflare-zone-analytics; patterns/traffic-aware-prerendering, patterns/clean-reimplementation-over-adapter, patterns/ai-driven-framework-rewrite, patterns/migration-as-agent-skill, patterns/pluggable-cache-handler. Extends systems/nextjs (new wiki instance of the well-specified-target-API property), systems/opennext (structural superseding posture + conformance-suite tests ported into vinext), systems/cloudflare-workers (primary deployment target + TPR deploy-time analytics query), systems/cloudflare-kv (default ISR cache substrate + TPR pre-render destination), systems/opencode (800+ sessions as a framework-rewrite substrate), systems/claude-code (model capability as the fourth precondition for AI-assisted rewrite), systems/agent-skills (new framework-migration delivery use case). 540 HN points (2026-02-24).
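The selection step behind Traffic-aware Pre-Rendering in the vinext entry above (pre-render only the URLs that together cover 90% of traffic) can be sketched as a greedy cumulative cut over per-URL request counts. The function name, the input shape, and the tie-breaking rule are assumptions for illustration; the post does not describe how vinext implements the selection or how it queries zone analytics.

```python
def urls_to_prerender(request_counts: dict[str, int],
                      coverage_percent: int = 90) -> list[str]:
    """Pick the most-requested URLs until they cover `coverage_percent` of traffic.

    `request_counts` maps URL path -> request count over some analytics window
    (hypothetical input; a real deployment would fetch this at deploy time).
    """
    total = sum(request_counts.values())
    if total == 0:
        return []
    selected: list[str] = []
    covered = 0
    # Highest traffic first; ties broken alphabetically for determinism.
    for url, count in sorted(request_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered * 100 >= coverage_percent * total:
            break
        selected.append(url)
        covered += count
    return selected
```

Under a power-law distribution the loop exits early: a handful of hot pages reach the threshold, and the long tail is left to on-demand rendering and ISR rather than being built up front.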
- sources/2026-01-08-cloudflare-a-closer-look-at-a-bgp-anomaly-in-venezuela — 2026-01-08 A closer look at a BGP anomaly in Venezuela. Forensic-commentary post pushing back on a cybersecurity newsletter's suggestion of BGP "shenanigans" around the 2026-01-02 U.S. capture of Nicolás Maduro. Cloudflare argues the observed route leak by AS8048 (CANTV, Venezuela's state ISP) re-advertising AS21980 (Dayco Telecom) prefixes from AS6762 (Sparkle) to AS52320 (V.tal GlobeNet) is almost certainly accidental: five forensic signals — (1) 11 similar Type-1 hairpin leaks by AS8048 since December 2025; (2) AS8048 is already AS21980's upstream per BGPKIT monocle's
as2relquery (9.4% as1_upstream, 0.0% as2_upstream → high-confidence direction), making the MITM motive structurally absent; (3) leaked paths prepended8048nine times (52320, 8048 x9, 23520, 1299, 269832, 21980) — the opposite of attacker behaviour; (4) announcements spread across ~2h on 01-02 (15:30–17:45 UTC), consistent with convergence issues; (5) leaks start >12 hours before the U.S. military action. Cloudflare's proposed hygiene failure: AS8048's export policy toward AS52320 matched on IRR-derived prefix list only, without a customer BGP community tag — so routes learned indirectly via AS6762 (which were in the customer-cone prefix set but not the direct-customer-session set) slipped through the export filter. Technical distinction pushed hard: origin validation vs path validation. The newsletter's flag on Sparkle's incomplete RPKI ROV deployment is orthogonal — the 01-02 event is a path anomaly (the origin AS21980 was correct); only the path was invalid. ROV wouldn't help. The forward- looking fix: ASPA (IETF SIDROPS draft) providing path validation — Tier-1 Sparkle would publish an ASPA with the reservedAS0upstream declaration, letting AS52320 reject routes received from its customer AS8048 containing6762in the path. Complementary: RFC 9234 OTC (BGP roles + Only-To-Customer attribute enforced per session — needs vendor implementation; ASK YOUR ROUTING VENDORS) and Peerlock/ Peerlock-lite (simpler operator-side sanity checks — reject customer-learned routes containing a Tier-1 ASN in the path). Update 2026-02-03: Sparkle completed RPKI ROV deployment, marked safe on isbgpsafeyet.com (orthogonal to this incident, net win for the Internet). Canonical wiki instance of patterns/alternative-explanation-forensics — Cloudflare's inverse shape to the 2025-08-04 Perplexity public-attribution post: publish a long-form alternative to deflate a malicious claim about a named operator (backed by forensic evidence + proposed mechanism) rather than make one. 
Same posture: treat public attribution as a published good. Introduces: systems/cantv-as8048, systems/sparkle-as6762, systems/vtal-globenet-as52320, systems/dayco-telecom-as21980, systems/aspa, systems/peerlock; concepts/bgp, concepts/autonomous-system, concepts/route-leak, concepts/valley-free-routing, concepts/customer-provider-relationship, concepts/peer-peer-relationship, concepts/bgp-export-policy, concepts/as-path-prepending, concepts/bgp-community, concepts/rpki, concepts/route-origin-validation, concepts/path-validation, concepts/only-to-customer, concepts/irr-prefix-list, concepts/hairpin-route-leak; patterns/alternative-explanation-forensics, patterns/irr-prefix-filtering-export-policy, patterns/customer-community-tag-export-policy, patterns/as-path-prepending-for-traffic-engineering, patterns/aspa-for-path-validation, patterns/peerlock-sanity-check. Extends: systems/cloudflare-radar (route-leak anomaly pages + AS-profile connectivity as forensic data source); systems/bgpkit-monocle (the canonical AS-relationship-inference workflow). New shape added to Cloudflare: public alternative-explanation forensics — sibling of the 2025-08-04 Perplexity public-attribution shape but in the deflate direction rather than the make direction. Caveats: CANTV config is not public; Cloudflare's hygiene hypothesis is plausible but unconfirmed; monocle'sconnected: 9.9%is low (direction confidence carries the claim, not raw count); heavy prepending is evidence-against-MITM but not proof; RFC 9234 adoption depends on routing vendor roadmaps. 400 HN points (2026-01-08) -
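The two filters discussed in this entry (the export-policy gap and the Peerlock-lite check) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Route` type, the community value `8048:100`, the documentation prefix, the Tier-1 set, and the shortened AS paths are hypothetical and are not taken from CANTV's (non-public) configuration.

```python
from dataclasses import dataclass, field

# Hypothetical values, for illustration only.
CUSTOMER_COMMUNITY = "8048:100"      # tag applied on direct customer sessions
TIER1_ASNS = {6762, 1299}            # an operator would maintain its own list
IRR_PREFIXES = {"203.0.113.0/24"}    # customer-cone prefix list built from IRR data


@dataclass
class Route:
    prefix: str
    as_path: list                    # path as learned, nearest AS first
    communities: set = field(default_factory=set)


def export_prefix_list_only(route: Route) -> bool:
    """Export policy that matches on the IRR-derived prefix list alone."""
    return route.prefix in IRR_PREFIXES


def export_with_community(route: Route) -> bool:
    """Export policy that also requires the customer community tag."""
    return route.prefix in IRR_PREFIXES and CUSTOMER_COMMUNITY in route.communities


def peerlock_lite_accept(as_path: list) -> bool:
    """Receiver-side check: reject customer-learned routes with a Tier-1 ASN in the path."""
    return not any(asn in TIER1_ASNS for asn in as_path)


# Same prefix learned two ways by the leaking AS.
direct = Route("203.0.113.0/24", [21980], {CUSTOMER_COMMUNITY})   # from the customer itself
indirect = Route("203.0.113.0/24", [6762, 21980])                 # via a Tier-1, untagged

leaked_by_prefix_list = export_prefix_list_only(indirect)   # slips through
blocked_by_community = not export_with_community(indirect)  # stopped at export
# What the downstream provider would see from its customer AS8048.
rejected_downstream = not peerlock_lite_accept([8048, 6762, 21980])
```

The prefix-list-only policy exports both routes because the prefix is identical; only the community tag (set on the direct session) or the receiver-side path check distinguishes the leaked copy.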
sources/2025-11-18-cloudflare-outage-on-november-18-2025 — 2025-11-18 Cloudflare outage on November 18, 2025 — worst outage since 2019. Post-mortem on a ~3-hour core-traffic outage (11:20 → 14:30 UTC; full tail to 17:06 UTC) that made the front page of every major newspaper. Not an attack. A defensive, correct ClickHouse permission migration (explicit r0.* grants so distributed subqueries run under the initiating user's account, not a shared system account) silently changed the row count returned by a metadata query the Bot Management feature-file generator relied on — because the query filtered on table but not database. Post-grant, the query returned rows from both default and r0, doubling the feature count. The feature file is regenerated every 5 minutes and propagated fleet-wide — an instance of the global-configuration-push antipattern with no progressive rollout. The FL2 bots module had a 200-feature preallocated cap (well above the ~60 used — preallocation for hot-path performance; see concepts/feature-file-size-limit). The doubled file breached the cap; a Rust .unwrap() on the bounds check panicked every fl2_worker_thread (concepts/unhandled-rust-panic); HTTP 5xx for every request hitting the bots module. FL1 (Lua) did not panic but returned bot score = 0 for every request → silent overblock for customers using bot-score rules. The ClickHouse permission rollout was gradual across nodes, so the feature file oscillated good/bad on a ~5-minute cadence — "very unusual behavior for an internal error" — mimicking a hyperscale DDoS. Cloudflare's status page (hosted off Cloudflare) went down coincidentally at the same time, deepening the attack suspicion; the team spent ~40 minutes on the DDoS hypothesis (concepts/intermittent-failure-signal-confusion). Dashboard impact in two windows (11:30–13:10 via Workers KV + Turnstile; 14:40–15:30 login-retry backlog). Stated remediations: (1) Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input — the canonical wiki instance of concepts/internally-generated-untrusted-input; (2) Enabling more global kill switches for features; (3) bounding debugging / observability CPU amplification; (4) reviewing failure modes across all core proxy modules (anticipates 12-05's fail-open error handling project).
The [[sources/2025-12-05-cloudflare- outage-on-december-5-2025|12-05 incident 17 days later]] names all four projects as still-incomplete. Introduces systems/clickhouse, systems/turnstile, systems/cloudflare-dashboard; concepts/feature-file-size-limit, concepts/preallocated-memory-budget, concepts/clickhouse-distributed-query, concepts/database-permission-migration-risk, concepts/unhandled-rust-panic, concepts/intermittent-failure-signal-confusion, concepts/global-configuration-system, concepts/fail-open-vs-fail-closed, concepts/internally-generated-untrusted-input, concepts/cdn-as-critical-internet-infrastructure, concepts/nil-index-lua-bug; patterns/harden-ingestion-of-internal-config, patterns/global-feature-killswitch, patterns/global-configuration-push, patterns/rust-replacement-of-dynamic-language-hot-path. Extends systems/cloudflare-bot-management (feature- file generator + 200-feature preallocation cap + regeneration every 5 min), systems/cloudflare-fl2-proxy (limit-of-the- Rust-type-system-defense section), systems/cloudflare-fl1-proxy (silent-bot-score-0 failure shape), systems/cloudflare-kv (impact + 13:05 core-proxy bypass), systems/cloudflare-access (widespread new-auth failures; existing sessions unaffected). Pairs with sources/2025-07-16-cloudflare-1111-incident-on-july-14-2025 (sibling missing-progressive-rollout RCA on a different surface — addressing) and sources/2025-12-05-cloudflare-outage-on-december-5-2025 (second three-week follow-on incident on the same config delivery channel). 1,465 HN points. Tier 1, 2025-11-18. -
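A minimal sketch of the two mechanisms in this entry: a metadata query that filters on table but not database, and an ingestion step that treats an internally generated file as untrusted input. The catalog rows, the function names, and the feature count of 150 per database are hypothetical, chosen only so that doubling crosses the 200-feature cap; this is not Cloudflare's code.

```python
MAX_FEATURES = 200  # preallocated cap described in the post


def query_columns(catalog, table, database=None):
    """Return metadata rows for `table`; omitting `database` matches every database."""
    return [
        row for row in catalog
        if row["table"] == table and (database is None or row["database"] == database)
    ]


def load_features(rows, last_known_good, cap=MAX_FEATURES):
    """Validate a generated feature file; fall back instead of crashing.

    Returns (features, accepted). A file over the cap or with duplicate
    names is rejected and the previous good list stays in service.
    """
    names = [row["name"] for row in rows]
    if len(names) > cap or len(set(names)) != len(names):
        return last_known_good, False
    return names, True


# After the permission change the same table is visible under two databases.
catalog = [
    {"database": db, "table": "features", "name": f"feature_{i}"}
    for db in ("default", "r0")
    for i in range(150)  # hypothetical count
]

unfiltered = query_columns(catalog, "features")             # rows from both databases
filtered = query_columns(catalog, "features", "default")    # rows from one database

previous = [f"feature_{i}" for i in range(150)]
kept, accepted_bad = load_features(unfiltered, previous)
fresh, accepted_good = load_features(filtered, previous)
```

The design point is that the size check returns a value the caller must handle rather than aborting the worker, which is the difference between a rejected config push and a fleet-wide panic.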
sources/2025-12-05-cloudflare-outage-on-december-5-2025 — 2025-12-05 Cloudflare outage on December 5, 2025. Post-mortem on a 25-minute HTTP-500 outage affecting ~28% of Cloudflare HTTP traffic — the second self-inflicted global incident in three weeks (after 2025-11-18, structurally similar: single change propagating fleet-wide). Root cause was a seven-year-old dormant Lua nil-index bug (concepts/nil-index-lua-bug) in the FL1 proxy's rulesets engine — specifically the killswitch subsystem's post-processing path when a killswitch is applied to a rule with action=execute. The killswitch correctly skipped the rule; the post-processing then unconditionally dereferenced rule_result.execute.results, which no longer existed because the rule had been skipped. Detonation sequence was benign-first-change + dangerous-second-change: change 1 (raise WAF body buffer 128 KB → 1 MB for CVE-2025-55182 React Server Components) was going out via gradual deploy; change 2 (disable the internal WAF testing tool that couldn't support the new buffer) went through the global configuration system — seconds-to-fleet-wide, no canary. Customers on FL2 (Rust) were unaffected; Cloudflare's explicit attribution — "This type of code error is prevented by languages with strong type systems. In our replacement for this code in our new FL2 proxy, which is written in Rust, the error did not occur." — makes this the canonical wiki instance of patterns/rust-replacement-of-dynamic-language-hot-path: the bug class is structurally eliminated by Rust's Option<T> + exhaustive matching without the rewrite needing to identify the specific bug. The affected intersection was FL1 + Cloudflare Managed Ruleset; customers without that configuration, plus the entire China network, were unaffected. Stated remediation names three still-incomplete projects from the 11-18 incident: (1) Enhanced Rollouts & Versioning for config data (apply code-deploy-style progressive rollout + health gating + rollback to threat-response / general config — the patterns/progressive-configuration-rollout discipline), (2) Streamlined break-glass for internal + control-plane operations under degraded-infra failure modes, and (3) "Fail-Open" Error Handling across critical data-plane components (concepts/fail-open-vs-fail-closed) — replace crash-on-error with log + default-to-known-good + pass-traffic-without-scoring. Cloudflare also named an unusual operational stance: all network changes locked down until mitigation/rollback systems are in place.
Pairs with the 2025-07-14 1.1.1.1 outage post as another Cloudflare post- mortem that names the missing discipline (progressive config rollout, fail-open, break-glass) rather than just the specific bug, with a legacy-surface-carries-the-hazard shape (patterns/dual-system-sync-during-migration): the strategic surface (FL2) had moved past the class of bug that still lives on the legacy surface (FL1). Canonical wiki instances of concepts/nil-index-lua-bug, patterns/rust-replacement-of-dynamic-language-hot-path, concepts/fail-open-vs-fail-closed, concepts/global-configuration-system, and patterns/global-configuration-push as an anti-pattern. -
sources/2025-10-08-cloudflare-we-found-a-bug-in-gos-arm64-compiler — 2025-10-08 How we found a bug in Go's arm64 compiler. Weeks-long retrospective on a one-instruction race condition in Go's arm64 code generator, surfaced only by Cloudflare's 84 M HTTP req/s scale across 330 cities (~30 daily fatal panics across <10 % of data centers on arm64). Goroutines interrupted by async preemption between two opcodes of the function epilogue's split-ADD stack-pointer adjustment (forced by arm64's 12-bit ADD immediate on frames > 1<<12 bytes) crashed the stack unwinder on the next GC cycle — either fatal error: traceback did not unwind completely or SIGSEGV at m.incgo+0x118. Smoking gun was a production coredump loaded in dlv: PC paused mid-ADD inside (*NetlinkSocket).Receive's epilogue. Isolated via a ~35-line stdlib-only minimal reproducer (big stack buffer + tight runtime.GC() sibling goroutine) that crashes go1.23.4 / arm64 in ~90 s. Fixed upstream in go1.23.12 / go1.24.6 / go1.25.0 via preemption-safe compiler emit: scratch-register + indivisible register-form ADD R27, RSP, RSP. Canonical wiki pairing with concepts/compiler-generated-race-condition + concepts/split-instruction-race-window + concepts/immediate-encoding-limit. Red herring: every crash had (*NetlinkSocket).Receive on the stack — that function happened to have a large-enough frame to trigger the split-ADD epilogue, not a library bug. Fifth wiki instance of patterns/upstream-the-fix (after 2025-10-14 V8/Node.js/OpenNext, 2026-02-27 Vercel/Web Streams, 2026-02-18 Datadog Go binary size, 2025-02-26 Fly.io / rustls Waker). -
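The arithmetic behind the split-ADD window can be sketched as follows. This is a toy model rather than the Go compiler's code generator: it assumes the A64 ADD (immediate) encoding, where the 12-bit immediate may optionally be shifted left by 12, and the function names are hypothetical.

```python
IMM_BITS = 12
IMM_MAX = (1 << IMM_BITS) - 1  # largest unshifted immediate: 4095


def split_add_immediate(frame_size: int) -> list:
    """Split a stack adjustment into the immediates of one or two ADDs.

    The low 12 bits go in an unshifted ADD, the next 12 bits in an ADD
    with the immediate shifted left by 12. Zero parts need no instruction.
    """
    if not 0 <= frame_size < (1 << (2 * IMM_BITS)):
        raise ValueError("frame size outside the two-instruction range")
    low = frame_size & IMM_MAX
    high = frame_size & (IMM_MAX << IMM_BITS)
    return [part for part in (low, high) if part]


def sp_states(sp: int, frame_size: int) -> list:
    """Stack-pointer values observable before, between, and after the ADDs."""
    states = [sp]
    for imm in split_add_immediate(frame_size):
        sp += imm
        states.append(sp)
    return states


small = sp_states(0, 0x800)    # one ADD: no intermediate state
large = sp_states(0, 0x1008)   # two ADDs: one intermediate state
```

For the large frame the middle element of `large` is a stack pointer that matches neither the callee's frame nor the caller's, which is the state an asynchronous preemption could expose to the unwinder. The upstream fix described above avoids it by computing the offset in a scratch register and applying it with a single register-form ADD.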
sources/2026-04-21-cloudflare-moving-past-bots-vs-humans — 2026-04-21 Moving past bots vs. humans. Policy / architectural-framing post arguing that the "bots vs. humans" frame is no longer load-bearing — the important distinction is intent and behavior, not species. Introduces the rate-limit trilemma (decentralized, anonymous, accountable — pick two) as the governing constraint and positions Privacy Pass (RFC 9576/9578) + ARC + ACT as the exit route — anonymous credentials with protocol-level unlinkability. Cloudflare's Privacy Pass deployment already runs at billions of tokens per day, primarily via iCloud Private Relay. Splits web-protection into two client populations: the identifiable-infrastructure arm served by Web Bot Auth (clients that tolerate attribution) and the distributed-low-volume arm served by anonymous-credential successors. The open-Web guardrail — "anyone should be able to build their own device / browser / OS and access the Web" — is the non-regression test any proposed scheme must pass. Introduces the RFC 9576 four-role decomposition and the open issuer ecosystem governance pattern (with the Web PKI as the load-bearing server-side precedent of "anonymous + accountable"). Tier 1, 2026-04-21.
-
sources/2026-04-20-cloudflare-orchestrating-ai-code-review-at-scale — 2026-04-20 AI Code Review at scale. Deep-dive on Cloudflare's CI-native AI code-review orchestration system built on OpenCode. Up to seven specialised sub-reviewers (security, performance, code quality, documentation, release, AGENTS.md, engineering-codex) coordinated by a judge-pass coordinator agent; plugin architecture; circuit-breaker + failback chains; KV-backed remote model routing; break glass escape hatch; incremental re-review with thread-state awareness. Production numbers (first 30 days): 131,246 review runs across 48,095 MRs in 5,169 repos; median review 3m 39s; median cost $0.98 / P99 $4.45; ~120 B tokens processed with 85.7% prompt-cache hit rate; 159,103 findings at ~1.2 per review (deliberately low); break-glass invoked 0.6% of MRs; 45+ upstream OpenCode PRs contributed back. Part of Code Orange: Fail Small. Canonical wiki instance of patterns/coordinator-sub-reviewer-orchestration, patterns/specialized-reviewer-agents, patterns/ai-review-risk-tiering, patterns/remote-config-model-routing, patterns/jsonl-streaming-child-process, patterns/incremental-ai-rereview + concepts/what-not-to-flag-prompt, concepts/break-glass-escape-hatch, concepts/ai-thinking-heartbeat, concepts/prompt-boundary-sanitization, concepts/ai-rereview-incremental, concepts/diff-noise-filtering, concepts/shared-context-fan-out, concepts/risk-tier-assessment, concepts/jsonl-output-streaming. -
sources/2026-04-17-cloudflare-unweight-how-we-compressed-an-llm-22-percent-without-sacrificing-quality — 2026-04-17 Unweight launch + technical paper + open-source GPU kernels. Cloudflare's lossless LLM weight compression system for NVIDIA H100: 15-22 % model-size reduction while preserving bit-exact outputs, no special hardware. Core technique: Huffman coding on the redundant BF16 exponent byte (out of 256 possible exponent values, ~16 cover >99 % of weights in a typical layer → info-theoretic floor ~2.6 bits vs 8 allocated → ~30 % exponent compression). Row-level verbatim escape for rare exponents (rows of 64; one decision per row, zero per-element branches in the GPU hot path). Applied selectively to MLP weights (gate / up / down, ~⅔ of parameters dominating decode memory traffic); attention, embeddings, layer norms uncompressed. Core kernel breakthrough: reconstruct weights in on-chip shared memory and feed the tensor cores directly — the reconstructed weights never touch HBM (concepts/fused-decompression-matmul / patterns/fused-decompress-tensor-core-matmul). Kernel splits thread groups into producer (TMA-driven HBM→SMEM transfer into a circular buffer) and consumer (reconstruct BF16 from exponent + sign + mantissa, feed Hopper WGMMA) roles — canonical patterns/sm-partitioning-producer-consumer instance. Four execution pipelines trading preprocess cost for matmul work — full decode (preprocess → HBM → stock cuBLAS matmul), exponent-only decode, palette transcode, direct palette (no preprocess, matmul reconstructs in SMEM from pre-baked 4-bit indices). Runtime autotuner sweeps per
(weight matrix, batch size)against measured end-to-end throughput and produces a per-model config file — canonical patterns/autotuned-execution-pipeline-selection instance. SM-level shared-memory budget collision: each Hopper SM has 228 KB SMEM; reconstructive matmul needs ~227 KB, Huffman decode kernel needs ~16 KB —227 + 16 > 228forces the two onto separate SMs; the split itself is autotuned. Hides cost via easy/hard layer alternation — while the GPU computes an "easy" (pre-transcoded) layer, a separate CUDA stream decodes the next "hard" layer's weights in the background, with double-buffered preprocess slots preventing overwrite. One compression format, two uses: the same Huffman bundle serves distribution (max compression ~22 %) and inference (palette-transcoded at load for efficient runtime). Honest cost framing — "not a free lunch": current throughput overhead 30-40 % end-to-end on H100 SXM5 (~41 % at batch 1, narrowing to ~30 % at batch 1024); three mitigations under active optimization (small-batch fixed costs, redundant weight-tile reconstruction, down projection still uncompressed). Llama-3.1-8B extrapolated to Llama-70B: ~18-28 GB VRAM saved. Positioning vs prior work: ZipNN (distribution+storage only, CPU decode); Huff-LLM (custom FPGA hardware); ZipServ (consumer-grade GPUs, not H100) — none deliver lossless inference-time decompression on Hopper datacenter GPUs integrable with a Rust inference engine. Roadmap: compress down projection (~⅓ more compressible weights); kernel optimization to narrow throughput gap; extend to larger Workers AI models; apply to MoE cold-expert fetch. Complements Infire's activation-memory discipline — Unweight attacks weights, Infire attacks activations, savings additive into KV-cache headroom. 
Introduces (5): systems/unweight, systems/unweight-kernels; concepts/huffman-coding, concepts/bf16-exponent-redundancy, concepts/memory-bandwidth-bound-inference, concepts/fused-decompression-matmul, concepts/hbm-vs-smem, concepts/lossless-weight-compression; patterns/fused-decompress-tensor-core-matmul, patterns/autotuned-execution-pipeline-selection, patterns/sm-partitioning-producer-consumer. Extends: systems/workers-ai (weight-compression lever section), systems/infire (complementary weight-side VRAM lever), systems/nvidia-tensor-core (canonical "tensor cores ~600× faster than HBM" bandwidth-bound framing + Hopper WGMMA + TMA - 228 KB SMEM budget), concepts/quantization (lossless sibling framing + contrast against hardware-native MXFP), concepts/kv-cache (Unweight VRAM savings translate directly into KV-cache headroom), patterns/upstream-the-fix (open-source kernels + technical paper as a contribution shape). Caveats stated in ingest: 30-40 % throughput overhead is current optimization level not final; MLP-only so 22 % cap is fraction-of-compressible not full-model (no attention / embedding / layer-norm compression); H100-only kernels today (Blackwell sm_100/sm_120 port is future work); Llama-3.1-8B only measured (SwiGLU generalisation projected not tested); vs-prior-work comparison qualitative not benchmarked; kernels not yet upstreamed to vLLM / TGI / SGLang (Infire integration only disclosed); throughput-vs-memory trade-off makes this a capacity lever (more models per GPU), not a per-request latency win. (Tier 1, 2026-04-17) -
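The exponent-redundancy argument in this entry can be checked on synthetic data with a short sketch. It assumes Gaussian-distributed stand-in weights (standard deviation 0.02, chosen arbitrarily), not real model weights, and it builds a plain textbook Huffman code; the row-level escape and GPU kernels from the post are not modelled. All function names are hypothetical.

```python
import heapq
import math
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """8-bit exponent field of x; BF16 keeps float32's sign and exponent bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def entropy_bits(counts: Counter) -> float:
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def huffman_lengths(counts: Counter) -> dict:
    """Code length per symbol from a standard Huffman tree."""
    heap = [(c, i, {sym: 0}) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)  # unique tie-breaker so dicts are never compared
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)
        c2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(50_000)]  # synthetic stand-in
counts = Counter(bf16_exponent(w) for w in weights)

H = entropy_bits(counts)
lengths = huffman_lengths(counts)
avg_len = sum(counts[s] * lengths[s] for s in counts) / len(weights)
top16_share = sum(c for _, c in counts.most_common(16)) / len(weights)
```

Comparing `avg_len` against the 8 bits the format allocates gives the per-exponent saving on this synthetic distribution; the figures quoted in the entry come from real layers and will differ.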
sources/2026-04-17-cloudflare-shared-dictionaries-compression-that-keeps-up-with-the-agent — 2026-04-17 Shared Dictionaries open-beta announcement (2026-04-30 beta). Edge implementation of RFC 9842 Compression Dictionary Transport — browser sends Available-Dictionary: <hash> → server compresses the new version against a client-cached previous version using Content-Encoding: dcb (delta-Brotli) or dcz (delta-Zstandard) → only the diff goes on the wire. Framed as the compression response to the 2026 agentic-web collision of heavier pages (+6-9 % /yr) × more agentic clients (~10 % of Cloudflare requests in March 2026, +60 % YoY — concepts/agentic-traffic-share) × higher deploy frequency (AI-assisted coding compresses the deploy interval, bundler re-chunking invalidates filenames on every push) — three trends compounding on bytes-per-wire. Lab test on a 272 KB JS bundle: gzip → 92.1 KB (66 %), DCZ against v(n-1) → 2.6 KB (97 % over gzip); TTFB penalty ~20 ms; download 81-89 % faster. canicompress.com live demo: 94 KB SPA re-deployed every minute → diff on wire ~159 bytes (99.5 % over gzip). Three-phase rollout — Phase 1 passthrough (active development, open beta 2026-04-30; CF forwards headers + encodings, extends cache keys to vary on Available-Dictionary + Accept-Encoding; origin still runs lifecycle, e.g. via Patrick Meenan's dictionary-worker — RFC 9842 author's WASM-Zstandard reference Worker), Phase 2 managed dictionaries (customer names which assets are dictionaries; CF stores bytes + compresses + serves variants; origin serves ordinary responses), Phase 3 automatic dictionaries (no customer config; CF detects "successive responses share most content but differ by hash" URL patterns, auto-stores previous version as dictionary, auto-delta-compresses, uses RUM beacon to validate compression lift before serving). Includes cautionary precedent of SDCH (2008-2017) — Google shipped in Chrome with double-digit page-load improvements for early adopters, accumulated CRIME/BREACH compression side-channel attacks + cross-origin SOP violations + CORS irreconcilability + Cache-API spec gaps, un-shipped 2017. RFC 9842 closes the gaps with same-origin dictionary scope.
Browser support: Chrome 130+, Edge 130+ ship; Firefox tracking. Argument for moving implementation to the CDN: "a CDN already sits in front of every request, already manages compression, and already handles cache variants" — the cache-variant- explosion cost (responses vary on encoding × dictionary hash, multiple variants per URL mid-deploy, hit rates drop, storage climbs) is what makes origin-side RFC 9842 implementation expensive enough to justify the edge move. Canonical instance of edge-managed protocol-complexity + passthrough→managed→auto CDN rollout + previous- version-as-dictionary patterns. "Compression with a memory." (Tier 1, 2026-04-17) -
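The previous-version-as-dictionary idea can be demonstrated with a small sketch. It swaps in DEFLATE preset dictionaries from Python's standard `zlib` module as a stand-in, since the `dcb` and `dcz` encodings themselves use Brotli and Zstandard; the synthetic "bundle" and the function names are hypothetical, and the sizes it produces are not comparable to the lab numbers above.

```python
import random
import zlib


def compress_plain(data: bytes) -> bytes:
    """Ordinary compression with no shared state between versions."""
    return zlib.compress(data, 9)


def compress_with_dictionary(data: bytes, dictionary: bytes) -> bytes:
    """Compress `data` using the previous version as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Client side: decoding requires the same cached dictionary bytes."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Synthetic stand-in for two successive deploys of a JS bundle (about 12 KB,
# kept under DEFLATE's 32 KB window so the whole dictionary is reachable).
random.seed(1)
lines = [f"export const v{i} = {random.getrandbits(64)};\n" for i in range(300)]
previous = "".join(lines).encode()
lines[150] = 'export const v150 = "patched";\n'   # one-line change
current = "".join(lines).encode()

plain = compress_plain(current)
delta = compress_with_dictionary(current, previous)
restored = decompress_with_dictionary(delta, previous)
```

Because almost every byte of `current` already exists in `previous`, the dictionary-based output mostly encodes back-references, so `delta` comes out far smaller than `plain`. The decoder fails without the identical dictionary, which is why the protocol has the client advertise the dictionary hash it holds.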
sources/2026-04-17-cloudflare-redirects-for-ai-training-enforces-canonical-content — 2026-04-17 Redirects for AI Training launch. Dedicated launch post for the feature previously mentioned only as one dogfood mechanism inside the Agent Readiness Score post. Feature turns origin <link rel="canonical"> tags (RFC 6596, on 65-69 % of web pages per the 2025 Web Almanac) into HTTP 301 Moved Permanently redirects only for verified AI training crawlers (GPTBot, ClaudeBot, Bytespider — the AI Crawler verified-bot category, distinct from AI Assistant and AI Search). Mechanism: (1) cf.verified_bot_category == AI Crawler, (2) Cloudflare reads origin HTML for <link rel="canonical">, (3) if present, non-self-referencing, and same-origin, return HTTP 301 + Location: <canonical> before serving. Humans + AI Assistant + AI Search unaffected. Empirical dogfood on developers.cloudflare.com: 4.8 M AI-Crawler visits / 30 days; 46,000 legacy-Workers-docs hits / March 2026 by OpenAI alone (3,600 Anthropic, 1,700 Meta); advisory stack (deprecation banner + noindex + canonical tags) had no measurable effect; after feature activation 100 % of AI-training-crawler requests to non-self-referencing-canonical pages were redirected in the first 7 days. Ground-truth inference failure: a leading AI assistant returned the deprecated wrangler kv:key put syntax (deprecated in Wrangler 3.60.0; current syntax wrangler kv key put) — exact failure mode the feature targets. Why not single redirect rules? Per-path rules don't scale, user-agents drift, rule-quota competes with campaign URLs + domain migrations, and redirect-rule files fall out of sync as content changes. Canonical tags are the declarative source the origin already maintains for SEO.
Introduces AI Crawl Control (parent product — verified-bot telemetry + category-aware soft policy), canonical tag (RFC 6596 HTML primitive — 65-69 % web coverage), noindex meta tag (adjacent advisory primitive shown insufficient for AI training ingest), canonical-tag-as-crawler-redirect pattern (declarative variant of the training-crawler redirect — origin's existing SEO infrastructure becomes agent-training-crawler-steering infrastructure), [[patterns/response-status-as-content-policy|response-status-as-content-policy pattern]] (broader framing — 200 default, 301 canonical-routing, 402 pay-per-crawl, 403 block, 404 gone; "status codes are ultimately how the web communicates policy to crawlers"). Extends the previously-ingested Redirects for AI Training page from secondary-mention to canonical-primary-source with mechanism, three-category verified-bot distinction, 7-day dogfood result, and alternative-analysis; concept + pattern with declarative-canonical-tag variant; Radar with the Response Status Code Analysis AI-Insights surface (aggregate ~74 / 13.7 / 11.3 / 1.2 %; GPTBot ~83 / 10 / 5.1 / 2.2 % across 2xx/4xx/3xx/5xx), filterable by industry and crawl-purpose, API-exposed via Radar; docs site with the specific Wrangler-KV-commands deprecation case study; Wrangler with the kv:key put → kv key put Wrangler-3.60.0 deprecation. Caveats stated in ingest: hypothesised (not guaranteed) downstream AI-answer-quality lift given training-pipeline-recrawl-cadence variability; doesn't fix already-ingested training data; unverified crawlers bypass; binary per-zone toggle (path-specific control needs Configuration Rules / Cloudflare for SaaS); no standard transparency-disclosure layer for which URLs were rewritten for which crawler class; 31-35 % of pages without canonical tags are no-ops for the feature; self-referencing canonicals are common CMS defaults and don't trigger the redirect. -
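The three-step redirect mechanism in this entry reduces to a small decision function. The sketch below is an illustration under assumptions: the function name and argument shapes are hypothetical, the self-reference test is a plain string comparison after resolving relative hrefs (a real implementation would likely normalise URLs further), and HTML parsing is left out by passing the canonical href in directly.

```python
from urllib.parse import urljoin, urlsplit


def training_crawler_redirect(bot_category, request_url, canonical_href):
    """Return (301, location) when the redirect applies, else None.

    Applies only to the verified "AI Crawler" category, and only when the
    canonical link is present, same-origin, and not self-referencing.
    """
    if bot_category != "AI Crawler":
        return None                      # humans, AI Assistant, AI Search: unaffected
    if not canonical_href:
        return None                      # no canonical tag: feature is a no-op
    target = urljoin(request_url, canonical_href)
    req, tgt = urlsplit(request_url), urlsplit(target)
    if (req.scheme, req.netloc) != (tgt.scheme, tgt.netloc):
        return None                      # cross-origin canonicals are ignored
    if target == request_url:
        return None                      # self-referencing canonical
    return 301, target


# Hypothetical URLs on a documentation host.
old = "https://docs.example.com/legacy/kv/"
new = "https://docs.example.com/kv/"

redirected = training_crawler_redirect("AI Crawler", old, "/kv/")
search_bot = training_crawler_redirect("AI Search", old, new)
self_ref = training_crawler_redirect("AI Crawler", new, new)
cross_origin = training_crawler_redirect("AI Crawler", old, "https://other.example/kv/")
```

Only the first call yields a redirect; the other three fall through to normal serving, matching the caveats listed above for non-crawler categories, self-referencing canonicals, and off-origin targets.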
sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready — 2026-04-17 Agent Readiness Score launch. Cloudflare introduces isitagentready.com, a Lighthouse-style four-dimension scorecard for sites cooperating with AI agents — Agent Discovery / Content for LLMs / Access Rules / Agent Actions — plus a non-scoring Agentic Commerce layer checking x402 / UCP / ACP. Introduces concepts/agent-readiness-score + a cluster of six /.well-known/ URIs graded together (API Catalog (RFC 9727), MCP Server Card, Agent Skills index, OAuth Protected Resource Metadata RFC 9728, Web Bot Auth signatures directory, robots.txt + Content Signals). Web-wide measurement on Radar of 200 k top-visited domains: robots.txt 78 % / Content Signals 4 % / markdown content negotiation 3.9 % / MCP Server Cards + API Catalog combined < 15 sites in the entire dataset. Same check embedded in URL Scanner as an Agent Readiness tab + {"agentReadiness": true} API option. Dogfood at the documentation layer: developers.cloudflare.com refined — split llms.txt per top-level product directory (patterns/split-llms-txt-per-subdirectory), dynamic /index.md via two Transform Rules (URL Rewrite + Request Header Transform), hidden agent directives in HTML (patterns/hidden-agent-directive-in-html), Redirects for AI Training on deprecated docs (Wrangler v1, Workers Sites). Kimi-k2.5/OpenCode benchmark vs. average non-refined technical docs: 31 % fewer tokens + 66 % faster to correct answer (patterns/comparative-documentation-benchmark). Self-referential: isitagentready.com publishes its own MCP server at /.well-known/mcp.json (tool: scan_site over Streamable HTTP) + Agent Skills index for agents to remediate failed checks programmatically. Score-driven standard-adoption canonical wiki instance — new recurring shape added to Cloudflare: make adoption of emerging standards measurable and competitive by shipping a Lighthouse-for-agents scorecard.
Introduces systems/isitagentready, systems/cloudflare-url-scanner, systems/cloudflare-developer-documentation, systems/agent-skills, systems/redirects-for-ai-training, systems/mcp-server-card, systems/api-catalog-rfc-9727, systems/cloudflare-transform-rules, systems/cloudflare-access; concepts/agent-readiness-score, concepts/llms-txt, concepts/markdown-content-negotiation, concepts/robots-txt, concepts/sitemap, concepts/content-signals, concepts/api-catalog, concepts/agent-skills-discovery, concepts/mcp-server-card, concepts/oauth-protected-resource-metadata, concepts/link-response-header, concepts/hidden-agent-directive, concepts/agent-training-crawler-redirect, concepts/grep-loop, concepts/well-known-uri; patterns/well-known-endpoint-discovery, patterns/split-llms-txt-per-subdirectory, patterns/dynamic-index-md-fallback, patterns/hidden-agent-directive-in-html, patterns/comparative-documentation-benchmark, patterns/score-driven-standard-adoption, patterns/agent-training-crawler-redirect. Extends systems/model-context-protocol (MCP Server Card as pre-connect discovery primitive), systems/web-bot-auth (Access Rules scored check), systems/pay-per-crawl (HTTP-402 primitive generalised into x402 / agentic commerce), systems/cloudflare-radar (new Adoption of AI agent standards dataset), concepts/http-402-payment-required (x402 standardisation), concepts/machine-readable-documentation (public-facing instance vs. AWS 2026-03-26 internal-monorepo instance). Caveats: score weights not published;llms.txtis opt-in not default; Agentic Commerce deliberately non-scoring; 31 % / 66 % benchmark lacks absolute numbers -
named competitor baselines; MCP Server Card + Agent Skills + OAuth RFC 9728 are all draft / emerging standards (adoption numbers reflect this); redirects-for- AI-training is a unilateral origin decision with no standard transparency disclosure. (Tier 1, 2026-04-17)
-
sources/2026-04-17-cloudflare-agents-week-network-performance-update — 2026-04-17 Agents Week network performance update. Cloudflare reports moving from fastest in 40 % of the top 1,000 networks (Sept 2025) to fastest in 60 % (Dec 2025) — +40 countries, +261 networks, +54 US ASNs, and an average 6 ms gap to the next-fastest provider in December. Full methodology: APNIC-top-1,000-by-population denominator, RUM probe on Cloudflare-branded error pages, parallel fetches from Cloudflare, Amazon CloudFront, Google, Fastly, Akamai, trimean of connection time per (provider, network). Two orthogonal improvement axes named — (1) new PoPs (Wroclaw 19 → 12 ms RTT free-tier / Malang 39 → 37 ms RTT Enterprise / Constantine), (2) software efficiency on the connection-handling hot path (HTTP/3, congestion-window management, CPU/memory wins in SSL/TLS termination, traffic management, core proxy — the Pingora / FL2 proxy surface). Toll-booth analogy frames capacity/ distance vs. throughput-per-booth cleanly. Canonical wiki instance of both patterns/comparative-rum-benchmarking and patterns/pop-densification; introduces concepts/connection-time, concepts/trimean-aggregation, concepts/real-user-measurement, concepts/http-3, concepts/congestion-window, and concepts/point-of-presence as wiki concepts. Radar's Internet Quality surface is the data substrate (see systems/cloudflare-radar). Caveats: no absolute- latency publication beyond the PoP anecdotes, error-page RUM cohort bias, trimean discards tails.
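The trimean aggregation named in the methodology can be sketched as below, using Tukey's definition (Q1 + 2 x median + Q3) / 4. The quartile interpolation method, the provider labels, and the sample values are assumptions for illustration; the source does not say which quartile convention Cloudflare uses.

```python
from statistics import mean, quantiles


def trimean(samples):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = quantiles(samples, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def fastest_provider(connection_times):
    """Pick the provider with the lowest trimean for one network."""
    return min(connection_times, key=lambda p: trimean(connection_times[p]))


# Hypothetical connection-time samples (ms) for one network.
samples = {
    "provider_a": [10, 11, 12, 13, 500],   # one extreme outlier
    "provider_b": [20, 21, 22, 23, 24],
}

by_trimean = fastest_provider(samples)
by_mean = min(samples, key=lambda p: mean(samples[p]))
```

With these numbers the trimean ranks `provider_a` first while the arithmetic mean ranks `provider_b` first, because the single 500 ms sample dominates the mean. The same property is the "trimean discards tails" caveat noted above: a provider with a bad tail is not penalised for it.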
-
sources/2026-04-16-cloudflare-deploy-postgres-and-mysql-databases-with-planetscale-workers — 2026-04-16 PlanetScale partnership extension post. Dashboard-driven provisioning of PlanetScale Postgres / MySQL databases directly from the Cloudflare dashboard / API, plus a "next-month" rollout of Cloudflare-billed PlanetScale usage (startup-programme + committed-spend credits redeemable against PlanetScale invoices). Connectivity runs through Hyperdrive's Workers binding (hyperdrive: [{ binding: "DATABASE", id }]), so a standard pg client works verbatim via env.DATABASE.connectionString. Introduces the explicit placement hint ("placement": { "region": "aws:us-east-1" }) as the knob that co-locates a Worker with its central edge-to-origin-DB latency dependency — forward-looking commitment to auto-placement once Cloudflare derives the hint from the PlanetScale DB's location ("single digit milliseconds" target). Canonical wiki instance of patterns/partner-managed-service-as-native-binding; storage-tier sibling of the 2026-04-16 AI Platform unification post's inference-tier instance of patterns/unified-billing-across-providers. Pass-through pricing: "a single node on PlanetScale Postgres starts at $5/month"; full PlanetScale feature surface preserved (query insights, AI-agent tooling, branching). Introduces systems/hyperdrive, systems/planetscale, systems/vitess; concepts/edge-to-origin-database-latency; patterns/explicit-placement-hint, patterns/partner-managed-service-as-native-binding. Extends systems/cloudflare-workers (new hyperdrive + placement config knobs), systems/postgresql + systems/mysql (PlanetScale-as-Cloudflare-native seen-in), patterns/unified-billing-across-providers (storage-tier instance). Caveats: narrow architectural substance — no latency / pool-size / cache-hit-rate numbers disclosed; Cloudflare-billed + auto-placement flows both forward-looking; no failure-mode discussion; MySQL side named as co-equal but all code snippets are Postgres. (Tier 1, 2026-04-16) -
sources/2026-04-16-cloudflare-ai-platform-an-inference-layer-designed-for-agents — 2026-04-16 Agents-Week AI Platform unification post. Same env.AI.run() binding previously scoped to Workers AI @cf/… models now calls any of 70+ models across 12+ providers (Anthropic, OpenAI, Google, Alibaba Cloud, AssemblyAI, Bytedance, InWorld, MiniMax, Pixverse, Recraft, Runway, Vidu, …) with a one-line provider swap — canonical instance of patterns/unified-inference-binding and concepts/unified-model-catalog. REST API for non-Workers callers committed for the coming weeks. Multimodal catalog (image, video, speech, not just text LLMs). Automatic provider failover at the gateway when a model is available on multiple providers — no application-side retry logic (patterns/automatic-provider-failover). Buffered resumable streaming — AI Gateway buffers streams independently of the agent's lifetime, so a mid-turn agent crash resumes on the same stream without re-inference or double-billing (concepts/resilient-inference-stream, patterns/buffered-resumable-inference-stream); pairs with Agents SDK checkpointing for end-to-end "never notices" agent continuity. BYO-model via Replicate Cog containers — customer writes cog.yaml + predict.py + cog build, pushes to Workers AI; current scope Enterprise + design-partner, roadmap includes customer-facing push APIs + wrangler commands + GPU-snapshotting-based faster cold starts (patterns/byo-model-via-container). Per-request metadata: {...} (team, user, workflow) feeds spend attribution through patterns/unified-billing-across-providers. Colo-with-inference latency — @cf/… models avoid public-Internet hops via Cloudflare's 330-city network, preserving the fastest-TTFT path for agent workloads. Strategic context: the Replicate team has officially joined the Cloudflare AI Platform team ("we don't even consider ourselves separate teams anymore"); Replicate models are being brought onto AI Gateway and hosted models replatformed onto Cloudflare infrastructure — this is what explains the catalog expansion from text-LLM-dominated to multimodal.
Introduces: systems/replicate-cog; concepts/unified-model-catalog, concepts/resilient-inference-stream; patterns/unified-inference-binding, patterns/automatic-provider-failover, patterns/buffered-resumable-inference-stream, patterns/byo-model-via-container. Extends: systems/cloudflare-ai-gateway (catalog-wide unification + reliability-tier primitives), systems/workers-ai (unified-catalog surface + BYO-model substrate), systems/cloudflare-workers (AI binding now routes to any provider), systems/cloudflare-agents-sdk (resumable-streaming beneficiary), patterns/ai-gateway-provider-abstraction (catalog-scale umbrella pattern extended with reliability tier), patterns/unified-billing-across-providers (per-request custom-metadata attribution), patterns/central-proxy-choke-point (gateway as the only vantage point for aggregate spend across 3.5-models-average organisations), concepts/centralized-ai-governance (reliability framed as a fourth pillar). Caveats: no production volume or latency numbers for the unified catalog; failover mechanism described not parameterised (health-check cadence, fallback ordering policy, cross-provider-schema-translation details not disclosed); stream-buffering retention window + memory footprint + cost semantics undisclosed; BYO-model pricing + GA timeline not disclosed; REST API forward-looking; no competitive comparison against OpenRouter / AWS Bedrock Converse / Azure AI Foundry / Vercel AI SDK. Sixth instance of the 2026-04 "dogfood the platform as a customer-facing product" shape after Agent Lee, Project Think, AI Search, Artifacts, and Agentic Inbox — but different sub-shape: the product here is the platform itself as a unified-inference-layer surface, not a product built atop the platform. (Tier 1, 2026-04-16)
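Sketch of the idea behind patterns/buffered-resumable-inference-stream: the buffer outlives the consumer, so a consumer that crashes mid-turn reconnects with the index of the last chunk it processed and receives only what it missed. The class, its method names, and the in-memory array are all hypothetical; the post does not disclose the real retention window or wire format.

```typescript
// Hypothetical gateway-side buffer. Chunks are kept by index so that a
// reconnecting reader can resume without the model being re-run.
class ResumableStreamBuffer {
  private chunks: string[] = [];
  private done = false;

  // Producer side: called for each chunk arriving from the upstream model.
  append(chunk: string): void {
    if (this.done) throw new Error("stream already finished");
    this.chunks.push(chunk);
  }

  // Producer side: marks the end of the upstream stream.
  finish(): void {
    this.done = true;
  }

  // Consumer side: `fromIndex` is the number of chunks already processed.
  // Returns the unseen chunks plus the index to present on the next call.
  readFrom(fromIndex: number): { chunks: string[]; nextIndex: number; done: boolean } {
    return {
      chunks: this.chunks.slice(fromIndex),
      nextIndex: this.chunks.length,
      done: this.done,
    };
  }
}
```

Because the producer keeps appending while no consumer is attached, inference runs once regardless of how many times the consumer reconnects, which is the "no re-inference or double-billing" property the post claims.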
-
sources/2026-04-16-cloudflare-email-service-public-beta-ready-for-agents — Agents-Week post moving Email Sending from private to public beta and framing email as the new first-class agent interface ("everyone already has an email address"). Pairs with long-standing free Email Routing for "complete bidirectional email within a single platform". Five integrations shipped: (1) native Workers env.EMAIL.send(...) binding; (2) REST API + TS / Python / Go SDKs; (3) wrangler email send CLI; (4) Email MCP server on the Cloudflare MCP surface (same Code Mode-powered ~3,000-operation surface Agent Lee uses); (5) cloudflare/skills Email Service skill drop-in for coding agents. SPF / DKIM / DMARC auto-configured on domain attach — deliverability plumbing absorbed into the platform (sibling shape to Universal SSL, patterns/default-on-security-upgrade family). Agents SDK email-native: onEmail(email) hook + this.sendEmail({…, inReplyTo, …}) + routeAgentEmail(message, env, { resolver }) + createAddressBasedEmailResolver("SupportAgent"); agent identity is encoded in the email address itself — local-part → class, RFC-5233 plus-sub → instance — so support@domain / support+ticket-123@domain / sales@domain each route to a distinct DO instance without any mailbox provisioning (concepts/address-based-agent-routing, patterns/sub-addressed-agent-instance). Reply-routing headers are HMAC-SHA256-signed so replies route back to the exact originating DO instance — prevents forged-header re-routing attacks (patterns/signed-reply-routing-header; post: "a security concern that most 'email for agents' solutions haven't addressed"). The thesis that separates an agent from a chatbot: "A chatbot responds in the moment or not at all. An agent thinks, acts, and communicates on its own timeline. With Email Sending, your agent can receive a message, spend an hour processing data, check three other systems, and then reply with a complete answer" — the inbox substrate tolerates arbitrary reply latency the way a WebSocket chat pane doesn't (concepts/asynchronous-reply-email). Fourth substrate of agent memory: "the inbox becomes the agent's memory, without needing a separate database or vector store" — DO embedded state + email thread together (alongside DO-SQLite conversation trees for episodic, AI Search instances for semantic, and Artifacts repos for filesystem + session-history memory).
Agentic Inbox reference app open-sourced (github.com/cloudflare/agentic-inbox) with Email Routing + Email Sending + Workers AI + R2 + Agents SDK stitched into an end-to-end inbound-classify-persist-reply pipeline (patterns/inbound-classify-persist-reply-pipeline); built-in MCP server for external-agent "draft for review" workflow — email-tier analog of Agent Lee's elicitation gate. Fifth instance of the 2026-04 "dogfood the platform as a customer-facing product" recurring shape (Agent Lee, Project Think, AI Search, Artifacts, now Agentic Inbox). Introduces systems/cloudflare-email-service, systems/agentic-inbox; concepts/email-as-agent-interface, concepts/address-based-agent-routing, concepts/asynchronous-reply-email; patterns/signed-reply-routing-header, patterns/sub-addressed-agent-instance, patterns/inbound-classify-persist-reply-pipeline. Extends systems/cloudflare-agents-sdk (new email-native section: onEmail + sendEmail + routeAgentEmail + createAddressBasedEmailResolver + HMAC-signed reply headers), systems/cloudflare-workers (new EMAIL binding alongside ai / durable_objects / ai_search_namespaces), systems/cloudflare-durable-objects (fifth 2026-04 "one DO per caller-identified unit" launch), systems/workers-ai (inbound email classification role in Agentic Inbox), systems/cloudflare-r2 (fourth substrate role — email attachments in Agentic Inbox), systems/model-context-protocol (new Email MCP server on the Cloudflare MCP surface + Agentic Inbox's own built-in MCP server for draft-for-review), systems/wrangler-cli (new email send subcommand), concepts/agent-memory (fourth substrate: email thread + DO state), concepts/one-to-one-agent-instance (email-channel-tier realisation).
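Sketch of concepts/address-based-agent-routing as this entry describes it: the local part names the agent class and the plus-subaddress names the instance. This is an illustration only, not the Agents SDK resolver; the types, function names, and the "default" fallback instance name are assumptions.

```typescript
interface AgentAddress {
  agentClass: string;      // local part before "+", lowercased
  instance: string | null; // plus-subaddress, if present
  domain: string;
}

// Parse "support+ticket-123@example.com" into class / instance / domain.
function parseAgentAddress(address: string): AgentAddress {
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) {
    throw new Error(`not an email address: ${address}`);
  }
  const local = address.slice(0, at);
  const domain = address.slice(at + 1).toLowerCase();
  const plus = local.indexOf("+");
  const agentClass = (plus === -1 ? local : local.slice(0, plus)).toLowerCase();
  if (agentClass.length === 0) throw new Error(`empty agent class: ${address}`);
  const instance = plus === -1 ? null : local.slice(plus + 1);
  return { agentClass, instance, domain };
}

// Stable key for "one instance per address". "default" is an assumed
// fallback for addresses without a plus-subaddress.
function instanceKey(a: AgentAddress): string {
  return `${a.agentClass}:${a.instance ?? "default"}`;
}
```

Under these assumptions support@example.com maps to support:default and support+ticket-123@example.com maps to support:ticket-123, so each address resolves to its own instance with no mailbox provisioning step.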
Caveats: public-beta — no SLA, no deliverability metrics, no private-beta-cohort production numbers; HMAC-signing wire format not published; pricing for Email Sending deferred to earlier Birthday Week post; no SES / SendGrid / Postmark competitive positioning; multi-region / GDPR data-residency posture not discussed. (Tier 1, 2026-04-16)
-
sources/2026-04-16-cloudflare-building-the-foundation-for-running-extra-large-language-models
— Deep dive on how Workers AI serves
extra-large LLMs like Kimi K2.5
(>1T params, ~560 GB weights, 8× H100 minimum for weights
alone). Five load-bearing pieces: (1)
prefill/decode disaggregation — separate servers for
compute-bound prefill vs memory-bound decode, with a custom
token-aware load balancer that rewrites SSE responses and
tracks in-flight tokens separately per stage pool
(concepts/token-aware-load-balancing); measured p90 TTFT
dropped, p90 intertoken latency ~100 ms → 20-30 ms = 3×
improvement, using the same quantity of GPUs while
request volume increased (concepts/time-to-first-token /
concepts/intertoken-latency / patterns/disaggregated-inference-stages).
(2) Client-signalled x-session-affinity header for cross-cluster warm-cache routing, incentivised by discounted cached-token pricing; heavy-internal-user rollout raised peak input-cache-hit ratio 60% → 80% — "a small difference in prompt caching from our users can sum to a factor of additional GPUs" (patterns/session-affinity-header). Integrated into agent harnesses via OpenCode PR #20744. (3) Cluster-wide shared KV cache over RDMA via Moonshot AI's Mooncake Transfer Engine (NVLink + NVMe-oF) + Mooncake Store (NVMe cold tier) + LMCache / SGLang HiCache software layer — eliminates within-cluster session-aware routing, "allows us to load balance the traffic much more evenly". (4) Speculative decoding with NVIDIA EAGLE-3 (nvidia/Kimi-K2.5-Thinking-Eagle3) — shines on agentic workloads because tool calls + JSON envelopes are structurally predictable: "you know there will be a name, description, and it's wrapped in a JSON envelope". (5) Infire — Cloudflare's proprietary Rust inference engine, now with multi-GPU tensor + pipeline + expert parallelism (concepts/multi-gpu-serving); lower activation-memory overhead than vLLM — fits Llama 4 Scout on 2× H200 with >56 GiB KV room (~1.2M tokens); fits Kimi K2.5 on 8× H100 (not H200) with >30 GiB KV room — "in both cases you would have trouble even booting vLLM in the first place"; sub-20s cold boot; +20% tokens/sec on unconstrained systems; enables running latest models on lower-end hardware. Post-launch retuning made Kimi K2.5 3× faster on the same hardware — configuration response to observed agentic traffic shape (large system prompt + tools + MCPs + growing context → input-heavy + long reusable prefixes → fast input-token processing + fast tool-call generation are the things to optimise). Introduces systems/infire, systems/mooncake-transfer-engine, systems/mooncake-store, systems/eagle-3, systems/lmcache; concepts/prefill-decode-disaggregation, concepts/token-aware-load-balancing, concepts/session-affinity-prompt-caching, concepts/rdma-kv-transfer, concepts/tensor-parallelism, concepts/pipeline-parallelism, concepts/expert-parallelism, concepts/multi-gpu-serving, concepts/time-to-first-token, concepts/intertoken-latency; patterns/disaggregated-inference-stages, patterns/session-affinity-header, patterns/kv-aware-routing. Extends systems/workers-ai (full serving-architecture section), systems/kimi-k2-5 (size + multi-GPU hardware footprint + post-launch 3× speed-up), concepts/kv-cache (cluster-wide KV sharing + PD-boundary artifact), concepts/speculative-decoding (agentic-workload why-it-shines + EAGLE-3 instance), systems/sglang (HiCache as cluster-wide cache layer), systems/vllm (reference baseline Infire measures against). Caveats: no absolute p90 TTFT number (graph only); no prefill:decode node ratio; no EAGLE-3 acceptance rate; no cached-token discount magnitude; "20% higher tok/s" lacks baseline specification; "3× faster" lacks workload specification; vLLM comparison is Cloudflare-sided. (Tier 1, 2026-04-16)
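Sketch of concepts/token-aware-load-balancing with separate prefill and decode pools, as piece (1) of this entry describes. It assumes the simplest policy, "send to the server with the fewest in-flight tokens in that stage's pool". Cloudflare's actual policy, and every name below, is not from the post.

```typescript
type Stage = "prefill" | "decode";

interface Server {
  id: string;
  inFlightTokens: number;
}

class TokenAwareBalancer {
  private pools: Record<Stage, Server[]>;

  constructor(prefillIds: string[], decodeIds: string[]) {
    this.pools = {
      prefill: prefillIds.map((id) => ({ id, inFlightTokens: 0 })),
      decode: decodeIds.map((id) => ({ id, inFlightTokens: 0 })),
    };
  }

  // Pick the least-loaded server (by tokens, not by request count) in the
  // given stage's pool and charge this request's tokens to it.
  acquire(stage: Stage, tokens: number): string {
    const pool = this.pools[stage];
    if (pool.length === 0) throw new Error(`no servers in ${stage} pool`);
    let best = pool[0];
    for (const s of pool) {
      if (s.inFlightTokens < best.inFlightTokens) best = s;
    }
    best.inFlightTokens += tokens;
    return best.id;
  }

  // Give tokens back when the stage completes for a request.
  release(stage: Stage, serverId: string, tokens: number): void {
    for (const s of this.pools[stage]) {
      if (s.id === serverId) {
        s.inFlightTokens = Math.max(0, s.inFlightTokens - tokens);
        return;
      }
    }
    throw new Error(`unknown server ${serverId}`);
  }
}
```

Counting tokens rather than requests matters for the input-heavy agentic traffic this entry describes: one 8,000-token prompt loads a prefill server far more than several 500-token prompts, so a request-count balancer would misjudge which server is busy.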
-
sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git — Launch + architecture post for Artifacts: a distributed versioned filesystem, built for agents, that speaks the Git wire protocol. Repositories created programmatically via REST or a Workers binding (env.AGENT_REPOS.create(name)), then cloned / fetched / pushed with any ordinary Git client. Private beta at launch; public beta by early May 2026. Load-bearing design decisions: (1) pick Git specifically because it's "deep in the training data of most models" and sidesteps the bootstrap problem of any new protocol — canonical instance of patterns/git-protocol-as-api. (2) one Durable Object per repo — same agents-are-one-to-one economics applied to the storage tier (concepts/one-to-one-agent-instance, patterns/do-backed-git-server). (3) Git server is a ~100 KB pure-Zig Wasm module with zero deps beyond std — implements SHA-1, zlib, delta encode/decode, pack parsing, full smart-HTTP v1+v2 from scratch (concepts/wasm-git-server); Zig chosen for manual memory control in the ~128 MB DO budget + Zig Build System sharing code across Wasm + native-against-libgit2 tests + tight 11+1-function host interface. Storage stratifies DO SQLite (chunked across 2 MB rows) → R2 pack-file snapshots → KV auth tokens. Memory-aware streaming on fetch + push paths + delta-form stored alongside resolved object so Zig can emit a delta when the client has the base, saving bandwidth. Internal dogfood: Cloudflare persists per-session filesystem state + session history in per-session Artifacts repos — time-travel + fork + diff on arbitrary agent state (concepts/repo-per-agent-session). .import() + .fork() are first-class primitives; native git-notes for agent-written metadata (prompts, attribution, judge verdicts) without mutating objects. Sibling open-source ArtifactFS "git clone but async": blobless clone + background-hydration daemon with agent-aware priority (manifests → configs → code → binaries) for any Git remote; illustrative claim ~90–100 s off 2.4 GB-repo sandbox startup × 10 k jobs/month = ~2,778 sandbox hours saved (concepts/async-clone-hydration, patterns/blobless-clone-lazy-hydrate).
Pricing: $0.15 per 1 000 operations (10 k/month free) + $0.50 /GB-month (1 GB free) — canonical wiki instance of concepts/scale-to-zero applied to versioned storage. Introduces systems/cloudflare-artifacts, systems/artifact-fs; concepts/agent-first-storage-primitive, concepts/wasm-git-server, concepts/async-clone-hydration, concepts/repo-per-agent-session; patterns/git-protocol-as-api, patterns/blobless-clone-lazy-hydrate, patterns/do-backed-git-server. Extends systems/git (new Seen-in — the canonical wiki instance of Git-protocol-as-API), systems/cloudflare-durable-objects (per-repo DO as the fourth 2026-04 "one DO per caller-identified unit" launch alongside Agent Lee / Project Think / AI Search), systems/cloudflare-r2 (pack-file snapshot store — third R2 substrate role after Tier-0 workspace + AI-Search-managed-storage), systems/cloudflare-kv (auth-token store), systems/cloudflare-workers (stateless Git-server front-end + new Workers binding), concepts/git-pack-file (delta-alongside-resolved storage strategy), concepts/git-delta-compression (from-scratch Zig implementation), concepts/one-to-one-agent-instance (storage-tier realisation), concepts/actor-model (actor-per-repo), concepts/agent-memory (filesystem + session-history tier), concepts/scale-to-zero (versioned-storage-tier economics), companies/cloudflare (2 new Key systems + prepended Recent-article entry). Caveats: beta post — no production-scale numbers (no QPS, p99, fleet counts); 2.4 GB / 2 min and 2,778 sandbox-hour numbers are illustrative scaling, not measured production; SHA-256 / Git-LFS support timelines unclear; durability + replication model of DO↔R2↔KV stratification not detailed; pricing hedged ("should this pricing change"). (Tier 1, 2026-04-16)
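Sketch of the "chunked across 2 MB rows" storage detail above: an object larger than one row is split into fixed-size pieces on write and concatenated on read. The exact row limit, the helper names, and the use of plain byte arrays in place of SQLite rows are assumptions for illustration.

```typescript
// Assumed row size; the post says "2 MB rows" without giving an exact byte count.
const ROW_LIMIT_BYTES = 2 * 1024 * 1024;

// Split one object into row-sized views, in order. No bytes are copied.
function splitIntoRows(blob: Uint8Array, rowLimit: number = ROW_LIMIT_BYTES): Uint8Array[] {
  if (rowLimit <= 0) throw new Error("rowLimit must be positive");
  const rows: Uint8Array[] = [];
  for (let off = 0; off < blob.length; off += rowLimit) {
    rows.push(blob.subarray(off, Math.min(off + rowLimit, blob.length)));
  }
  return rows;
}

// Reassemble the object from its rows, in the same order they were written.
function joinRows(rows: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const r of rows) total += r.length;
  const out = new Uint8Array(total);
  let off = 0;
  for (const r of rows) {
    out.set(r, off);
    off += r.length;
  }
  return out;
}
```

With this layout a 5 MiB object occupies three rows (2 MiB, 2 MiB, 1 MiB), and a reader can stream it row by row instead of materialising it at once, which fits the memory-aware streaming this entry mentions.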
-
sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents — Launch + architecture post for AI Search (formerly AutoRAG) as a plug-and-play managed search primitive for AI agents. Hybrid BM25 + vector retrieval (previously vector only), with the pipeline fully configurable at instance creation time: index_method: { keyword: true, vector: true }, keyword_tokenizer: "porter" | "trigram", keyword_match_mode: "and" | "or", fusion_method: "rrf" | "max", reranking: true with @cf/baai/bge-reranker-base. New ai_search_namespaces runtime-provisioned binding: create() / delete() / list() / search() at namespace level lets an application spin up one instance per agent, per customer, or per language without redeployment — retrieval-tier realisation of one-to-one agent (patterns/runtime-provisioned-per-tenant-search-index). Each instance has built-in storage + index (R2 + Vectorize) so items.uploadAndPoll(name, content) is a single awaitable call (patterns/upload-then-poll-indexing, concepts/unified-storage-and-index). Metadata boost at query time via ai_search_options.boost_by: [{field: "timestamp", direction: "desc"}] layers business logic on top of relevance without re-indexing (concepts/metadata-boost, patterns/metadata-boost-at-query-time). Cross-instance search via instance_ids: ["product-knowledge", "customer-abc123"] fans across multiple indexes, merges, and returns one ranked list (patterns/cross-index-unified-retrieval) — the namespace-level realisation of patterns/unified-retrieval-tool. Worked example: customer-support agent built on the Agents SDK (new Key system) with one shared product-knowledge instance + one per-customer instance for resolution history — "After each resolved issue, the agent saves a summary of what went wrong and how it was fixed" — canonical wiki instance of agent memory as a per-tenant search index rather than a conversation-history transcript. LLM: Kimi K2.5 (introduces systems/kimi-k2-5) via Workers AI (introduces). Browser Run (rebrand of Browser Rendering) now built-in for website-as-data-source crawling, not billed separately during beta. Dogfood reveal: "The search on our blog is now powered by AI Search." Third instance of the 2026-04 "dogfood the platform as a customer-facing product" shape after Agent Lee and Project Think. Open-beta limits: 100 (Free) / 5,000 (Paid) instances/account; 100K / 1M files per instance (500K for hybrid); 4MB/file; 20K (Free) / unlimited (Paid) queries/month. ai_search_namespaces replaces the previous env.AI.autorag() binding; old API retained via Workers compatibility dates. No latency / throughput / recall numbers disclosed; embedding model inside the instance not named; chunking strategy opaque; no sparse-vector / SPLADE / ELSER story.
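Sketch of concepts/reciprocal-rank-fusion, the "rrf" option for fusion_method above. Each document scores the sum of 1 / (k + rank) over the ranked lists it appears in, with 1-based ranks. The constant k = 60 is the conventional default from the RRF literature and is an assumption here; AI Search's internal value is not disclosed. The function name and tie-break rule are illustrative.

```typescript
// Fuse several ranked lists of document IDs (best first) into one ranking.
function reciprocalRankFusion(rankedLists: string[][], k: number = 60): string[] {
  const scores: Record<string, number> = {};
  for (const list of rankedLists) {
    list.forEach((docId, i) => {
      // i is 0-based, so the 1-based rank is i + 1.
      scores[docId] = (scores[docId] ?? 0) + 1 / (k + i + 1);
    });
  }
  // Highest fused score first; ties broken by ID so the output is deterministic.
  return Object.keys(scores).sort(
    (a, b) => scores[b] - scores[a] || a.localeCompare(b),
  );
}
```

For a keyword ranking a, b, c and a vector ranking b, d, a, the fused order is b, a, d, c: documents found by both retrievers outrank documents found by only one. RRF uses ranks only, so BM25 scores and vector similarities never need to be put on a common scale.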
Introduces systems/cloudflare-ai-search, systems/cloudflare-vectorize, systems/cloudflare-agents-sdk, systems/workers-ai, systems/kimi-k2-5; concepts/metadata-boost, concepts/per-tenant-search-instance, concepts/unified-storage-and-index, concepts/agent-memory; patterns/runtime-provisioned-per-tenant-search-index, patterns/cross-index-unified-retrieval, patterns/metadata-boost-at-query-time, patterns/upload-then-poll-indexing. Extends systems/cloudflare-r2 (AI-Search-instance storage substrate section), systems/cloudflare-durable-objects (canonical consumer of ai_search_namespaces via AIChatAgent-on-DO), systems/cloudflare-workers (new binding in the bindings family), systems/cloudflare-ai-gateway (still-separately-billed inference-tier companion), systems/cloudflare-browser-rendering (Browser Run rebrand + built-in-crawler role), systems/bm25 (first-class tokenizer/match-mode knobs on a managed instance), concepts/hybrid-retrieval-bm25-vectors (Cloudflare-managed productisation with tokenizer-per-content-type axis), concepts/reciprocal-rank-fusion (as fusion_method: "rrf" default), concepts/cross-encoder-reranking (as instance-level config with @cf/baai/bge-reranker-base), patterns/native-hybrid-search-function (second major productisation alongside Atlas Hybrid Search; first platform-first vendor shape vs MongoDB's DB-first), patterns/unified-retrieval-tool (namespace-level realisation at runtime). (Tier 1, 2026-04-16)
-
sources/2026-04-15-cloudflare-project-think-building-the-next-generation-of-ai-agents — Same-day companion to the Agent Lee launch; announces Project Think — the next generation of the Agents SDK. Six primitives: (1) durable execution via fibers (runFiber() / stash() / onFiberRecovered, SDK-managed keepAlive() during execution, hibernate-on-long-callback for hour-plus work) — see concepts/durable-execution, patterns/checkpoint-resumable-fiber; (2) sub-agents via Facets — child DOs colocated with the parent, each with isolated SQLite + typed RPC at function-call latency — see patterns/colocated-child-actor-rpc; (3) Session API — tree-structured conversation memory with forking, non-destructive compaction, FTS5 full-text search — see patterns/tree-structured-conversation-memory; (4) sandboxed code execution via Dynamic Workers + @cloudflare/codemode + runtime npm via @cloudflare/worker-bundler; (5) the execution ladder — Tier 0 workspace (DO SQLite + R2) / Tier 1 Dynamic Worker / Tier 2 + npm / Tier 3 Browser
Rendering / Tier 4 Cloudflare
Sandbox — "useful at Tier 0 alone, each tier is additive"
(patterns/additive-capability-ladder); (6)
self-authored extensions — the agent writes its own TypeScript
tool with a permission manifest, bundles with worker-bundler, loads into a Dynamic Worker, persists in DO storage, survives hibernation (concepts/self-authored-extension). Load-bearing framings: "agents are one-to-one" (concepts/one-to-one-agent-instance — 10K agents × 1% active = 10K always-on VMs vs ~100 active DOs); "capability model: no ambient authority, bindings grant capabilities explicitly" (concepts/capability-based-sandbox); "three waves: chatbots → coding agents → agents as infrastructure." Quantification: Code Mode MCP server exposes two tools covering ~3,000 operations in ~1,000 tokens vs ~1.17M tokens for the naive tool-per-endpoint equivalent = 99.9% reduction. The Think base class wires the whole ladder + session API + fibers + extension loader behind overridable hooks (getModel, getTools, configureSession, beforeTurn, beforeToolCall, afterToolCall, onStepFinish, onChatResponse) and exposes context blocks — structured-system-prompt sections the model can read + update, with live token accounting ("MEMORY [42%, 462/1100 tokens]"). Preview; no production-scale numbers (contrast same-day Agent Lee launch which discloses 18K DAU / 250K tool calls / day). Introduces systems/project-think, systems/dynamic-workers; concepts/durable-execution, concepts/one-to-one-agent-instance, concepts/capability-based-sandbox, concepts/execution-ladder, concepts/self-authored-extension, concepts/actor-model; patterns/checkpoint-resumable-fiber, patterns/additive-capability-ladder, patterns/colocated-child-actor-rpc, patterns/tree-structured-conversation-memory. Extends systems/cloudflare-durable-objects (agent-actor-substrate section; the VM-vs-DO comparison table; three new primitives hosted on DO storage), systems/code-mode (99.9% quantification + Dynamic Workers substrate), systems/model-context-protocol (quantified-context-window-win section), systems/cloudflare-r2 / systems/cloudflare-browser-rendering / systems/cloudflare-sandbox-sdk (Tier 0 / 3 / 4 role), patterns/code-generation-over-tool-calls (capability-sandbox prerequisite + 99.9% quantification). (Tier 1, 2026-04-15)
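Sketch of the checkpoint-and-resume idea behind patterns/checkpoint-resumable-fiber: persist progress after every step so that a restart continues from the last completed step instead of from the beginning. This deliberately does not use the SDK's runFiber() / stash() API, whose signatures are not given in this entry. Every name below is hypothetical, and a plain object stands in for durable storage.

```typescript
interface Checkpoint {
  nextStep: number; // index of the first step that has not completed
  state: number;    // accumulated result so far
}

// Stand-in for durable storage that survives a crash or hibernation.
interface CheckpointStore {
  checkpoint: Checkpoint | null;
}

type Step = (state: number) => number;

// Runs `steps` in order, persisting a checkpoint after each one.
// `crashBeforeStep` exists only to simulate a failure in the example.
function runResumable(steps: Step[], store: CheckpointStore, crashBeforeStep: number = -1): number {
  const start: Checkpoint = store.checkpoint ?? { nextStep: 0, state: 0 };
  let state = start.state;
  for (let i = start.nextStep; i < steps.length; i++) {
    if (i === crashBeforeStep) throw new Error(`simulated crash before step ${i}`);
    state = steps[i](state);
    store.checkpoint = { nextStep: i + 1, state };
  }
  return state;
}
```

If a three-step run crashes before its third step, a second call with the same store executes only that third step. Steps that completed before the crash are not repeated, which is what makes hour-plus work practical on an instance that may be evicted.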
-
sources/2026-04-15-cloudflare-introducing-agent-lee — Launch + architecture post for Agent Lee, Cloudflare's first-party in-dashboard AI assistant; ~18,000 daily users / ~250,000 tool calls/day in beta across DNS, Workers, SSL/TLS, R2, Registrar, Cache, Cloudflare Tunnel, API Shield, and more. Three architectural reveals: (1) uses Code Mode to convert MCP tools into a TypeScript API and asks the model to write code against it (LLMs see far more TypeScript than tool-call examples → higher accuracy; multi-step tasks collapse into a single generated script, no per-step round-trips) — see patterns/code-generation-over-tool-calls; (2) generated code travels through a Durable Object acting as a credentialed proxy that classifies the code as read or write, proxies reads directly, and blocks writes until approved via an elicitation gate ("the permission model is the enforcement layer, and the confirmation prompt you see is not a UX courtesy. It's the gate."); API keys are held inside the DO and injected server-side — never present in generated code — see patterns/credentialed-proxy-sandbox; (3) responses are rich UI, not text — dynamic tables, interactive charts, architecture maps rendered on an adaptive grid the user can carve up with natural-language component requests — see patterns/dynamic-ui-generation. Built on Agents SDK + Workers AI + Durable Objects + MCP — same primitives shipped to customers. Introduces systems/agent-lee, systems/code-mode; concepts/elicitation-gate; patterns/code-generation-over-tool-calls, patterns/credentialed-proxy-sandbox, patterns/dynamic-ui-generation. Extends systems/model-context-protocol (new agent consumption layer with permission enforcement section), systems/cloudflare-durable-objects (credentialed-proxy boundary section). (Tier 1, 2026-04-15)
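Sketch of patterns/credentialed-proxy-sandbox combined with concepts/elicitation-gate: the proxy holds the credential, lets reads through, and blocks writes unless an approval callback says yes. Classifying by HTTP method is an assumption made for brevity; the post does not say how Agent Lee classifies generated code. All names are hypothetical.

```typescript
interface ProxiedCall {
  method: string;
  path: string;
}

// Assumed classifier: GET / HEAD are reads, everything else is a write.
function classify(call: ProxiedCall): "read" | "write" {
  const m = call.method.toUpperCase();
  return m === "GET" || m === "HEAD" ? "read" : "write";
}

class CredentialedProxy {
  constructor(
    // The secret lives only inside the proxy; generated code never sees it.
    private readonly apiToken: string,
    // Elicitation gate: asks the user to approve a write.
    private readonly approve: (call: ProxiedCall) => boolean,
  ) {}

  // Returns what the proxy would send upstream, or a blocked result.
  forward(call: ProxiedCall): { sent: boolean; headers?: Record<string, string> } {
    if (classify(call) === "write" && !this.approve(call)) {
      return { sent: false };
    }
    // Credential is injected here, server-side, after the gate has passed.
    return { sent: true, headers: { Authorization: `Bearer ${this.apiToken}` } };
  }
}
```

The gate sits in front of the credential injection, so a denied write never acquires an Authorization header at all. That ordering is what the post means by the confirmation prompt being the enforcement layer and not a UX courtesy.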
-
sources/2026-04-13-cloudflare-building-a-cli-for-all-of-cloudflare — Cloudflare announces a Technical Preview of the next-generation Wrangler CLI rebranded as cf (install: npx cf / npm install -g cf) that aims to cover all ~100 Cloudflare products and ~3,000 HTTP API operations. Alongside the CLI, Cloudflare releases Local Explorer — an open-beta introspection GUI + a local mirror of the Cloudflare API at /cdn-cgi/explorer/api on any Wrangler or Vite-plugin-powered app, backed by Miniflare's local state for KV / R2 / D1 / Durable Objects / Workflows. Core architectural reveal: Cloudflare replaced OpenAPI with an in-house TypeScript-based schema as the source of truth for generating every interface surface — CLI, SDK, Terraform provider, Workers bindings, MCP Code Mode server (all 3,000 operations in <1,000 tokens), Agent Skills, wrangler.jsonc configuration, docs, and (as an output) OpenAPI itself. Posture is explicitly agents-first: "agents are now the primary customer of our APIs". Consistency rules (get not info, --force not --skip-confirmations, --json everywhere) are enforced at the schema layer, not at review time. Local / remote parity is baked into the CLI: --local is the only flag that differs. See concepts/unified-interface-schema, concepts/cli-convention-enforcement, concepts/local-remote-parity, concepts/agent-ergonomic-cli, patterns/typescript-as-codegen-source, patterns/schema-driven-interface-generation, patterns/local-mirror-of-remote-api. Sibling of the Cap'n Web TypeScript-as-schema move one layer down.
-
sources/2026-04-07-cloudflare-targets-2029-for-full-post-quantum-security — Cloudflare publishes an accelerated 2029 target for full post-quantum security across the entire product suite, including post-quantum authentication — historically the secondary concern behind harvest-now-decrypt-later but now urgent because independent advances compress Q-Day toward 2029-2030. Three independent fronts converged in early 2026: Google's undisclosed speed-up to the quantum elliptic-curve-attack algorithm (disclosed via zero-knowledge proof only); Oratomic's resource estimate showing RSA-2048 / P-256 breakable on a 10,000-qubit neutral-atom computer; neutral-atom error-correction achieving 3-4 physical qubits per logical qubit vs ~1,000 on nearest-neighbor superconducting. Google independently moved to 2029; IBM Quantum Safe CTO no longer rules out moonshot attacks on high-value targets as early as
-
Scott Aaronson's "public estimates will now go dark" framing explicitly adopted. Core threat-model pivot: HNDL was the dominant concern at far-Q-Day (addressed by Cloudflare's 2022 PQ-KEM-for-all rollout — >65 % of human traffic to Cloudflare is now PQ-encrypted, per Radar); near-Q-Day inverts the priority to authentication because any quantum-forged long-lived key (root CA, code-signing cert, API auth key, federation trust anchor — see concepts/long-lived-key-risk) is an immediate RCE / MITM / impersonation vector. Add-PQ-is-not-enough: enabling PQ alongside classical doesn't defeat downgrade attacks because the transcript signature that normally detects tampering is itself forgeable under a CRQC; for closed systems, disable classical; for federated systems (the public web) use PQ HSTS and/or CT-based downgrade detection. Rotate-after-disable: every secret ever sent over classical sessions (passwords, API keys, access tokens) must be rotated because an attacker with prior captures recovers them post-Q-Day. Captured in patterns/disable-legacy-before-rotate. Third-party scope: migration planning must cover indirect dependencies (financial services, utilities) not only direct cryptographic counterparties, per patterns/third-party-dependency-quantum-assessment. Cloudflare milestones: Mid-2026 ML-DSA for Cloudflare→origin (controlled endpoints, direct deployment feasible); Mid-2027 Merkle Tree Certificates for visitor→Cloudflare (handles PQ-signature wire-size blow-up via batch-and-amortise); Early 2028 Cloudflare One SASE suite; 2029 entire product suite. Posture: default-on at no additional cost across all plans — consistent with 2014 Universal SSL and 2022 PQ-for-all. "Free TLS helped encrypt the web. Free post-quantum cryptography will help secure it for what comes next." Signature-side counterpart to GitHub's 2025 KEM-side SSH rollout at the threat-model level. 386 HN points (2026-04-07)
-
sources/2026-02-27-cloudflare-a-better-streams-api-is-possible-for-javascript — Cloudflare engineer James Snell (Node.js TSC member, multi-runtime Web streams implementer) publishes a structural critique of the WHATWG Web Streams Standard arguing its usability + performance issues cannot be fixed incrementally — they are consequences of 2014-2016 design choices made before for await…of landed in ES2018. Enumerates eight structural issues: excessive ceremony for reader+lock+controller acquisition; locks-as-footguns (forgotten releaseLock() permanently breaks the stream); BYOB complexity without real-world payoff; backpressure that is advisory-only (controller.enqueue() always succeeds even when desiredSize is deeply negative); per-operation promise allocation (Vercel measured 12× gap between pipeThrough and Node pipeline(); Cloudflare internal fix reduced promises by up to 200×); unconsumed fetch() bodies leaking connections (real undici bug); tee()'s unbounded memory-cliff (Workers diverges to a shared-buffer model); TransformStream's push semantics that leak backpressure signals upstream. The post ships a proof-of-concept alternative (new-streams, github.com/jasnell/new-streams) built on six inverted foundations: AsyncIterable<Uint8Array[]> as the readable substrate, pull-based lazy evaluation, four explicit backpressure policies (strict / block / drop-oldest / drop-newest, strict default), batched Uint8Array[] chunks, structural writers (any { write, end, abort } object), parallel synchronous fast paths (Stream.pullSync, Stream.textSync) that skip promise machinery entirely. Benchmarks: 2×-120× faster than Web streams in Node.js, Deno, Bun, Workers, and every major browser; chained-3×-transform result shows ~80-90× via pull-through semantics eliminating intermediate buffers. Explicit conversation-starter, not a ship-it proposal. Also opens a second instance of upstream-the-fix — Snell commits publicly to helping Vercel's fast-webstreams land in Node.js, benefiting Vercel + AWS Lambda + self-hosted Node (Workers has its own Web-streams implementation so gets no direct benefit). Companion / sequel to the 2025-10-14 Workers benchmarks post at the bug-level. 447 HN points (2026-02-27)
-
sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks —
Cloudflare's 2025-10-14 public response to Theo Browne's cf-vs-vercel-bench benchmark (Oct 4, 2025) that had shown Workers up to 3.5× slower than Node.js on Vercel despite both embedding the same V8. Diagnosis + remediation spans four layers: (1) Workers runtime tuning — re-tuned warm-isolate routing so CPU-bound bursts spin up new isolates faster (preserving I/O-bound coalescing), and removed a stale 2017-era V8 young-gen cap → ~25 % benchmark win, small memory increase, global rollout, all Workers benefit; (2) OpenNext adapter code — 50 × 2048-byte Buffer allocation per pipeThrough(), per-chunk output copy, Buffer.concat(chunks).length discard, Node ⇆ Web streams double-buffer (concepts/stream-adapter-overhead), value-oriented ReadableStream with default highWaterMark: 1 — all submitted as upstream PRs; (3) V8 / Node.js upstream fixes (patterns/upstream-the-fix) — ~33 % JSON.parse(text, reviver) speedup patch to V8 (Chromium CL 7027411, ships V8 14.3 / Chrome 143) benefiting Node.js + Chrome + Deno + every other embedder, plus Node.js PR #60153 enabling the faster trig compile flag (no benefit to Cloudflare customers — they did it anyway); (4) benchmark methodology (concepts/benchmark-methodology-bias) — catalogue of six bias classes including TTFB/TTLB skew, correlated noise (so more iterations don't help), hardware-generation lottery, unset NODE_ENV defaulting React to dev mode, client-side network latency. Result: parity on every benchmark case except Next.js (OpenNext still has gap, stated plan to close). Canonical wiki instance of Workers / OpenNext / V8 upstream engineering. 317 HN points (2025-10-14)
- sources/2025-09-22-cloudflare-capn-web-rpc-for-browsers-and-web-servers
— 2025-09-22 Cap'n Web launch + open-source + design deep-dive by Kenton Varda, author of Cap'n Proto. Announces Cap'n Web, a pure-TypeScript object-capability RPC protocol — spiritual sibling of Cap'n Proto that keeps the object-capability model and promise pipelining while dropping .capnp schemas in favour of TypeScript-interface-as-schema + JSON wire format. <10 kB minify+gzip, zero dependencies, MIT. Three built-in transports (HTTP batch / WebSocket / postMessage()), runs in every major browser + Workers + Node.js. Four-message wire protocol (push/pull/resolve/reject) with predictable positive export-table IDs enabling pipelining before the first response lands; symmetric / bidirectional (concepts/bidirectional-rpc): no privileged client / server role at the protocol layer. First production consumer: Wrangler remote bindings (GA 2025-09-16, ~6 days before announcement) — local workerd test instance speaks RPC to production services. Positions Cap'n Web as a GraphQL alternative for the waterfall problem — flattens dependent calls at the RPC layer instead of introducing a new query language. Novel .map() over promised arrays via record-replay DSL — client executes the user's synchronous callback once against a Proxy placeholder, records the speculative pipelined calls, ships the recording to the server to replay per array element; "the DSL used to represent instructions for the map function is just the RPC protocol itself." Second innovation: the capability-returning-authenticate pattern — authenticate(apiKey) returns an AuthenticatedSession stub rather than setting connection state or returning a bearer token; structurally impossible to call protected methods without authenticating first because you wouldn't have the object to call them on. Canonical wiki instance of concepts/object-capability-rpc, concepts/promise-pipelining, concepts/bidirectional-rpc, patterns/capability-returning-authenticate, and patterns/record-replay-dsl. Experimental / early adopter territory. 643 HN points (2025-09-22)
- sources/2025-07-16-cloudflare-1111-incident-on-july-14-2025 —
Post-mortem on the 62-minute global outage of the 1.1.1.1 Resolver (systems/cloudflare-1-1-1-1-resolver) on 2025-07-14 (21:52–22:54 UTC). Root cause not an attack, not a BGP hijack — internal configuration error. On 2025-06-06 a release on the legacy service-topology system configured a new (not-yet-live) DLS service and accidentally linked the Resolver's 11 IP prefixes to it; dormant for 38 days with no alerts because the DLS service wasn't routing traffic. 07-14 trigger: attaching an offline test location to that same non-production DLS topology caused a global config refresh, which evaluated the latent link, shrank the Resolver's topology from "all locations" to "one offline location", and triggered a global BGP withdrawal of all 11 prefixes. UDP/TCP/DoT traffic collapsed; DoH-via-hostname (cloudflare-dns.com) mostly survived (different IP set). Alerts fired 13 minutes in; revert at 22:20 UTC re-announced BGP near-instantly (~77% recovery) but ~23% of edge servers had lost IP bindings and had to go back through change-management (normally multi-hour progressive rollout, accelerated after testing-location validation). Concurrent non-causal BGP hijack of 1.1.1.0/24 by Tata Communications (AS4755) at 21:54 — visible only because Cloudflare had withdrawn. Root structural cause: the legacy addressing system lacks progressive deployment; remediation is to accelerate deprecation of the legacy system in the ongoing legacy→strategic migration. Canonical wiki instance of latent misconfig, self-inflicted BGP withdrawal, and the absence-of-pattern instance of progressive config rollout. 581 HN points (2025-07-16)
- sources/2026-01-19-cloudflare-what-came-first-the-cname-or-the-a-record
— "What came first: the CNAME or the A record?" Post-mortem on the ~135-minute partial outage of 1.1.1.1 (systems/cloudflare-1-1-1-1-resolver) on 2026-01-08 (17:40–19:55 UTC; 47 min severe impact before revert start). Second 1.1.1.1 outage of the six-month arc following the 2025-07-14 incident, structurally similar (latent defect shipping fleet-wide through gates that didn't observe it) but defect class changes from config-link to code refactor: a memory-optimisation in PartialChain::fill_cache changed Vec::with_capacity(…) + extend_from_slice(CNAMEs) + extend_from_slice(A) to entry.answer.extend(CNAMEs) — making CNAMEs appear after the A records they aliased instead of before. Most DNS clients handle either order, but a subset of widely-deployed stub resolvers — notably glibc getaddrinfo (via getanswer_r's single-pass expected-name parse) and the DNSC process in three Cisco Catalyst switch models — failed to resolve affected hostnames, and the Cisco switches crashed and entered reboot loops (see Cisco CSCvv99999 advisory). systems/systemd-resolved was unaffected because it parses records into an ordered set first and searches the full set. Structural cause: 40-year-old ambiguity in RFC 1034 (1987), which pre-dates RFC 2119's MUST/SHOULD conventions (1997) by 10 years. The word "preface" in §4.3.1 reads as a constraint but is non-normative; §3.6 explicitly says "the order of RRs in a set is not significant" but the example only covers intra-RRset ordering — the spec never specifies inter-RRset ordering between different RRsets in the same message section (which is what a CNAME chain is). RFC 4035 (DNSSEC) uses explicit MUST for RRSIG inclusion; the ordering axis Cloudflare proposes to formalise goes through draft-jabley-dnsop-ordered-answer-section at IETF DNSOP. Timeline: 2025-12-02 commit → 2025-12-10 test env → 2026-01-07 23:48 UTC global release → 2026-01-08 17:40 UTC 90 % fleet (impact onset) → 18:19 UTC incident declared (+39 min) → 18:27 UTC revert (+8 min) → 19:55 UTC full impact end. Remediation: keep CNAMEs-first forever; add tests asserting the ordering invariant ("the team had originally implemented CNAME-first but had no test, which is why the optimisation silently broke it") — canonical instance of patterns/test-the-ambiguous-invariant; file Internet-Draft. Makes 1.1.1.1 the canonical wiki instance of anycast-scale services failing from within through latent defects that pre-deployment gates don't catch — twice in six months, from two different classes. Instance of patterns/fast-rollback (8 min from declaration to revert-start) and patterns/staged-rollout (pattern-present-but-defect-invisible — glibc-getaddrinfo consumers are small and POP-uncorrelated so no pre-90 % checkpoint flagged). 466 HN points (2026-01-19)
- sources/2025-06-20-cloudflare-how-cloudflare-blocked-a-monumental-7-3-tbps-ddos-attack — Cloudflare discloses fully-autonomous mitigation of a 7.3 Tbps / 4.8 Bpps / 37.4 TB in 45 seconds DDoS attack (largest ever reported) against a Magic Transit hosting-provider customer; 99.996% UDP flood + long tail of QOTD / Echo / NTP / Portmap / RIPv1 / Mirai reflection-amplification; 122,145 source IPs across 5,433 ASes / 161 countries; 477 data centres across 293 locations participated, no human intervention / alerts / incidents. Architecture: anycast delivery + XDP/eBPF kernel drop plane + dosd user-space fingerprinting + gossip cross-POP propagation. 265 HN points (2025-06-20)
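The CNAME-first ordering invariant from the 2026-01-08 1.1.1.1 post-mortem entry above can be illustrated with a minimal sketch. It is in Python, not the resolver's Rust, and every name here (Record, build_answer, single_pass_resolve, cnames_precede_addresses) is hypothetical. single_pass_resolve only approximates the "single-pass expected-name parse" that the entry attributes to glibc; it is not glibc's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str   # owner name of the record
    rtype: str  # "CNAME" or "A" in this sketch
    data: str   # target name for a CNAME, address for an A record


def build_answer(cnames, addresses):
    """Assemble an answer section with the CNAME chain first (the invariant)."""
    return list(cnames) + list(addresses)


def single_pass_resolve(qname, answer):
    """Walk the answer once, tracking the owner name currently expected.

    Records for a name that is not (yet) expected are skipped, so an A record
    that appears before the CNAME pointing at it is never picked up.
    """
    expected = qname
    found = []
    for rr in answer:
        if rr.name != expected:
            continue
        if rr.rtype == "CNAME":
            expected = rr.data
        elif rr.rtype == "A":
            found.append(rr.data)
    return found


def cnames_precede_addresses(answer):
    """The ordering invariant as a testable predicate."""
    seen_address = False
    for rr in answer:
        if rr.rtype == "A":
            seen_address = True
        elif rr.rtype == "CNAME" and seen_address:
            return False
    return True


cnames = [Record("www.example.com", "CNAME", "edge.example.net")]
addrs = [Record("edge.example.net", "A", "192.0.2.1")]

cname_first = build_answer(cnames, addrs)
address_first = addrs + cnames  # the ordering the refactor produced
```

With these definitions, a single-pass client resolves cname_first but finds nothing in address_first. An assertion on cnames_precede_addresses is the kind of test the remediation calls for.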
- sources/2025-07-01-cloudflare-pay-per-crawl — Cloudflare launches Pay Per Crawl (private beta) letting publishers monetize AI-crawler access at flat per-request USD prices via HTTP 402 Payment Required and crawler-price / crawler-max-price / crawler-exact-price / crawler-charged headers. Three publisher outcomes per crawler: Allow / Charge / Block (Charge-for-no-billing-relationship functionally 403 but advertises a future offer). Bot identity via Web Bot Auth (Ed25519 + JWK directory + per-request RFC 9421 HTTP Message Signatures). Two flows: reactive (2RT) and preemptive (1RT). Rules engine runs after WAF + bot management. Cloudflare is the Merchant of Record aggregating billing across N publishers and M crawlers. Stated end-state: the agentic paywall where AI agents with user-granted budgets pay programmatically per resource. 569 HN points (2025-07-01)
- sources/2025-08-04-cloudflare-perplexity-stealth-undeclared-crawlers — Perplexity stealth-crawler investigation. Cloudflare documents Perplexity AI running an undeclared stealth crawler (generic Chrome/124.0.0.0 UA, IPs outside the published range, rotating ASNs, ignores robots.txt) that activates in response to origin-side blocks of Perplexity's declared crawlers PerplexityBot + Perplexity-User. Controlled methodology — brand-new never-registered domains published a blanket Disallow, Cloudflare then asked Perplexity about content on them; detailed answers proved direct fetch in violation. Canonical wiki instance of patterns/brand-new-domain-experiment. Volume: 3-6 M stealth requests/day vs 20-25 M declared requests/day across tens of thousands of domains. Enforcement stack: ML + network-signal fingerprinting produces a stealth signature that survives UA + IP + ASN rotation (patterns/stealth-crawler-detection-fingerprint + concepts/ml-bot-fingerprinting) → shipped as block signatures in the managed AI-bots ruleset for all customers including free tier → Cloudflare de-lists Perplexity from Verified Bots (patterns/verified-bot-delisting). Positive control: ChatGPT-User fetched robots.txt, honored the Disallow, stopped crawling, no follow-up attempts from alternate UAs; ChatGPT Agent signs via Web Bot Auth. Introduces the five operator norms (transparent; well-behaved netizens; clear purpose; separate bots for separate activities; follow the rules). Companion post to pay-per-crawl (the monetization half) and Moving past bots vs. humans (the 2026 framing half): this 2025-08-04 post is the enforcement precedent that makes both possible. Introduces systems/perplexity-ai, systems/perplexitybot, systems/perplexity-user, systems/chatgpt-user, systems/cloudflare-bot-management; concepts/stealth-crawler, concepts/declared-crawler, concepts/undeclared-crawler, concepts/user-agent-rotation, concepts/asn-rotation, concepts/ip-rotation-as-evasion, concepts/robots-txt-compliance, concepts/ml-bot-fingerprinting; patterns/brand-new-domain-experiment, patterns/stealth-crawler-detection-fingerprint, patterns/stealth-on-block-fallback, patterns/verified-bot-delisting. Extends systems/web-bot-auth (enforcement-precedent seen-in), systems/pay-per-crawl (enforcement-precondition seen-in), concepts/robots-txt (advisory-protocol failure mode seen-in), concepts/verified-bots (delisting enforcement-lever seen-in). 1,315 HN points (2025-08-04)
- sources/2026-04-20-cloudflare-internal-ai-engineering-stack — Cloudflare's internal AI dev stack: proxy Worker + AI Gateway + Workers AI, MCP Server Portal w/ Code Mode, Backstage-backed AGENTS.md, multi-agent AI Code Reviewer (2026-04-20)
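The Allow / Charge / Block outcomes and the reactive vs. preemptive flows in the Pay Per Crawl entry above could be sketched roughly as follows. This is an illustration under stated assumptions, not Cloudflare's implementation. The entry gives only the header names, so their roles here are assumed: crawler-exact-price as the crawler's retry after a 402, crawler-max-price as a preemptive ceiling, crawler-price and crawler-charged as response headers. The bare decimal header values and the handle_crawl function are also hypothetical. Web Bot Auth signature verification is omitted entirely.

```python
from decimal import Decimal


def handle_crawl(policy, price, headers):
    """Return (status, response_headers) for one crawler request.

    policy:  "allow", "charge", or "block" (the three publisher outcomes).
    price:   the publisher's flat per-request price, as a Decimal.
    headers: request headers as a dict with lowercase names.
    """
    if policy == "allow":
        return 200, {}
    if policy == "block":
        return 403, {}

    # policy == "charge"
    exact = headers.get("crawler-exact-price")   # assumed: reactive retry
    ceiling = headers.get("crawler-max-price")   # assumed: preemptive offer

    if exact is not None and Decimal(exact) == price:
        return 200, {"crawler-charged": str(price)}
    if ceiling is not None and Decimal(ceiling) >= price:
        return 200, {"crawler-charged": str(price)}

    # No acceptable offer: advertise the price with 402 Payment Required.
    return 402, {"crawler-price": str(price)}


price = Decimal("0.01")
first_try = handle_crawl("charge", price, {})
retry = handle_crawl("charge", price, {"crawler-exact-price": "0.01"})
preemptive = handle_crawl("charge", price, {"crawler-max-price": "0.05"})
```

Under these assumptions the reactive flow takes two round trips (first_try, then retry), while a sufficient crawler-max-price succeeds in one.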
- sources/2026-01-29-cloudflare-moltworker-self-hosted-ai-agent — Cloudflare ports Moltbot (self-hosted personal AI agent, later renamed OpenClaw) onto the Developer Platform as Moltworker, a proof-of-concept middleware Worker threading five primitives: Sandbox SDK for the isolated container runtime, Browser Rendering via a thin CDP proxy + injected skill for headless Chromium (patterns/cdp-proxy-for-headless-browser), AI Gateway via ANTHROPIC_BASE_URL with zero Moltbot code changes (patterns/ai-gateway-provider-abstraction + concepts/byok-bring-your-own-key), R2 via sandbox.mountBucket() presenting an R2 bucket as a filesystem partition inside the otherwise-ephemeral container (patterns/mountable-persistent-storage), and Zero Trust Access for JWT-enforced auth on the Admin UI + API endpoints (patterns/middleware-worker-adapter). Quantitative platform claim: of the top 1,000 NPM packages, after excluding build/CLI/browser-only, only 15 (1.5%) genuinely don't work natively on Workers. Framed as reference architecture for porting an existing Docker-based agent onto the Developer Platform. Companion to sources/2026-04-20-cloudflare-internal-ai-engineering-stack at the enterprise-scale dogfood tier. 246 HN points (2026-01-29)
- sources/2025-03-25-cloudflare-opkssh-open-sourcing —
Cloudflare open-sources OPKSSH (acquired from BastionZero 2024) under the OpenPubkey Linux Foundation umbrella: OIDC-SSO-backed SSH with 24h ephemeral keys, identity-based server ACL, zero SSH-protocol / client / server-binary changes — only two lines in sshd_config pointing AuthorizedKeysCommand at the OpenPubkey verifier. PK Token rides through an SSH certificate extension field. 276 HN points (2025-03-25)
- sources/2024-09-10-cloudflare-a-good-day-to-trie-hard — CPU-share micro-optimization of pingora-origin's clear_internal_headers helper (1.71 % → 0.43 % = ~550 cores saved on a 40,000-core fleet) via lookup-direction flip then a custom trie (trie-hard) optimized for >90 %-miss short-string static-set membership checks; flame-graph-driven, criterion-benchmarked, production-stack-sampling-verified (2024-09-10)
- sources/2026-06-10-cloudflare-route-public-traffic-to-private-applications — 2026-06-10 Route public traffic to private applications with Cloudflare. Launches Application Services for Private Origins (closed beta, Enterprise; GA Q4 2026) — extends the full L7 stack to private-network-only origins via a unified connectivity layer shared across HTTP proxy, Spectrum L4, and Workers VPC. No connector software required for WAN/Mesh customers. Introduces systems/cloudflare-private-origins, concepts/public-to-private-routing, concepts/unified-connectivity-layer, patterns/private-origin-dns-toggle, patterns/unified-application-services-for-all-origins.
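The two steps named in the 2024-09-10 trie-hard entry above (flip the lookup direction, then test membership in a static set with a trie that rejects misses early) can be sketched in a few lines. This is a dictionary-based toy in Python, not trie-hard's actual Rust data layout. The header names and the build_trie / contains helpers are hypothetical, and the clear_internal_headers below only borrows the name of the helper mentioned in the entry.

```python
def build_trie(words):
    """Build a nested-dict trie over a static set of strings."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker; "" can never be a character key
    return root


def contains(trie, word):
    """Membership test that bails out at the first character with no branch."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False  # the common case when most lookups are misses
    return "" in node


# Hypothetical internal header names, for illustration only.
INTERNAL = build_trie(["x-internal-trace", "x-internal-route", "x-internal-auth"])


def clear_internal_headers(headers):
    """Flipped lookup direction: walk the request's own headers and test each
    one against the static set, instead of probing the request once per
    internal name."""
    return {k: v for k, v in headers.items() if not contains(INTERNAL, k.lower())}


cleaned = clear_internal_headers({"Accept": "*/*", "X-Internal-Trace": "abc"})
```

A miss such as "accept" is rejected after one character, which is the property that matters when the vast majority of lookups are misses.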
- sources/2026-06-09-cloudflare-defend-against-frontier-cyber-models — Cloudflare publishes its full internal security architecture as a customer-zero reference for defending against frontier AI cyber models. Core thesis: architecture around the vulnerability > speed of patch. Stacks WAF + ML scoring (concepts/waf-attack-score) + positive security model (API Shield) + bot detection + zero-trust access (Require Access Protection) + AI Gateway + MCP Server Portal. Introduces systems/cloudforce-one (threat intel team), concepts/architecture-over-patching, concepts/asymmetric-attacker-defender-gap, concepts/positive-security-model, patterns/continuous-red-team-validation, patterns/ml-anomaly-scoring-over-signature-matching, patterns/require-access-before-reachability, patterns/positive-security-model-api-validation. Operational: sub-30s global rule deployment; ~20% of web traffic as intel substrate; React2Shell rule live before CVE advisory. Sibling to the Code Orange post (sources/2026-05-01-cloudflare-code-orange-fail-small-complete) — Code Orange covers deployment safety; this post covers security architecture; both embody customer-zero + public-architecture-publication posture. (2026-06-09)