Shared Dictionaries: compression that keeps up with the agentic web¶
Summary¶
Cloudflare's 2026-04-17 post announces an open beta, opening April 30, 2026, for shared compression dictionary support on its edge, governed by RFC 9842: Compression Dictionary Transport. The mechanism turns the previously cached version of a resource into the compression dictionary for the next version — so when a team ships a one-line fix to a 272 KB JS bundle, only the few-KB diff goes on the wire instead of the full re-gzipped bundle. It is framed as the compression response to colliding 2026 trends: (1) web pages grow 6-9 % heavier per year per the HTTP Archive; (2) agentic actors were ~10 % of total Cloudflare requests in March 2026, up ~60 % year-over-year; (3) AI-assisted development compresses the deploy interval, so bundler re-chunking (new filenames on every push) re-triggers full re-downloads for every user and every bot.
The post is structured as a three-phase rollout roadmap: Phase 1
passthrough (in active development, open beta 2026-04-30) forwards
the RFC 9842 headers (Use-As-Dictionary, Available-Dictionary)
and encodings (dcb, dcz) without modification, extends cache
keys to vary on Available-Dictionary + Accept-Encoding, and
leaves dictionary lifecycle on the customer origin; Phase 2
managed dictionaries has Cloudflare inject headers, store
dictionary bytes, delta-compress on the fly, and serve the right
variant per client; Phase 3 automatic dictionaries has
Cloudflare's network auto-detect versioned-resource URL patterns
and generate dictionaries without customer configuration. The post
pairs this with a lab test: same 272 KB JS bundle compresses
to 92.1 KB with gzip (66 %) or 2.6 KB with
DCZ-against-previous-version (97 % reduction over gzip), and the
canicompress.com live demo ships a
new ~94 KB bundle every minute where the diff compresses to
~159 bytes — 99.5 % reduction over gzip.
Also reviewed: the 2008-2017 Google SDCH experiment, which delivered double-digit page-load improvements for early adopters but was un-shipped from Chrome in 2017 after accumulating compression side-channel exposure (CRIME / BREACH) plus same-origin-policy violations and irreconcilable CORS semantics. RFC 9842 closes these gaps (an advertised dictionary is usable only on same-origin responses), and Chrome 130+ / Edge 130+ have shipped support with Firefox tracking. Canonical wiki framing: this post introduces shared-dictionary compression and edge-managed protocol complexity as the CDN answer to a standard whose dictionary-lifecycle + cache-variant story is too hairy for individual origins to implement, and the passthrough → managed → auto phased rollout as the canonical shape for introducing protocol-level CDN features where the pain gradient runs from "advanced customers BYO" through "CDN does the work" to "CDN does it without being asked".
Key takeaways¶
- "Compression has to get smarter" as the agentic-web headline claim. Page weight grows 6-9 %/year per the HTTP Archive Web Almanac; agentic actors were ~10 % of all Cloudflare requests in March 2026, up ~60 % year-over-year (footnote: bots ~31.3 % of all HTTP; AI ~29-30 % of bot traffic); AI-assisted coding compresses the deploy interval so bundler re-chunking invalidates filenames on every push. The three trends compound: heavier pages × more clients × more deploys = more redundant bytes on the wire on every cycle. Traditional per-response compression (gzip / Brotli / Zstandard) doesn't know the client already has 95 % of the payload cached. "Ship ten small changes a day, and you've effectively opted out of caching." (Source: this article)
- Previous cached version as the dictionary — delta compression via RFC 9842. First request: server attaches a Use-As-Dictionary header → browser retains the resource as a dictionary. Next request for the same URL class: browser sends Available-Dictionary: <hash> → server compresses the new version against the old, returns Content-Encoding: dcb (delta-compressed Brotli) or dcz (delta-compressed Zstandard), and the wire only carries the diff. No separate dictionary file: the dictionary is the previously cached resource. app.bundle.v1.js cached → app.bundle.v2.js compressed against v1 → v3 against v2 → v47 against v46. "The savings don't reset, they persist across the entire release history." (Source: this article)
- Lab test: 272 KB → 92.1 KB gzip → 2.6 KB DCZ (97 % over already-compressed). Two near-identical JS bundles representing successive deploys. Uncompressed asset 272 KB; gzip 92.1 KB (66 % reduction from raw); shared-dictionary DCZ with v1 as dictionary 2.6 KB (97 % reduction over gzip). TTFB on cache miss (compressing against origin dictionary) ~20 ms slower than gzip — "near-negligible for transmission." Download completion: cache miss 31 ms DCZ vs 166 ms gzip (81 % improvement); cache hit 16 ms vs 143 ms (89 %). "The response is so much smaller that even when you pay a slight penalty at the start, you finish far ahead." (Source: this article)
- canicompress.com live demo: 99.5 % reduction at one-minute
deploy cadence. New ~94 KB bundle deployed every minute on
https://canicompress.com/, "mimic[king] a typical production single page application bundle"; bulk of code static, only a small config block changes per deploy — mirrors real-world deploys where "most of the bundle is unchanged framework and library code." First deploy: edge stores v1 as dictionary. Subsequent deploys: browser sends hash of v(n-1) → edge delta-compresses v(n). Result on the wire: ~159 bytes. 99.5 % reduction over gzip (and 99.8 % over raw). The demo ships walkthroughs for verifying the compression ratios via curl or browser DevTools. (Source: this article)
- Three-phase rollout — passthrough → managed → auto.
Phase 1 (passthrough, active development, open beta
2026-04-30): Cloudflare forwards
the Use-As-Dictionary + Available-Dictionary headers and dcb / dcz encodings unmodified; cache keys extended to vary on Available-Dictionary + Accept-Encoding; customer origin owns dictionary generation + compression. Requirements to use: Cloudflare zone feature-enabled + origin serves dictionary-compressed responses with correct headers + client browser advertises dcb / dcz in Accept-Encoding (Chrome 130+ / Edge 130+ today, Firefox in progress). Phase 2 (managed dictionaries, unscheduled): customer sets a rule naming which assets serve as dictionaries; Cloudflare injects the headers, stores the dictionary bytes, delta-compresses new versions against cached old ones, serves the right variant per client; origin serves normal uncompressed / gzip / brotli responses. Phase 3 (automatic dictionaries, unscheduled): no customer configuration — Cloudflare's network observes URL patterns where successive responses share most content but differ by filename-hash, infers versioning, auto-stores the previous version as dictionary, auto-delta-compresses successors. (Source: this article)
- "This is a coordination problem that belongs at the edge."
Origin-side implementation is complex: generate dictionaries,
serve them with right headers, match every request against
Available-Dictionary on the hot path, delta-compress on the fly, fall back gracefully when the client has no dictionary, manage cache variants (responses vary on both encoding and dictionary hash — every dictionary version creates a separate cache variant), handle mid-deploy client-population splits (some clients on the old dictionary, some on the new, some with none — cache hit rates drop, storage climbs). "A CDN already sits in front of every request, already manages compression, and already handles cache variants." Canonical edge-managed protocol-complexity framing: the origin could do this — a reference implementation exists as Patrick Meenan's dictionary-worker (Meenan is an RFC 9842 author; it runs the full dictionary lifecycle inside a Cloudflare Worker using WASM-compiled Zstandard) — but the coordination cost is high enough that moving it to the edge makes shared-dictionary compression accessible rather than a specialist's game. (Source: this article)
- SDCH (2008-2017) is the cautionary precedent; RFC 9842 closes the gaps. Google shipped Shared Dictionary Compression for HTTP (SDCH) in Chrome in 2008 with "double-digit improvements in page load times" at early adopters. Accumulated problems: compression side-channel attacks (CRIME 2012, BREACH 2013) where attackers injected content alongside a session cookie / token, watched the compressed-output size shrink byte-by-byte as guesses matched, and extracted secrets; same-origin-policy violations (SDCH's cross-origin dictionary model ironically powered its performance but couldn't be reconciled with CORS); Cache-API specification gaps. Chrome un-shipped SDCH in 2017. RFC 9842 closes the key gaps — most critically, an advertised dictionary is only usable on responses from the same origin, preventing many of the side-channel conditions. (Source: this article + linked Wikipedia / Chromium blink-dev / RFC sources)
- Phase 3's "how does the edge safely generate dictionaries automatically" is the hard research question. Cloudflare flags the hard parts explicitly: "Safely generating dictionaries that avoid revealing private data and identifying traffic for which dictionaries will offer the most benefit are real engineering problems." The composition of enablers: the edge sees the traffic patterns across millions of sites + billions of requests + every new deploy (pattern-detection input); the edge manages the cache layer where dictionaries need to live (storage-co-location advantage); the RUM beacon gives a validation loop to confirm a dictionary actually improves compression before committing to serve it. This is the canonical RUM-validated dictionary selection shape: pattern-detection → candidate dictionary → shadow-validate against RUM-observed compression-ratio lift → promote to serving only if lift is real. (Source: this article)
- Customer motivation = "millions of zones that would never have had the engineering time to implement custom dictionaries manually." The performance + bandwidth savings are the why, but Cloudflare explicitly targets accessibility — "This is what makes shared dictionaries accessible to everyone using Cloudflare". Phase 1 serves advanced customers with their own origin implementation; Phase 2 serves customers who know which assets are versioned but don't want to build the lifecycle; Phase 3 serves the long tail of zones that would never ship a custom implementation. Same three-phase accessibility gradient as other Cloudflare CDN primitives (DDoS mitigation, bot management, HTTP/3) where Phase 1 is "customer can do this manually", Phase 2 is "customer sets a rule", Phase 3 is "enabled by default, no customer action". (Source: this article)
- "Compression with a memory" — framing shift. For most of the web's history, compression was stateless (every response compressed as if the client had never seen anything before). Shared dictionaries give compression a memory. "That matters more now than it would have five years ago" because agentic coding tools compress the deploy interval + drive an increasing share of traffic — more redundant bytes per transfer + more transfers that shouldn't need to happen. Delta compression reduces both sides of that equation. "Agents are gaining more context and becoming surgical in their code changes" — the diff between successive deploys is shrinking, so the compression ratio against the previous version keeps climbing. (Source: this article)
Architecture¶
Dictionary-transport handshake (RFC 9842)¶
Request 1 (first view):
Browser → Server: GET /app.bundle.v1.js
Accept-Encoding: gzip, br, zstd, dcb, dcz
Server → Browser: 200 OK
Content-Encoding: gzip (or br)
Use-As-Dictionary: match="/app.bundle.*.js",
id="sha-256:abc..."
[response body]
Browser caches body + records dictionary-id association with
match-pattern.
Request 2 (next deploy, v2 bundle):
Browser → Server: GET /app.bundle.v2.js
Accept-Encoding: gzip, br, zstd, dcb, dcz
Available-Dictionary: sha-256:abc...
Server → Browser: 200 OK
Content-Encoding: dcz (delta-Zstandard)
[diff-against-v1 body — few KB]
Browser decompresses diff using cached v1 as dictionary →
reconstructs full v2 body.
Cache-variant keys in Phase 1 passthrough¶
Cache key extended with both:
Accept-Encoding (gzip vs br vs zstd vs dcb vs dcz)
Available-Dictionary (which dictionary hash client has)
Implication:
Mid-deploy with clients on v1-dict, v2-dict, no-dict and
browsers advertising various Accept-Encoding → multiple
cache variants per URL; edge stores each independently;
dictionaries themselves cached per normal HTTP caching rules.
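The variant behavior can be modeled as a cache key that folds in both headers. The normalization below is an illustrative assumption, not Cloudflare's actual cache-key implementation:

```python
def cache_key(url, headers):
    # Variant axis 1: which encodings the client accepts (normalized).
    tokens = [t.strip() for t in headers.get("Accept-Encoding", "").split(",")]
    encodings = ",".join(sorted(t for t in tokens if t))
    # Variant axis 2: which dictionary (if any) the client already holds.
    dictionary = headers.get("Available-Dictionary", "<none>")
    return (url, encodings, dictionary)

# Mid-deploy client populations (v1-dict, v2-dict, no-dict) all request
# the same URL but map to distinct cache variants at the edge.
keys = {
    cache_key("/app.bundle.v3.js",
              {"Accept-Encoding": "gzip, br, zstd, dcb, dcz",
               "Available-Dictionary": "sha-256:v1hash"}),
    cache_key("/app.bundle.v3.js",
              {"Accept-Encoding": "gzip, br, zstd, dcb, dcz",
               "Available-Dictionary": "sha-256:v2hash"}),
    cache_key("/app.bundle.v3.js", {"Accept-Encoding": "gzip, br"}),
}
print(len(keys))  # 3 distinct variants of one URL
```

This is the mechanical source of the cache-variant growth discussed in the caveats: every distinct (encoding set, dictionary hash) pair the client population presents becomes its own stored variant.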
Three-phase rollout topology¶
Phase 1 passthrough (2026-04-30 beta):
Origin owns: dictionary generation, Use-As-Dictionary header,
delta compression, fallback
CF owns: header forwarding, cache-key extension,
encoding preservation
Customer fit: advanced, already runs custom dictionary logic
(e.g. pmeenan/dictionary-worker pattern)
Phase 2 managed dictionaries (unscheduled):
Customer: rule naming which assets are dictionaries
Origin: serves normal responses (no dictionary logic)
CF owns: Use-As-Dictionary injection, dictionary-byte
storage, on-the-fly delta compression,
per-client variant selection
Phase 3 automatic dictionaries (unscheduled):
Customer: nothing
CF owns: URL-pattern detection (hash-only-changes signal),
auto-dictionary storage, auto-delta-compress,
RUM-validated selection (only serve if real lift
measured on client), safety (avoid leaking
private data across responses)
Operational numbers disclosed¶
| Metric | Value | Context |
|---|---|---|
| Bundle uncompressed | 272 KB | Lab test: two near-identical JS bundles |
| Bundle gzip | 92.1 KB | 66 % reduction from raw |
| Bundle DCZ with v1 as dictionary | 2.6 KB | 97 % reduction over gzip |
| TTFB penalty on cache miss (DCZ vs gzip) | ~20 ms | "near-negligible" |
| Download: cache miss DCZ vs gzip | 31 ms vs 166 ms | 81 % improvement |
| Download: cache hit DCZ vs gzip | 16 ms vs 143 ms | 89 % improvement |
| canicompress.com bundle size | ~94 KB | Demo SPA, new deploy every minute |
| canicompress.com diff on wire | ~159 bytes | 99.5 % reduction over gzip |
| Hypothetical daily transfer (100K users × 10 deploys) | 500 GB → few hundred MB | "ten small changes a day" redundancy elimination |
| Page weight growth rate | 6-9 % per year | HTTP Archive Web Almanac 2024 |
| Agentic-actor share of Cloudflare traffic | ~10 % | March 2026; up ~60 % YoY |
| Bots share of HTTP requests | ~31.3 % | Cloudflare Radar 28d, footnote |
| AI share of bot traffic | ~29-30 % | Cloudflare Radar bots by category |
| Client browser support | Chrome 130+, Edge 130+ | Firefox in progress (bugzilla 1882979) |
| Phase 1 open-beta date | 2026-04-30 | Hard date in post |
Caveats¶
- Phase 1 beta only covers passthrough. At 2026-04-30 launch customers still need to generate dictionaries + compress at origin; Cloudflare only forwards headers + keys the cache correctly. The "CF does the work for you" promise (Phase 2) and "CF does it without being asked" promise (Phase 3) are both unscheduled in this post.
- Browser-support gap. Only Chrome 130+ and Edge 130+
advertise
dcb / dcz in Accept-Encoding today; Firefox is tracking but has not yet shipped; Safari is not mentioned. Users on browsers without support fall back to gzip / br — they get no benefit but also no breakage.
- Cache-variant explosion is real. Phase 1 passthrough keys
cache on both
Accept-Encoding and Available-Dictionary, so mid-deploy an edge POP can hold multiple cache variants of the same URL (gzip, br, zstd, dcz-against-v1, dcz-against-v2, raw). Storage climbs; the hit rate on any one variant drops. The post names this "coordination problem" as the reason the complexity belongs at the CDN — but it doesn't disappear, it just moves.
- Phase 3 safety is the open research problem. "Safely generating dictionaries that avoid revealing private data" — if two users' responses share content (session-specific tokens, per-user CSRF, per-tenant data in the same URL class), auto-dictionary generation could leak private data across responses or across users. Cloudflare flags this as a "real engineering problem" without disclosing the solution.
- Hash-only-changes signal doesn't always mean versioned-bundle. Phase 3's URL-pattern detection heuristic — "successive responses share most of their content but differ by hash" — is the right first-order signal for bundler-produced versioned assets, but will false-positive on hash-based URL schemes where content is supposed to differ (content-addressed storage, per-request rendered content, short-lived signed URLs). False-positives mean wasted storage + wasted compression CPU; false-negatives mean missed savings.
- Demo numbers are best-case. canicompress.com's 99.5 % reduction is on a bundle where "only a small configuration block changes" between deploys — the ideal case. The lab-test 97 % reduction on 272 KB with "a few localized changes" is closer to real-world but still a curated example. Real deploys (dependency bumps, refactors, feature work) produce larger diffs and smaller compression ratios; Cloudflare's own disclaimer: "results will vary based on the actual delta between the dictionary and the asset."
- CRIME / BREACH class attacks not fully retired. RFC 9842's same-origin constraint closes the cross-origin attack surface, but same-origin compression side channels remain possible when sensitive data (auth tokens, CSRF tokens) is compressed alongside attacker-injected content on the same origin. The post frames RFC 9842 as mitigating the side-channel class, not eliminating it, and doesn't enumerate the residual risk.
- Origin-implementation bar is high without a CDN. The post's framing of complexity (generate + serve + match + compress + fallback + cache-variants) is accurate for a zone that doesn't front Cloudflare. Non-CDN-fronted origins implementing RFC 9842 solo need to ship all of this themselves. This is by design — the post is an argument for why you should use the CDN for this — but should be named as such.
- No disclosure on how Phase 1 passthrough interacts with
Cloudflare's existing compression (auto-minify, Brotli, Zstd).
Presumably passthrough means "don't strip, don't recompress" —
the post says that — but edge cases (origin says
dcz but the client didn't advertise it, origin serves dictionary bytes without correct Vary headers, existing Cloudflare compression rules running alongside) aren't enumerated.
- No pricing / volume disclosure. The post doesn't say whether shared-dictionary cache variants count against any tier's storage or bandwidth budgets; whether Phase 2 managed dictionaries will be metered separately from base CDN; or whether Phase 3 auto-detection runs on all plans or requires specific tiers.
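The CRIME / BREACH caveat above can be made concrete with a toy byte-counting loop: when attacker-controlled input is compressed in the same stream as a secret, the compressed size shrinks exactly when a guess matches. Stdlib zlib, an invented token, and a deliberately tiny search space; real attacks need far more care with noise and padding.

```python
import zlib

SECRET = b"token=s3c"  # invented secret for illustration

def observed_size(attacker_input: bytes) -> int:
    # Stand-in for a response that compresses attacker-controlled input
    # in the same stream as a secret (the CRIME/BREACH precondition).
    return len(zlib.compress(attacker_input + b"\n" + SECRET, 9))

# The attacker knows the "token=" prefix and extends it one byte at a
# time: the guess that compresses smallest is the one whose extra byte
# is absorbed into the back-reference against the secret.
recovered = b"token="
alphabet = b"abcdefghijklmnopqrstuvwxyz0123456789"
for _ in range(3):
    best = min(alphabet, key=lambda c: observed_size(recovered + bytes([c])))
    recovered += bytes([best])
print(recovered)  # b'token=s3c'
```

RFC 9842's same-origin scoping removes the cross-origin variants of this attack, but as the caveat notes, the same byte-counting signal remains available whenever secrets and attacker input share a compression context on one origin.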
Source¶
- Original: https://blog.cloudflare.com/shared-dictionaries/
- Raw markdown:
raw/cloudflare/2026-04-17-shared-dictionaries-compression-that-keeps-up-with-the-agent-d7120e7b.md
Related¶
- companies/cloudflare — publisher.
- systems/cloudflare-shared-dictionaries — the feature (passthrough + managed + automatic phases).
- systems/rfc-9842-compression-dictionary-transport — IETF standard the feature implements; closes SDCH's same-origin gap.
- systems/sdch — 2008-2017 Google precedent that accumulated problems (CRIME/BREACH, SOP violations, CORS irreconcilability) and was un-shipped from Chrome in 2017.
- systems/dictionary-worker — Patrick Meenan's (RFC 9842 author) Cloudflare-Worker reference implementation — WASM-compiled Zstandard running the full dictionary lifecycle at the edge; demonstrates the origin-side cost that motivates moving the work to the CDN.
- concepts/shared-dictionary-compression — the core concept (previously-cached-version-as-dictionary).
- concepts/delta-compression-http — HTTP-level delta compression (dcb / dcz encodings) as distinct from Git delta compression at repo-storage layer.
- concepts/compression-side-channel-attack — CRIME/BREACH/SDCH attack class.
- concepts/cache-variant-explosion — cartesian-product explosion of cache variants when cache key includes both Accept-Encoding + Available-Dictionary + dictionary-version.
- concepts/same-origin-dictionary-scope — RFC 9842 constraint that dictionaries are only usable within the origin that served them.
- concepts/bundler-chunk-invalidation — root cause for why one-line code changes re-download entire bundles: bundler re-chunks, emits new filenames, client sees new URL, caching opts out.
- concepts/deploy-frequency-vs-caching — tension between product velocity + caching effectiveness.
- concepts/agentic-traffic-share — ~10 % of Cloudflare requests in March 2026, up ~60 % YoY.
- patterns/edge-managed-protocol-complexity — CDN-absorbs-protocol-implementation-complexity pattern exemplified by Cloudflare's shared-dictionary rollout.
- patterns/phased-cdn-rollout-passthrough-managed-auto — passthrough → managed → auto rollout shape as the canonical CDN-protocol-feature launch curve.
- patterns/previous-version-as-dictionary — specific pattern: use v(n-1) of the same resource as the dictionary for v(n).
- patterns/rum-validated-dictionary-selection — RUM-beacon-confirms-compression-lift-before-promotion, Phase 3 safety mechanism.
- sources/2026-04-17-cloudflare-agents-week-network-performance-update — same-week network-performance post; shares the "agentic share is climbing" framing and RUM-beacon as Cloudflare's measurement substrate.
- sources/2026-01-19-cloudflare-what-came-first-the-cname-or-the-a-record — same-shape "CDN absorbs protocol complexity the origin doesn't want to deal with" framing at the DNS layer.
- concepts/git-delta-compression — delta compression at the Git-repo-pack-file layer (Git chooses candidate pairs via path heuristic; Cloudflare chooses via previous-version-is-dictionary). Sibling mechanism, different layer.