
Shared Dictionaries: compression that keeps up with the agentic web

Summary

Cloudflare's 2026-04-17 post announces an open beta, opening 2026-04-30, for shared-compression-dictionary support on its edge, governed by RFC 9842: Compression Dictionary Transport. The mechanism turns the previously cached version of a resource into the compression dictionary for the next version — so when a team ships a one-line fix to a 272 KB JS bundle, only the few-KB diff goes on the wire instead of the full re-gzipped bundle. Framed as the compression response to three colliding 2026 trends: (1) web pages grow 6-9 % heavier per year per the HTTP Archive; (2) agentic actors were ~10 % of total Cloudflare requests in March 2026, up ~60 % year-over-year; (3) AI-assisted development compresses the deploy interval, so bundler re-chunking (new filenames on every push) re-triggers full re-downloads for every user and every bot.

The post is structured as a three-phase rollout roadmap: Phase 1 passthrough (in active development, open beta 2026-04-30) forwards the RFC 9842 headers (Use-As-Dictionary, Available-Dictionary) and encodings (dcb, dcz) without modification, extends cache keys to vary on Available-Dictionary + Accept-Encoding, and leaves dictionary lifecycle on the customer origin; Phase 2 managed dictionaries has Cloudflare inject headers, store dictionary bytes, delta-compress on the fly, and serve the right variant per client; Phase 3 automatic dictionaries has Cloudflare's network auto-detect versioned-resource URL patterns and generate dictionaries without customer configuration. The post pairs this with a lab test: same 272 KB JS bundle compresses to 92.1 KB with gzip (66 %) or 2.6 KB with DCZ-against-previous-version (97 % reduction over gzip), and the canicompress.com live demo ships a new ~94 KB bundle every minute where the diff compresses to ~159 bytes — 99.5 % reduction over gzip.

Also reviewed: the 2008-2017 Google SDCH experiment, which delivered double-digit page-load improvements for early adopters but was un-shipped from Chrome in 2017 after accumulating compression side-channel attacks (CRIME / BREACH), same-origin-policy violations, and irreconcilable CORS semantics. RFC 9842 closes these gaps (an advertised dictionary is usable only on same-origin responses), and Chrome 130+ / Edge 130+ have shipped support with Firefox tracking. Canonical wiki framing: this post introduces shared-dictionary compression and edge-managed protocol complexity as the CDN answer to a standard whose dictionary-lifecycle + cache-variant story is too hairy for individual origins to implement, and the passthrough → managed → auto phased rollout as the canonical shape for introducing protocol-level CDN features where the pain gradient runs from "advanced customers BYO" through "CDN does the work" to "CDN does it without being asked".

Key takeaways

  1. "Compression has to get smarter" as the agentic-web headline claim. Page weight grows 6-9 %/year per the HTTP Archive Web Almanac; agentic actors were ~10 % of all Cloudflare requests in March 2026, up ~60 % year-over-year (footnote: bots ~31.3 % of all HTTP; AI ~29-30 % of bot traffic); AI-assisted coding compresses the deploy interval so bundler re-chunking invalidates filenames on every push. The three trends compound: heavier pages × more clients × more deploys = more redundant bytes on the wire on every cycle. Traditional per-response compression (gzip / Brotli / Zstandard) doesn't know the client already has 95 % of the payload cached. "Ship ten small changes a day, and you've effectively opted out of caching." (Source: this article)
  2. Previous cached version as the dictionary — delta compression via RFC 9842. First request: server attaches Use-As-Dictionary header → browser retains the resource as a dictionary. Next request for same URL class: browser sends Available-Dictionary: <hash> → server compresses the new version against the old, returns Content-Encoding: dcb (delta-compressed Brotli) or dcz (delta-compressed Zstandard), and the wire only carries the diff. No separate dictionary file: the dictionary is the previously-cached resource. app.bundle.v1.js cached → app.bundle.v2.js compressed against v1 → v3 against v2 → v47 against v46. "The savings don't reset, they persist across the entire release history." (Source: this article)
  3. Lab test: 272 KB → 92.1 KB gzip → 2.6 KB DCZ (97 % over already-compressed). Two near-identical JS bundles representing successive deploys. Uncompressed asset 272 KB; gzip 92.1 KB (66 % reduction from raw); shared-dictionary DCZ with v1 as dictionary 2.6 KB (97 % reduction over gzip). TTFB on cache miss (compressing against origin dictionary) ~20 ms slower than gzip — "near-negligible for transmission." Download completion: cache miss 31 ms DCZ vs 166 ms gzip (81 % improvement); cache hit 16 ms vs 143 ms (89 %). "The response is so much smaller that even when you pay a slight penalty at the start, you finish far ahead." (Source: this article)
  4. canicompress.com live demo: 99.5 % reduction at one-minute deploy cadence. New ~94 KB bundle deployed every minute on https://canicompress.com/, "mimic[king] a typical production single page application bundle"; bulk of code static, only a small config block changes per deploy — mirrors real-world deploys where "most of the bundle is unchanged framework and library code." First deploy: edge stores v1 as dictionary. Subsequent deploys: browser sends hash of v(n-1) → edge delta-compresses v(n). Result on the wire: ~159 bytes. 99.5 % reduction over gzip (and 99.8 % over raw). The demo ships walkthroughs for verifying the compression ratios via curl or browser DevTools. (Source: this article)
  5. Three-phase rollout — passthrough → managed → auto. Phase 1 (passthrough, active development, open beta 2026-04-30): Cloudflare forwards Use-As-Dictionary + Available-Dictionary headers + dcb/dcz encodings unmodified; cache keys extended to vary on Available-Dictionary + Accept-Encoding; customer origin owns dictionary generation + compression. Requirements to use: Cloudflare zone feature-enabled + origin serves dictionary-compressed responses with correct headers + client browser advertises dcb/dcz in Accept-Encoding (Chrome 130+ / Edge 130+ today, Firefox in progress). Phase 2 (managed dictionaries, unscheduled): customer sets a rule naming which assets serve as dictionaries; Cloudflare injects the headers, stores the dictionary bytes, delta-compresses new versions against cached old ones, serves the right variant per client; origin serves normal uncompressed / gzip / brotli responses. Phase 3 (automatic dictionaries, unscheduled): no customer configuration — Cloudflare's network observes URL patterns where successive responses share most content but differ by filename-hash, infers versioning, auto-stores previous version as dictionary, auto-delta-compresses successors. (Source: this article)
  6. "This is a coordination problem that belongs at the edge." Origin-side implementation is complex: generate dictionaries, serve them with right headers, match every request against Available-Dictionary on the hot path, delta-compress on the fly, fall back gracefully when client has no dictionary, manage cache variants (responses vary on both encoding and dictionary hash — every dictionary version creates a separate cache variant), handle mid-deploy client-population splits (some clients on old dictionary, some on new, some with none — cache hit rates drop, storage climbs). "A CDN already sits in front of every request, already manages compression, and already handles cache variants." Canonical edge-managed protocol-complexity framing: the origin could do this — a reference implementation exists as Patrick Meenan's dictionary-worker (RFC 9842 author; runs full dictionary lifecycle inside a Cloudflare Worker using WASM-compiled Zstandard) — but the coordination cost is high enough that moving it to the edge makes shared-dictionary compression accessible rather than a specialist's game. (Source: this article)
  7. SDCH (2008-2017) is the cautionary precedent; RFC 9842 closes the gaps. Google shipped Shared Dictionary Compression for HTTP (SDCH) in Chrome in 2008 with "double-digit improvements in page load times" at early adopters. Accumulated problems: compression side-channel attacks (CRIME 2012, BREACH 2013) where attackers injected content alongside a session cookie / token, watched compressed-output size shrink byte-by-byte as guesses matched, and extracted secrets; same-origin-policy violations (SDCH's cross-origin dictionary model ironically powered its performance but couldn't be reconciled with CORS); Cache-API specification gaps. Chrome un-shipped SDCH in 2017. RFC 9842 closes the key gaps — most critically, advertised dictionary is only usable on responses from the same origin, preventing many of the side-channel conditions. (Source: this article + linked Wikipedia / Chromium blink-dev / RFC sources)
  8. Phase 3's "how does the edge safely generate dictionaries automatically" is the hard research question. Cloudflare flags the hard parts explicitly: "Safely generating dictionaries that avoid revealing private data and identifying traffic for which dictionaries will offer the most benefit are real engineering problems." The composition of enablers: the edge sees the traffic patterns across millions of sites + billions of requests + every new deploy (pattern-detection input); the edge manages the cache layer where dictionaries need to live (storage-co-location advantage); the RUM beacon gives a validation loop to confirm a dictionary actually improves compression before committing to serve it. This is the canonical RUM-validated dictionary selection shape: pattern-detection → candidate dictionary → shadow-validate against RUM-observed compression-ratio lift → promote to serving only if lift is real. (Source: this article)
  9. Customer motivation = "millions of zones that would never have had the engineering time to implement custom dictionaries manually." The performance + bandwidth savings are the why, but Cloudflare explicitly targets accessibility — "This is what makes shared dictionaries accessible to everyone using Cloudflare". Phase 1 serves advanced customers with their own origin implementation; Phase 2 serves customers who know which assets are versioned but don't want to build the lifecycle; Phase 3 serves the long tail of zones that would never ship a custom implementation. Same three-phase accessibility gradient as other Cloudflare CDN primitives (DDoS mitigation, bot management, HTTP/3) where Phase 1 is "customer can do this manually", Phase 2 is "customer sets a rule", Phase 3 is "enabled by default, no customer action". (Source: this article)
  10. "Compression with a memory" — framing shift. For most of the web's history, compression was stateless (every response compressed as if the client had never seen anything before). Shared dictionaries give compression a memory. "That matters more now than it would have five years ago" because agentic coding tools compress the deploy interval + drive an increasing share of traffic — more redundant bytes per transfer + more transfers that shouldn't need to happen. Delta compression reduces both sides of that equation. "Agents are gaining more context and becoming surgical in their code changes" — the diff between successive deploys is shrinking, so the compression ratio against the previous version keeps climbing. (Source: this article)
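The delta-compression effect in takeaways 2-3 can be sketched with Python's stdlib zlib, whose preset-dictionary (`zdict`) mode stands in for the Brotli/Zstandard-based dcb/dcz codecs; the bundle contents below are invented, and sizes are kept under zlib's 32 KB dictionary window:

```python
import random
import zlib

# Two "deploys" of a bundle: a large unchanged body plus a tiny config
# change. Deterministic random bytes stand in for code that stateless
# compression cannot shrink on its own.
random.seed(0)
base = random.randbytes(30_000)
v1 = base + b'config={"version":"1.0.0"};'
v2 = base + b'config={"version":"1.0.1"};'

# Stateless (gzip-style): the server assumes the client has nothing.
stateless = zlib.compress(v2, 9)

# Delta: compress v2 with the client's cached v1 as a preset dictionary.
comp = zlib.compressobj(level=9, zdict=v1)
delta = comp.compress(v2) + comp.flush()

# The client reconstructs the full v2 from its cached v1 plus the diff.
decomp = zlib.decompressobj(zdict=v1)
restored = decomp.decompress(delta) + decomp.flush()

print(len(stateless), len(delta))  # the delta is a small fraction of the size
assert restored == v2
```

The same asymmetry — a stateless compressor must re-describe the whole payload while a dictionary-aware one only encodes back-references plus the genuine diff — is what drives the 92.1 KB-vs-2.6 KB lab numbers.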

Architecture

Dictionary-transport handshake (RFC 9842)

Request 1 (first view):
  Browser → Server: GET /app.bundle.v1.js
                    Accept-Encoding: gzip, br, zstd, dcb, dcz
  Server → Browser: 200 OK
                    Content-Encoding: gzip (or br)
                    Use-As-Dictionary: match="/app.bundle.*.js",
                                       id="sha-256:abc..."
                    [response body]

Browser caches body + records dictionary-id association with
match-pattern.

Request 2 (next deploy, v2 bundle):
  Browser → Server: GET /app.bundle.v2.js
                    Accept-Encoding: gzip, br, zstd, dcb, dcz
                    Available-Dictionary: sha-256:abc...
  Server → Browser: 200 OK
                    Content-Encoding: dcz   (delta-Zstandard)
                    [diff-against-v1 body — few KB]

Browser decompresses diff using cached v1 as dictionary →
reconstructs full v2 body.

Cache-variant keys in Phase 1 passthrough

Cache key extended with both:
  Accept-Encoding           (gzip vs br vs zstd vs dcb vs dcz)
  Available-Dictionary      (which dictionary hash client has)

Implication:
  Mid-deploy with clients on v1-dict, v2-dict, no-dict and
  browsers advertising various Accept-Encoding → multiple
  cache variants per URL; edge stores each independently;
  dictionaries themselves cached per normal HTTP caching rules.

Three-phase rollout topology

Phase 1 passthrough (2026-04-30 beta):
  Origin owns:   dictionary generation, Use-As-Dictionary header,
                 delta compression, fallback
  CF owns:       header forwarding, cache-key extension,
                 encoding preservation
  Customer fit:  advanced, already runs custom dictionary logic
                 (e.g. pmeenan/dictionary-worker pattern)

Phase 2 managed dictionaries (unscheduled):
  Customer:      rule naming which assets are dictionaries
  Origin:        serves normal responses (no dictionary logic)
  CF owns:       Use-As-Dictionary injection, dictionary-byte
                 storage, on-the-fly delta compression,
                 per-client variant selection

Phase 3 automatic dictionaries (unscheduled):
  Customer:      nothing
  CF owns:       URL-pattern detection (hash-only-changes signal),
                 auto-dictionary storage, auto-delta-compress,
                 RUM-validated selection (only serve if real lift
                 measured on client), safety (avoid leaking
                 private data across responses)

Operational numbers disclosed

Metric                                                | Value                    | Context
Bundle uncompressed                                   | 272 KB                   | Lab test: two near-identical JS bundles
Bundle gzip                                           | 92.1 KB                  | 66 % reduction from raw
Bundle DCZ with v1 as dictionary                      | 2.6 KB                   | 97 % reduction over gzip
TTFB penalty on cache miss (DCZ vs gzip)              | ~20 ms                   | "near-negligible"
Download: cache miss DCZ vs gzip                      | 31 ms vs 166 ms          | 81 % improvement
Download: cache hit DCZ vs gzip                       | 16 ms vs 143 ms          | 89 % improvement
canicompress.com bundle size                          | ~94 KB                   | Demo SPA, new deploy every minute
canicompress.com diff on wire                         | ~159 bytes               | 99.5 % reduction over gzip
Hypothetical daily transfer (100K users × 10 deploys) | 500 GB → few hundred MB  | "ten small changes a day" redundancy elimination
Page weight growth rate                               | 6-9 % per year           | HTTP Archive Web Almanac 2024
Agentic-actor share of Cloudflare traffic             | ~10 %                    | March 2026; up ~60 % YoY
Bots share of HTTP requests                           | ~31.3 %                  | Cloudflare Radar 28d, footnote
AI share of bot traffic                               | ~29-30 %                 | Cloudflare Radar bots by category
Client browser support                                | Chrome 130+, Edge 130+   | Firefox in progress (bugzilla 1882979)
Phase 1 open-beta date                                | 2026-04-30               | Hard date in post

Caveats

  • Phase 1 beta only covers passthrough. At 2026-04-30 launch customers still need to generate dictionaries + compress at origin; Cloudflare only forwards headers + keys the cache correctly. The "CF does the work for you" promise (Phase 2) and "CF does it without being asked" promise (Phase 3) are both unscheduled in this post.
  • Browser-support gap. Only Chrome 130+ and Edge 130+ advertise dcb/dcz in Accept-Encoding today; Firefox is tracking but not yet shipped; Safari not mentioned. Users on browsers without support fall back to gzip / br — they get no benefit but also no breakage.
  • Cache-variant explosion is real. Phase 1 passthrough keys cache on both Accept-Encoding and Available-Dictionary, so mid-deploy an edge POP can hold multiple cache variants of the same URL (gzip, br, zstd, dcz-against-v1, dcz-against-v2, raw). Storage climbs; hit rate on any one variant drops. The post names this "coordination problem" as the reason the complexity belongs at the CDN — but it doesn't disappear, it just moves.
  • Phase 3 safety is the open research problem. "Safely generating dictionaries that avoid revealing private data" — if two users' responses share content (session-specific tokens, per-user CSRF, per-tenant data in the same URL class), auto-dictionary generation could leak private data across responses or across users. Cloudflare flags this as a "real engineering problem" without disclosing the solution.
  • Hash-only-changes signal doesn't always mean versioned-bundle. Phase 3's URL-pattern detection heuristic — "successive responses share most of their content but differ by hash" — is the right first-order signal for bundler-produced versioned assets, but will false-positive on hash-based URL schemes where content is supposed to differ (content-addressed storage, per-request rendered content, short-lived signed URLs). False-positives mean wasted storage + wasted compression CPU; false-negatives mean missed savings.
  • Demo numbers are best-case. canicompress.com's 99.5 % reduction is on a bundle where "only a small configuration block changes" between deploys — the ideal case. The lab-test 97 % reduction on 272 KB with "a few localized changes" is closer to real-world but still a curated example. Real deploys (dependency bumps, refactors, feature work) produce larger diffs and smaller compression ratios; Cloudflare's own disclaimer: "results will vary based on the actual delta between the dictionary and the asset."
  • CRIME / BREACH class attacks not fully retired. RFC 9842's same-origin constraint closes the cross-origin attack surface, but same-origin compression side channels remain possible when sensitive data (auth tokens, CSRF tokens) is compressed alongside attacker-injected content on the same origin. The post frames RFC 9842 as mitigating the side-channel class, not eliminating it, and doesn't enumerate the residual risk.
  • Origin-implementation bar is high without a CDN. The post's framing of complexity (generate + serve + match + compress + fallback + cache-variants) is accurate for a zone that doesn't front Cloudflare. Non-CDN-fronted origins implementing RFC 9842 solo need to ship all of this themselves. This is by design — the post is an argument for why you should use the CDN for this — but should be named as such.
  • No disclosure on how Phase 1 passthrough interacts with Cloudflare's existing compression (auto-minify, Brotli, Zstd). Presumably passthrough means "don't strip, don't recompress" — the post says that — but edge cases (origin says dcz but client didn't advertise it, origin serves dictionary bytes without correct Vary headers, existing Cloudflare compression rules running alongside) aren't enumerated.
  • No pricing / volume disclosure. The post doesn't say whether shared-dictionary cache variants count against any tier's storage or bandwidth budgets; whether Phase 2 managed dictionaries will be metered separately from base CDN; whether Phase 3 auto-detection runs on all plans or requires specific tiers.
