
PATTERN Cited by 1 source

RUM-validated dictionary selection

RUM-validated dictionary selection is a pattern in which a CDN auto-selects candidate compression dictionaries from observed traffic patterns, then uses its real user monitoring (RUM) beacon to measure whether each candidate actually improves compression for real users before committing to serve it. The RUM beacon acts as a closed-loop feedback signal: only dictionaries whose served variants produce measured compression wins stay in rotation.

Cloudflare's 2026-04-17 shared-dictionaries launch post frames this as the safety mechanism for Phase 3 automatic dictionaries: the end state in which no customer configuration is needed and Cloudflare's network auto-detects which resources are versioned, auto-generates dictionaries, and auto-serves delta-compressed responses.
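
As a rough illustration of what "auto-detects which resources are versioned" could mean at the URL level, the sketch below collapses hash-like filename segments so that successive builds of one resource fall into the same URL class. The regex, the `{hash}` placeholder, and both function names are assumptions made for this example; the launch post does not describe the detector, and a real one would also compare response bodies rather than rely on URL shape alone.

```python
import re
from collections import defaultdict

# Assumed heuristic: 8+ lowercase hex characters set off by '.', '-' or '_'
# on both sides are treated as a content hash.
HASH_RE = re.compile(r"(?<=[.\-_])[0-9a-f]{8,}(?=[.\-_])")


def url_class(url: str) -> str:
    """Collapse hash-like segments so versions of one resource share a key."""
    return HASH_RE.sub("{hash}", url)


def versioned_classes(urls):
    """Return URL classes that were observed under two or more distinct URLs."""
    seen = defaultdict(set)
    for url in urls:
        seen[url_class(url)].add(url)
    return {cls: sorted(members) for cls, members in seen.items() if len(members) >= 2}


observed = [
    "/assets/app.3f9a1c2b7d.js",
    "/assets/app.a81b44e09c.js",
    "/assets/logo.png",
]
for cls, members in versioned_classes(observed).items():
    print(cls, members)
```

Only classes seen under at least two distinct hashed URLs are returned, since a single observation says nothing about whether the resource is versioned.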

The shape

1. Pattern detection
   Observe URL classes across many sites, many requests,
   many deploys. Identify candidates: successive responses
   that "share most of their content but differ by hash" —
   strong signal that the resource is versioned.

2. Dictionary generation
   Store the previous version as a candidate dictionary.
   (See [patterns/previous-version-as-dictionary](<./previous-version-as-dictionary.md>).)

3. Shadow serving / small-sample serving
   Serve delta-compressed responses against the candidate
   dictionary to a subset of eligible clients.

4. RUM beacon measurement
   Browser RUM beacon reports the actual compression ratio
   observed, the actual download time, the actual success /
   failure rate. These are real-user numbers, not lab
   simulations.

5. Validation gate
   Only promote the dictionary to broad rotation if the RUM
   beacon confirms a real compression lift. If the gate
   fails (diff is large, compression ratio is mediocre,
   browser-support edge cases proliferate), drop the
   candidate and don't serve delta-compressed responses
   against it.

6. Continuous re-validation
   As customer content evolves, periodically re-validate
   that the current dictionary is still producing wins, and
   roll to a newer version when the older one degrades.
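
The validation gate in step 5 can be sketched as a small decision function over beacon samples. This is a minimal sketch under stated assumptions: the `BeaconSample` fields, the three thresholds, and the idea that each sample carries a plain-compression baseline size are all hypothetical, since the source does not specify what the RUM beacon reports or how the gate is tuned.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class BeaconSample:
    """One hypothetical RUM report for a delta-compressed response."""
    delta_bytes: int     # bytes on the wire using the candidate dictionary
    baseline_bytes: int  # assumed: size of the same response with plain compression
    ok: bool             # whether the client decoded the response successfully


def gate(samples, min_samples=100, min_saving=0.20, max_failure_rate=0.01):
    """Return 'wait', 'drop', or 'promote' for a candidate dictionary."""
    if len(samples) < min_samples:
        return "wait"  # not enough real-user evidence yet
    failures = sum(1 for s in samples if not s.ok)
    if failures / len(samples) > max_failure_rate:
        return "drop"  # decode failures outweigh any compression win
    savings = [
        1 - s.delta_bytes / s.baseline_bytes
        for s in samples
        if s.ok and s.baseline_bytes > 0
    ]
    if not savings:
        return "drop"
    # Median rather than mean, so a few outlier responses do not decide the gate.
    return "promote" if median(savings) >= min_saving else "drop"


samples = [BeaconSample(delta_bytes=200, baseline_bytes=1000, ok=True)] * 100
print(gate(samples))
```

Continuous re-validation (step 6) could reuse the same function on a rolling window of recent samples, treating a "drop" for a dictionary already in rotation as the trigger to roll to a newer version.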

Why this particular loop matters

Three things come together:

  1. Traffic visibility — Cloudflare's network sees the traffic patterns across millions of sites and billions of requests per day. Pattern-detection input.
  2. Cache-layer co-location — Cloudflare already manages the cache layer where dictionaries need to live. Storage substrate is present.
  3. RUM beacon already deployed — the RUM beacon gives a validation loop that confirms a dictionary actually improves compression on real traffic before committing to serve it.

"The combination of traffic visibility, edge storage, and synthetic testing is what makes automatic generation feasible, though there are still many pieces to figure out." (Source: this article)

Why a validation loop is necessary

Auto-generating dictionaries from observed traffic is technically straightforward; what's hard is knowing that the generated dictionary is both safe and effective:

  • Safety: "Safely generating dictionaries that avoid revealing private data". If responses in the same URL class embed user-specific content (session-specific tokens, per-user CSRF tokens, per-tenant data), a dictionary generated from one user's response could leak that data into responses served to other users. A measurement-on-real-users loop that watches for unexpected response-size changes, client errors, or content-integrity failures catches these problems before scale-out.
  • Effectiveness: successive responses that look versioned by URL pattern may produce large inter-version diffs because content genuinely changed (real refactors, large dependency bumps). Compressing against the previous version could produce worse output than plain Brotli. The RUM beacon detects this and withdraws the candidate.
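
The effectiveness point can be seen with any compressor that accepts a preset dictionary. The sketch below uses Python's `zlib` (`zdict`) as a stand-in for Brotli or Zstandard with shared dictionaries, which is an assumption made to keep the example dependency-free. The payloads are synthetic random hex text rather than real bundles, and `compress`, `decompress`, and `synthetic_payload` are illustrative names.

```python
import random
import zlib


def compress(data: bytes, dictionary: bytes = b"") -> bytes:
    """Deflate `data`, optionally primed with a preset dictionary."""
    if dictionary:
        c = zlib.compressobj(level=9, zdict=dictionary)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def decompress(blob: bytes, dictionary: bytes = b"") -> bytes:
    """Inverse of compress(); the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=dictionary) if dictionary else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


def synthetic_payload(rng: random.Random, size: int = 8000) -> bytes:
    """Random hex text: a crude stand-in for a resource body."""
    return bytes(rng.choice(b"0123456789abcdef") for _ in range(size))


rng = random.Random(0)
v1 = synthetic_payload(rng)                          # previous version = dictionary
v2 = v1[:4000] + b"/* small patch */" + v1[4000:]    # next deploy, small diff
rewrite = synthetic_payload(rng)                     # next deploy, content replaced

for name, body in (("small change", v2), ("rewrite", rewrite)):
    plain = len(compress(body))
    delta = len(compress(body, v1))
    assert decompress(compress(body, v1), v1) == body  # round trip is lossless
    print(f"{name}: plain={plain} bytes, with dictionary={delta} bytes")
```

When the new version is a small edit of the old one, the dictionary-primed output is a small fraction of the plain output; when the content has effectively been replaced, the dictionary buys almost nothing. That second case is the one a measured gate is meant to catch.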

Sibling patterns

  • patterns/comparative-rum-benchmarking: the same RUM-beacon-as-measurement-substrate pattern, but applied to comparing network performance against competitors (Cloudflare vs peer CDN). Distinct application, same infrastructure.
  • Classical A/B testing with metric-based rollouts: RUM-validated dictionary selection is a specific instantiation for a CDN-protocol feature where the "metric" is compression ratio measured client-side.
  • patterns/alert-backtesting: a feedback-loop pattern in a different domain (alerting rules) with a similar shape (propose → measure → commit or drop).

Seen in
