PATTERN Cited by 1 source
RUM-validated dictionary selection¶
RUM-validated dictionary selection is the pattern in which a CDN auto-selects candidate compression dictionaries from observed traffic patterns, then uses its RUM (real user measurement) beacon to measure real-user compression outcomes and confirm that each dictionary actually improves compression before committing to serve it. The RUM beacon acts as a closed-loop feedback signal: only dictionaries whose served variants produce measured compression wins stay in rotation.
Cloudflare's 2026-04-17 shared-dictionaries launch post frames this as the safety mechanism for Phase 3 automatic dictionaries: the end state in which no customer configuration is needed and Cloudflare's network auto-detects which resources are versioned, auto-generates dictionaries, and auto-serves delta-compressed responses.
The shape¶
1. Pattern detection
Observe URL classes across many sites, many requests,
many deploys. Identify candidates: successive responses
that "share most of their content but differ by hash" —
strong signal that the resource is versioned.
2. Dictionary generation
Store the previous version as a candidate dictionary.
(See [patterns/previous-version-as-dictionary](<./previous-version-as-dictionary.md>).)
3. Shadow serving / small-sample serving
Serve delta-compressed responses against the candidate
dictionary to a subset of eligible clients.
4. RUM beacon measurement
Browser RUM beacon reports the actual compression ratio
observed, the actual download time, the actual success /
failure rate. These are real-user numbers, not lab
simulations.
5. Validation gate
Only promote the dictionary to broad rotation if the RUM
beacon confirms a real compression lift. If the gate
fails (diff is large, compression ratio is mediocre,
browser-support edge cases proliferate), drop the
candidate and don't serve delta-compressed responses
against it.
6. Continuous re-validation
As customer content evolves, periodically re-validate
that the current dictionary is still producing wins, and
roll to a newer version when the older one degrades.
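The validation gate in steps 4 and 5 can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Cloudflare's implementation: the `BeaconSample` fields, the `gate` function, and every threshold below are hypothetical names and values chosen for the example. It assumes the beacon reports, per response, whether the candidate dictionary was used, how many bytes crossed the wire, and whether the client decoded the response successfully.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class BeaconSample:
    """One hypothetical RUM beacon report for a single response."""
    used_dictionary: bool  # True if served delta-compressed against the candidate
    wire_bytes: int        # bytes actually transferred to the client
    ok: bool               # False if the client reported a decode/integrity failure


def gate(samples, min_samples=100, min_lift=0.2, max_failure_rate=0.01):
    """Return 'promote', 'drop', or 'wait' for one candidate dictionary.

    All thresholds are illustrative assumptions, not published values.
    """
    treated = [s for s in samples if s.used_dictionary]
    control = [s for s in samples if not s.used_dictionary]

    # Not enough real-user data yet: keep serving to the small sample.
    if len(treated) < min_samples or len(control) < min_samples:
        return "wait"

    # Too many client-side failures: withdraw the candidate.
    failure_rate = sum(not s.ok for s in treated) / len(treated)
    if failure_rate > max_failure_rate:
        return "drop"

    control_median = median(s.wire_bytes for s in control)
    if control_median <= 0:
        return "drop"

    # Lift = fraction of wire bytes saved relative to the control group.
    treated_median = median(s.wire_bytes for s in treated if s.ok)
    lift = 1 - treated_median / control_median
    return "promote" if lift >= min_lift else "drop"
```

Running the same function periodically over fresh samples would give a simple form of the continuous re-validation in step 6: a dictionary that once returned `promote` is withdrawn as soon as it starts returning `drop`.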
Why this particular loop matters¶
Three things come together:
- Traffic visibility — Cloudflare's network sees the traffic patterns across millions of sites and billions of requests per day. Pattern-detection input.
- Cache-layer co-location — Cloudflare already manages the cache layer where dictionaries need to live. Storage substrate is present.
- RUM beacon already deployed — the RUM beacon gives a validation loop that confirms a dictionary actually improves compression on real traffic before committing to serve it.
"The combination of traffic visibility, edge storage, and synthetic testing is what makes automatic generation feasible, though there are still many pieces to figure out." (Source: this article)
Why a validation loop is necessary¶
Auto-generating dictionaries from observed traffic is technically straightforward. The hard part is knowing that a generated dictionary is both safe and effective:
- Safety: "Safely generating dictionaries that avoid revealing private data". If responses in the same URL class embed user-specific content (session-specific tokens, per-user CSRF tokens, per-tenant data), a dictionary generated from one user's response could leak private data across responses or across users. A measurement-on-real-users loop that watches for unexpected response-size changes, client errors, or content-integrity failures can catch these before scale-out.
- Effectiveness: successive responses that look versioned by URL pattern may produce large inter-version diffs because content genuinely changed (real refactors, large dependency bumps). Compressing against the previous version could produce worse output than plain Brotli. The RUM beacon detects this and withdraws the candidate.
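The effectiveness concern can be illustrated by compressing a new version both with and without the previous version as a dictionary and comparing sizes. The sketch below is an assumption-laden stand-in: it uses the preset-dictionary support in Python's standard `zlib` module (DEFLATE) instead of the Brotli or Zstandard shared dictionaries the pattern actually concerns, and `plain_size`, `delta_size`, `dictionary_helps`, and the `min_saving` threshold are hypothetical names for this example. DEFLATE only looks back 32 KB, so the comparison is only meaningful here for small payloads.

```python
import zlib


def plain_size(data: bytes) -> int:
    """Compressed size without any dictionary."""
    return len(zlib.compress(data, 9))


def delta_size(data: bytes, dictionary: bytes) -> int:
    """Compressed size when the previous version is used as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return len(compressor.compress(data) + compressor.flush())


def dictionary_helps(new: bytes, prev: bytes, min_saving: float = 0.1) -> bool:
    """True if compressing `new` against `prev` saves at least `min_saving`
    (an illustrative threshold) compared with plain compression."""
    return delta_size(new, prev) <= plain_size(new) * (1 - min_saving)
```

A server-side check like this can only pre-filter candidates. It says nothing about what real clients experience, which is the role the pattern assigns to the RUM beacon.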
Sibling patterns¶
- patterns/comparative-rum-benchmarking — same RUM-beacon-as-measurement-substrate pattern, but for comparing network performance against competitors (Cloudflare vs peer CDN). Distinct application, same infrastructure.
- Classical A/B testing with metric-based rollouts — RUM-validated dictionary selection is a specific instantiation for a CDN-protocol feature where the "metric" is compression ratio measured client-side.
- patterns/alert-backtesting — feedback-loop pattern in a different domain (alerting rules) with a similar shape (propose → measure → commit or drop).
Related¶
- patterns/phased-cdn-rollout-passthrough-managed-auto — RUM-validated dictionary selection is the Phase 3 safety mechanism that makes automatic-without-customer-configuration deployment viable.
- concepts/real-user-measurement — the RUM beacon measurement primitive underneath.
Seen in¶
- sources/2026-04-17-cloudflare-shared-dictionaries-compression-that-keeps-up-with-the-agent — canonical 2026 instance. "Our RUM beacon to clients can help give us a validation loop to confirm that a dictionary actually improves compression before we commit to serving it."