CONCEPT Cited by 2 sources
Global configuration system¶
A fleet-wide configuration-delivery channel where a single edit propagates to every server in the fleet within seconds, with no canary, no staged rollout, no per-POP health gating.
Cloudflare uses the term explicitly in its 2025-12-05 post-mortem and implicitly in 2025-11-18 (the Bot Management feature-file distribution queue has the same structural property).
Why it exists¶
Rapid threat response requires the ability to push configuration changes to the whole fleet in seconds — DDoS mitigations, malicious-IP blacklists, bot signatures, WAF rules for zero-day CVEs. A canary rollout that takes hours defeats the point of threat response.
The trade-off: rapid reach = rapid blast radius. One bad push reaches the entire fleet just as fast as one good push.
Hazard profile¶
A global configuration system is a single surface where any of the following produces a fleet-wide incident:
- Bad payload. Oversized / malformed / out-of-range values that downstream consumers cannot safely load.
- Latent dormant code. A value that exercises a code path never before triggered in production (see concepts/latent-misconfiguration).
- Dependency-graph surprise. A value whose downstream effect crosses module or team boundaries in unexpected ways.
Canonical Cloudflare instances¶
- 2025-11-18 (Bot Management feature file distribution queue) — feature file regenerated every 5 min, propagated fleet-wide; one oversized file → ~3 hours of core-traffic outage. See sources/2025-11-18-cloudflare-outage-on-november-18-2025.
- 2025-12-05 (named explicitly in the 12-05 post-mortem as "the global configuration system, which does not perform gradual rollouts") — a single flag-flip hit a 7-year-old dormant Lua bug. See sources/2025-12-05-cloudflare-outage-on-december-5-2025.
- 2025-07-14 — sibling-but-distinct surface (service topology / addressing), same structural hazard: sources/2025-07-16-cloudflare-1111-incident-on-july-14-2025.
The 12-05 post states the global configuration system was "under review following the outage we experienced on November 18" — review was in progress but not complete when the same system delivered the 12-05 trigger.
Remediation stance¶
The structural fix is not to slow down threat-response delivery — that defeats the point. It is:
- Progressive configuration rollout with quick rollback — canary + health-gated stages + automated revert, applied to config-plane the same way code-plane has it.
- Global feature killswitch — an orthogonal fast-off path so a feature consuming bad config can be disabled in seconds without waiting to clean up the config.
- Ingest hardening — treat internally-generated config as untrusted input; validate before loading. See patterns/harden-ingestion-of-internal-config.
Seen in¶
- sources/2025-11-18-cloudflare-outage-on-november-18-2025
- sources/2025-12-05-cloudflare-outage-on-december-5-2025
Related¶
- patterns/global-configuration-push — the antipattern framing.
- patterns/progressive-configuration-rollout — the missing discipline.
- patterns/global-feature-killswitch — orthogonal fast-off lever.
- concepts/blast-radius
- concepts/latent-misconfiguration