CONCEPT Cited by 1 source
Connection failure rate¶
Connection failure rate is a metric measuring how often a reverse proxy or CDN edge fails to establish a successful connection to an origin server when attempting to retrieve uncacheable content or content not in (or expired from) cache. It is a third-party uptime signal for whichever system hosts the origin — web site, cloud region, on-prem data center — derived from real proxy traffic rather than from synthetic probes.
The canonical definition is from the Cloudflare 2026-04-28 Q1 2026 disruption review:
"Connection failures occur when Cloudflare fails to successfully connect to an origin server when attempting to retrieve uncacheable content, or content not in/expired from cache."
Why cache-miss paths are the right measurement surface¶
A reverse proxy serves most requests from cache. Cache hits never touch the origin and therefore don't carry origin-health signal. The connection failure rate is measured on the subset of traffic that must go to the origin:
- Uncacheable content (
Cache-Control: no-store,Cache-Control: private, POST/PUT/DELETE requests). - Content not in cache (cold / low-popularity / new).
- Content whose cache entry has expired (
max-ageelapsed, revalidation needed).
For these requests, the edge opens a fresh TCP + TLS connection to the origin. Each successful or failed connection attempt contributes to the rate.
The third-party uptime role¶
Cloud providers publish their own status dashboards, but those reflect the provider's internal view of whether services are operational. The connection-failure-rate signal reflects the external-caller's view of whether the provider's origin is reachable:
- The provider's own health checks may pass while callers timeout — network-path failure the provider doesn't see.
- The provider's dashboard may lag the real failure (internal aggregation + editorial review delays vs live traffic drop).
- The provider's dashboard may omit failures it considers customer-configuration-related even when they are symptoms of a platform issue.
During the March 2026 drone strikes on AWS Middle East facilities (the canonical case in the sysdesign-wiki corpus), Cloudflare Cloud Observatory showed "elevated connection failure rates" for me-central-1 and me-south-1 "beginning March 1-2 and remaining higher for multiple days." This was independent, live, time-pinnable evidence of the regional degradation — a neutral evidence base that didn't depend on Amazon's own disclosure cadence.
What it can and cannot detect¶
It can detect:
- Origin unreachability from the reverse proxy's network location (all proxy POPs, or a subset).
- Region-wide capacity or network failures — not just service-level outages.
- Path degradations visible from the proxy's vantage point even when the origin thinks it's fine.
It cannot detect:
- Service-level failures that return fast HTTP 5xx (those arrive as completed connections with bad payloads, not as connection failures).
- Origin issues that only affect cached content (cache hits don't re-exercise the path).
- Silent corruption where connections succeed but the content is wrong.
For those, pair with origin HTTP status-code distributions and end-to-end response-correctness signals.
Vantage-point considerations¶
A connection-failure-rate metric is always observer-relative:
- Cloudflare's edge sees failures from its POP network; its view can differ from the view of clients using other networks (e.g., clients on ISPs that don't peer with Cloudflare).
- Per-origin-region granularity is meaningful because cloud regions are physically distinct environments; per-service or per-tenant granularity is often a better fit for service-level triage.
- Baseline variance differs by region — regions with low-popularity workloads have more cache-miss traffic, and therefore more origin-exercising measurements per unit time, than regions saturated with cacheable CDN traffic.
Seen in¶
- sources/2026-04-28-cloudflare-q1-2026-internet-disruption-summary
— canonical wiki instance. The review provides the verbatim
definition and uses connection failure rate on
me-central-1 +
me-south-1 as third-party
kinetic-attack forensics, with time-pinned URL evidence
(
?dateStart=2026-02-28&dateEnd=2026-03-06#connection-metrics) during the March 1-2 drone strikes on Amazon Web Services Middle East data centers.