
PATTERN Cited by 1 source

Two-level regional/global state

Problem

A single global state-distribution cluster covering every region at once has two failure modes that get worse with scale:

  1. Unbounded blast radius — a bug in the producer, consumer, or schema propagates to every region simultaneously.
  2. Cross-region coupling — per-region changes trigger fleet-wide reconciliation even though they only matter locally.

At Fly.io, the 2024-09-01 global Anycast outage (contagious deadlock in fly-proxy) was the forcing function: a single bad update reached every proxy everywhere in milliseconds.

Pattern

Split the state into two tiers:

  • Per-region clusters — each region runs its own state-distribution cluster with fine-grained data about the local fleet. This is the high-cardinality / high-churn tier.
  • Global cluster — a smaller cluster that only carries cross-region-required coarse-grained data. For Fly.io's Anycast edge: "which regions run this app?" is enough to make edge forwarding decisions.

Consumers read their regional cluster for local details and the global cluster for cross-region routing.
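The read path can be sketched as follows. This is a toy model, not Corrosion's actual API: `Cluster`, `lookup`, and the table shapes are all hypothetical stand-ins for "coarse global mapping" and "fine regional state".

```python
class Cluster:
    """Toy stand-in for a state-distribution cluster: key -> value."""
    def __init__(self, data):
        self._data = data

    def lookup(self, key):
        return self._data.get(key)


def route_request(app_id, local_region, global_cluster, regional_clusters):
    """Edge-proxy logic: pick a region via the global tier, then
    resolve fine-grained machine state via that region's cluster."""
    # Global tier answers only the coarse question: which regions run this app?
    regions = global_cluster.lookup(app_id) or set()
    if not regions:
        return None, None
    # Prefer staying local; otherwise pick any region running the app.
    target = local_region if local_region in regions else sorted(regions)[0]
    # Regional tier holds the high-cardinality detail, consulted in-region only.
    machines = regional_clusters[target].lookup(app_id)
    return target, machines
```

Note that the global tier never sees individual machines; a proxy in a region that doesn't run the app only learns *where* to forward, and the destination region resolves the rest.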

Why it works

  • Blast radius bounded to a region for any bug in region-local code or data.
  • Cross-region traffic shrinks — most state changes stay in their region, reducing wire volume on inter-region links.
  • Rollouts are region-scoped — deploy a change to one region at a time; worst-case is one region down.
  • No coupling between regions — an incident in Tokyo doesn't touch Sydney's replica clusters.

The trade-off is coordination complexity between the two tiers: the regional clusters must periodically publish coarse-grained state to the global cluster, and the global cluster's schema becomes a cross-region contract.
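The regional-to-global publish step is essentially a rollup: derive a few coarse rows from many fine-grained ones. A minimal sketch, with hypothetical row shapes (the source doesn't specify what Corrosion publishes beyond "applications to regions"):

```python
def rollup(region, machine_rows):
    """Derive the coarse-grained rows a region publishes to the global tier.

    machine_rows: iterable of (app_id, machine_id, state) tuples --
    the fine-grained, high-churn regional table (hypothetical shape).
    """
    apps_present = {app for app, _mid, state in machine_rows
                    if state == "running"}
    # The published set is tiny and slow-changing: one (app, region) pair
    # per app, regardless of how many machines churn underneath it.
    return [(app, region) for app in sorted(apps_present)]
```

Most machine-level churn (restarts, migrations within a region) leaves the rollup output unchanged, which is why cross-region wire volume shrinks.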

Canonical wiki instance — Fly.io regionalization

From sources/2025-10-22-flyio-corrosion:

"After the contagious deadlock bug, we concluded we need to evolve past a single cluster. So we took on a project we call 'regionalization', which creates a two-level database scheme. Each region we operate in runs a Corrosion cluster with fine-grained data about every Fly Machine in the region. The global cluster then maps applications to regions, which is sufficient to make forwarding decisions at our edge proxies."

Crucial detail: "Nothing about Corrosion's design required us to [run a single global cluster]." The single domain was an operational default, not a protocol requirement — regionalization is entirely a deployment-shape change, not a new system.

Caveats

  • Schema discipline across tiers — any field that might ever be needed cross-region must be routed through the global tier; getting this wrong leaks regional state into the global cluster or forces costly migrations later.
  • Consistency seam — the regional-to-global publish path is eventually consistent. Systems depending on strict cross-region invariants need additional coordination.
  • Operational complexity — more clusters to monitor, upgrade, patch.
  • Routing / discovery — consumers need to know which regional cluster to consult; this adds a layer to the discovery mechanism.
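The consistency seam can be handled defensively at read time: because the global tier lags, it may still list a region after that region's last machine stopped. One sketch, assuming the reader is free to treat the regional view as authoritative (names and data shapes are illustrative):

```python
def resolve_with_fallback(app_id, candidate_regions, regional_data):
    """Try candidate regions from the (possibly stale) global mapping
    in order, skipping any whose regional view is already empty.

    regional_data: region -> {app_id: [machine_ids]} (toy stand-in).
    """
    for region in candidate_regions:
        machines = regional_data.get(region, {}).get(app_id)
        if machines:
            # Regional truth wins over a stale global row.
            return region, machines
    return None, None
```

This tolerates the lag for routing decisions; invariants that must hold strictly across regions still need coordination outside the publish path.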

Generalisation

The pattern applies broadly beyond Fly.io:

  • CDNs — per-PoP config stores with a global app → PoP mapping.
  • Service meshes — regional xDS control planes federated through a global gateway.
  • Feature flags / config — regional stores for high-churn flags, global store for cross-region-sensitive ones.
  • Observability backends — regional trace / metric stores with cross-region query federation.

Seen in
