PATTERN
Two-level regional/global state¶
Problem¶
A single global state-distribution cluster covering every region at once has two failure modes that worsen with scale:
- Unbounded blast radius — a bug in the producer, consumer, or schema propagates to every region simultaneously.
- Cross-region coupling — per-region changes trigger fleet-wide reconciliation even though they only matter locally.
At Fly.io, the 2024-09-01 global Anycast outage (contagious deadlock in fly-proxy) was the forcing function: a single bad update reached every proxy everywhere in milliseconds.
Pattern¶
Split the state into two tiers:
- Per-region clusters — each region runs its own state-distribution cluster with fine-grained data about the local fleet. This is the high-cardinality / high-churn tier.
- Global cluster — a smaller cluster that carries only the coarse-grained data needed across regions. For Fly.io's Anycast edge, "which regions run this app?" is enough to make edge forwarding decisions.
Consumers read their regional cluster for local details and the global cluster for cross-region routing.
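The two-tier read path can be sketched as follows. This is a minimal illustration, not Fly.io's actual schema: the table shapes, field names, and the `edge_forward` helper are all hypothetical.

```python
# Hypothetical two-tier state. Names and shapes are illustrative only.

# Global tier: coarse-grained, cross-region contract (app -> regions).
GLOBAL = {"app-123": {"ord", "syd"}}

# Regional tier: fine-grained, high-churn, never leaves the region.
REGIONAL = {
    "ord": {"app-123": [{"machine": "m1", "addr": "10.0.0.1", "healthy": True}]},
    "syd": {"app-123": [{"machine": "m9", "addr": "10.1.0.9", "healthy": True}]},
}

def edge_forward(app: str, local_region: str) -> str:
    """Decide where an edge proxy forwards a request for `app`."""
    regions = GLOBAL.get(app, set())
    if local_region in regions:
        # Fine-grained detail comes from the local regional cluster only.
        machines = [m for m in REGIONAL[local_region][app] if m["healthy"]]
        if machines:
            return machines[0]["addr"]
    # Otherwise hand off to any region that runs the app; that region's
    # proxies consult their own regional cluster for the final hop.
    other = sorted(regions - {local_region})
    return f"region:{other[0]}" if other else "no-backend"
```

Note that the edge never needs machine-level data for remote regions: the global tier's app-to-region map is sufficient, which is exactly what bounds the global cluster's cardinality.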
Why it works¶
- Blast radius bounded to a region for any bug in region-local code or data.
- Cross-region traffic shrinks — most state changes stay in their region, reducing wire volume on inter-region links.
- Rollouts are region-scoped — deploy a change to one region at a time; worst-case is one region down.
- No coupling between regions — an incident in Tokyo doesn't touch Sydney's replica clusters.
The trade-off is coordination complexity between the two tiers: the regional clusters must periodically publish coarse-grained state to the global cluster, and the global cluster's schema becomes a cross-region contract.
Canonical wiki instance — Fly.io regionalization¶
From sources/2025-10-22-flyio-corrosion:
"After the contagious deadlock bug, we concluded we need to evolve past a single cluster. So we took on a project we call 'regionalization', which creates a two-level database scheme. Each region we operate in runs a Corrosion cluster with fine-grained data about every Fly Machine in the region. The global cluster then maps applications to regions, which is sufficient to make forwarding decisions at our edge proxies."
Crucial detail: "Nothing about Corrosion's design required us to [run a single global cluster]." The single domain was an operational default, not a protocol requirement — regionalization is entirely a deployment-shape change, not a new system.
Caveats¶
- Schema discipline across tiers — any field that might ever be needed cross-region must be routed through the global tier; getting this wrong leaks regional state into the global cluster or forces costly migrations later.
- Consistency seam — the regional-to-global publish path is eventually consistent. Systems depending on strict cross-region invariants need additional coordination.
- Operational complexity — more clusters to monitor, upgrade, patch.
- Routing / discovery — consumers need to know which regional cluster to consult; this adds a layer to the discovery mechanism.
Generalisation¶
The pattern applies broadly beyond Fly.io:
- CDNs — per-PoP config stores with a global app → PoP mapping.
- Service meshes — regional xDS control planes federated through a global gateway.
- Feature flags / config — regional stores for high-churn flags, global store for cross-region-sensitive ones.
- Observability backends — regional trace / metric stores with cross-region query federation.
Seen in¶
- sources/2025-10-22-flyio-corrosion — canonical primary source.
Related¶
- concepts/regionalization-blast-radius-reduction — the concept page.
- concepts/blast-radius — the concept this pattern operationalizes.
- concepts/contagious-deadlock — the failure mode that motivated Fly.io's regionalization.
- systems/corrosion-swim — the system being regionalized.
- patterns/crdt-over-raft-for-wan-state-distribution — sibling pattern; both are responses to the WAN-distributed-state problem.