CONCEPT

Service topology

Service topology is a configuration abstraction that answers one question: at which locations should this service's IP addresses be reachable from the Internet? For a global anycast edge, the default answer is "all POPs"; for a service constrained by data-residency or compliance requirements, it is an explicit subset of locations. The topology feeds the control plane that decides where to advertise the service's prefixes via BGP.
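A minimal sketch of the abstraction, with hypothetical names and an illustrative POP list (not Cloudflare's actual schema): a topology is a named set of POPs, and the control plane announces a service's prefixes only from POPs in its topology.

```python
from dataclasses import dataclass

ALL_POPS = frozenset({"ams", "fra", "lhr", "sin", "sjc", "iad"})  # illustrative fleet

@dataclass
class Topology:
    name: str
    pops: frozenset  # POPs where advertisement is allowed

@dataclass
class Service:
    name: str
    prefixes: tuple  # e.g. ("1.1.1.0/24",)
    topology: Topology

def advertisements(service: Service) -> dict:
    """Map each POP to the prefixes it should announce via BGP."""
    return {pop: service.prefixes for pop in service.topology.pops}

global_topo = Topology("global", ALL_POPS)
resolver = Service("resolver", ("1.1.1.0/24", "1.0.0.0/24"), global_topo)
# advertisements(resolver) maps every POP in the fleet to both prefixes
```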

In Cloudflare's vocabulary (see Data Localization Suite), a service topology is a named set of POPs; "a particular service has a matching service topology — that is, traffic for a service should be routed only to a particular set of locations."

Why it's a safety-critical configuration surface

Service topology is the direct input to anycast advertisement behaviour. A bug in topology config is functionally equivalent to a bug that unadvertises the service:

  • Shrinking a topology from "all locations" to "one location" triggers a global BGP withdrawal from every non-listed POP.
  • Accidentally linking one service's prefixes to another service's topology means inheriting that topology's advertisement behaviour. If the second service is not yet live, the mistake produces no immediate impact at all, then a total outage later, the moment the second service's topology is touched.
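The first failure mode can be made concrete with a small sketch (hypothetical names, illustrative POP set): the set of BGP withdrawals implied by a topology edit is simply the POPs removed by the change, so shrinking "all locations" to one location is a near-global withdrawal.

```python
def withdrawals(prefixes, old_pops, new_pops):
    """POPs that must withdraw the prefixes after a topology change."""
    return {pop: sorted(prefixes) for pop in old_pops - new_pops}

fleet = {"ams", "fra", "lhr", "sin", "sjc", "iad"}  # illustrative fleet

# Shrinking "all locations" to a single location withdraws the prefixes
# from every other POP at once:
w = withdrawals({"1.1.1.0/24"}, fleet, {"fra"})
# len(w) == 5: a global withdrawal from all non-listed POPs
```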

The 2025-06-06 configuration error in Cloudflare's 2025-07-14 1.1.1.1 incident was exactly this shape: the 1.1.1.1 Resolver's prefixes were inadvertently linked to the topology of a not-yet-live DLS service. No routing changed, no alerts fired, no user impact — until 38 days later when a separate test-location attachment to that same non-live topology triggered a global config refresh, which then evaluated the (accidentally-linked) 1.1.1.1 prefixes against the now-updated topology and withdrew them from the production fleet.

Why hard-coded IP lists are the worse version

Cloudflare's legacy topology model "hard-coded explicit lists of data center locations and attached them to particular prefixes". Two structural problems the blog calls out:

  • Maintenance cost scales with DCs × services. Bringing a new POP online requires updating many lists consistently; any missed list produces an invisible availability gap.
  • No progressive deployment. Config changes go to every data center at once — peer-reviewed but not canaried. patterns/progressive-configuration-rollout is not supported by the legacy surface.

The strategic model is to "describe service topologies without needing to hard-code IP addresses, which better accommodate expansions to new locations and customer scenarios while also allowing for a staged deployment model, so changes can propagate slowly with health monitoring."
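A hedged sketch of what that strategic model could look like, under assumed (not Cloudflare-confirmed) mechanics: topologies become declarative predicates evaluated against each POP's attributes, and changes propagate stage by stage behind a health gate rather than fleet-wide at once.

```python
def in_topology(pop_attrs: dict, expr: dict) -> bool:
    """Evaluate a declarative topology expression (attribute equality
    predicates) against a POP's attributes; no hard-coded IPs or POP lists."""
    return all(pop_attrs.get(k) == v for k, v in expr.items())

def staged_rollout(stages, healthy):
    """Apply a config change stage by stage; stop for rollback on the first
    POP whose health signal fails. Returns (applied POPs, success flag)."""
    applied = []
    for stage in stages:
        for pop in stage:
            applied.append(pop)
            if not healthy(pop):
                return applied, False  # halt here; caller rolls back
    return applied, True

# A new EU POP matches an EU-residency topology automatically:
assert in_topology({"region": "eu", "live": True}, {"region": "eu"})
```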

The mid-migration state — both systems live and mutually synchronized — is itself a risk surface; see patterns/dual-system-sync-during-migration.

Generalises

Any edge / CDN / DNS provider with regional-routing product capability has some version of this concept. Naming varies; the primitive is always "where is this service's IP allowed to be advertised?" plus "how are changes to that answer deployed?". Whatever form it takes, it should:

  • Be expressible without hard-coded IP lists (so new locations don't require sweeping edits).
  • Be deployable progressively, with health-signal gating and automated rollback.
  • Have explicit reverse-lookups ("which services currently point at topology X?") so config edits can't silently rope in unintended prefixes.
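The reverse-lookup requirement is cheap to sketch (hypothetical schema): maintain an index from topology name to the services whose prefixes it governs, so an edit to topology X surfaces every affected service before it ships.

```python
from collections import defaultdict

def reverse_index(service_to_topology: dict) -> dict:
    """Build topology -> set of services from a service -> topology mapping."""
    idx = defaultdict(set)
    for svc, topo in service_to_topology.items():
        idx[topo].add(svc)
    return idx

# Illustrative bindings: touching "dls-eu" affects both services bound to it,
# including any that were linked to it by accident.
services = {"resolver": "dls-eu", "gateway-dns": "dls-eu", "waf": "global"}
idx = reverse_index(services)
# idx["dls-eu"] == {"resolver", "gateway-dns"}
```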
