CONCEPT

Service topology

Service topology is a configuration abstraction that answers one question: at which locations should this service's IP addresses be reachable from the Internet? For a global anycast edge, the default answer is "all POPs"; for a service constrained by data-residency or compliance requirements, it is an explicit subset of locations. The topology feeds the control plane that decides where to advertise the service's prefixes via BGP.
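A minimal sketch of the abstraction, with hypothetical names and an illustrative POP list (not Cloudflare's actual schema): a topology is a named set of POPs, and the control plane announces a service's prefixes only from POPs in its topology.

```python
from dataclasses import dataclass

ALL_POPS = frozenset({"ams", "fra", "lhr", "sin", "sjc", "iad"})  # illustrative fleet

@dataclass
class Topology:
    name: str
    pops: frozenset  # POPs where advertisement is allowed

@dataclass
class Service:
    name: str
    prefixes: tuple  # e.g. ("1.1.1.0/24",)
    topology: Topology

def advertisements(service: Service) -> dict:
    """Map each POP to the prefixes it should announce via BGP."""
    return {pop: service.prefixes for pop in service.topology.pops}

global_topo = Topology("global", ALL_POPS)
resolver = Service("resolver", ("1.1.1.0/24", "1.0.0.0/24"), global_topo)
# advertisements(resolver) maps every POP in the fleet to both prefixes
```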

In Cloudflare's vocabulary (see Data Localization Suite), a service topology is a named set of POPs; "a particular service has a matching service topology — that is, traffic for a service should be routed only to a particular set of locations."

Why it's a safety-critical configuration surface

Service topology is the direct input to anycast advertisement behaviour. A bug in topology config is functionally equivalent to a bug that unadvertises the service:

  • Shrinking a topology from "all locations" to "one location" triggers a global BGP withdrawal from every non-listed POP.
  • Accidentally linking one service's prefixes to another service's topology means inheriting that topology's advertisement behaviour. If the second service is not yet live, the mistake produces no immediate impact at all, then a total outage later, the moment the second service's topology is touched.
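The first failure mode can be made concrete with a small sketch (hypothetical names, illustrative POP set): the set of BGP withdrawals implied by a topology edit is simply the POPs removed by the change, so shrinking "all locations" to one location is a near-global withdrawal.

```python
def withdrawals(prefixes, old_pops, new_pops):
    """POPs that must withdraw the prefixes after a topology change."""
    return {pop: sorted(prefixes) for pop in old_pops - new_pops}

fleet = {"ams", "fra", "lhr", "sin", "sjc", "iad"}  # illustrative fleet

# Shrinking "all locations" to a single location withdraws the prefixes
# from every other POP at once:
w = withdrawals({"1.1.1.0/24"}, fleet, {"fra"})
# len(w) == 5: a global withdrawal from all non-listed POPs
```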

The 2025-06-06 configuration error in Cloudflare's 2025-07-14 1.1.1.1 incident was exactly this shape: the 1.1.1.1 Resolver's prefixes were inadvertently linked to the topology of a not-yet-live DLS service. No routing changed, no alerts fired, no user impact — until 38 days later when a separate test-location attachment to that same non-live topology triggered a global config refresh, which then evaluated the (accidentally-linked) 1.1.1.1 prefixes against the now-updated topology and withdrew them from the production fleet.

Why hard-coded IP lists are the worse version

Cloudflare's legacy topology model "hard-coded explicit lists of data center locations and attached them to particular prefixes". Two structural problems the blog calls out:

  • Maintenance cost scales with DCs × services. Bringing a new POP online requires updating many lists consistently; any missed list produces an invisible availability gap.
  • No progressive deployment. Config changes go to every data center at once — peer-reviewed but not canaried. patterns/progressive-configuration-rollout is not supported by the legacy surface.

The strategic model is to "describe service topologies without needing to hard-code IP addresses, which better accommodate expansions to new locations and customer scenarios while also allowing for a staged deployment model, so changes can propagate slowly with health monitoring."
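A hedged sketch of what that strategic model could look like, under assumed (not Cloudflare-confirmed) mechanics: topologies become declarative predicates evaluated against each POP's attributes, and changes propagate stage by stage behind a health gate rather than fleet-wide at once.

```python
def in_topology(pop_attrs: dict, expr: dict) -> bool:
    """Evaluate a declarative topology expression (attribute equality
    predicates) against a POP's attributes; no hard-coded IPs or POP lists."""
    return all(pop_attrs.get(k) == v for k, v in expr.items())

def staged_rollout(stages, healthy):
    """Apply a config change stage by stage; stop for rollback on the first
    POP whose health signal fails. Returns (applied POPs, success flag)."""
    applied = []
    for stage in stages:
        for pop in stage:
            applied.append(pop)
            if not healthy(pop):
                return applied, False  # halt here; caller rolls back
    return applied, True

# A new EU POP matches an EU-residency topology automatically:
assert in_topology({"region": "eu", "live": True}, {"region": "eu"})
```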

The mid-migration state — both systems live and mutually synchronized — is itself a risk surface; see patterns/dual-system-sync-during-migration.

Generalises

Any edge / CDN / DNS provider with regional-routing product capability has some version of this concept. Naming varies; the primitive is always "where is this service's IP allowed to be advertised?" plus "how are changes to that answer deployed?". Whatever form it takes, it should:

  • Be expressible without hard-coded IP lists (so new locations don't require sweeping edits).
  • Be deployable progressively, with health-signal gating and automated rollback.
  • Have explicit reverse-lookups ("which services currently point at topology X?") so config edits can't silently rope in unintended prefixes.
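The reverse-lookup requirement is cheap to sketch (hypothetical schema): maintain an index from topology name to the services whose prefixes it governs, so an edit to topology X surfaces every affected service before it ships.

```python
from collections import defaultdict

def reverse_index(service_to_topology: dict) -> dict:
    """Build topology -> set of services from a service -> topology mapping."""
    idx = defaultdict(set)
    for svc, topo in service_to_topology.items():
        idx[topo].add(svc)
    return idx

# Illustrative bindings: touching "dls-eu" affects both services bound to it,
# including any that were linked to it by accident.
services = {"resolver": "dls-eu", "gateway-dns": "dls-eu", "waf": "global"}
idx = reverse_index(services)
# idx["dls-eu"] == {"resolver", "gateway-dns"}
```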
