PATTERN Cited by 2 sources

Zone-affinity routing (with spillover)

Zone-affinity routing prefers in-zone / in-region backends over cross-zone ones for latency and cost. The version that actually works at scale adds spillover: fall back to remote zones when the local zone is under-provisioned, unhealthy, or overloaded — so a preference doesn't turn into a local brownout when local capacity is insufficient.

Two economic / performance levers drive this:

  • Cross-AZ latency: ~1–3 ms extra per hop on typical cloud networks.
  • Cross-AZ data-transfer cost: non-trivial at Databricks-scale gRPC traffic (billed per GB).
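Orders of magnitude matter here. A back-of-envelope sketch — the per-GB rate below is an illustrative assumption (many clouds bill around $0.01/GB in each direction), not a figure from the source:

```python
# Rough cost of sustained cross-zone traffic. usd_per_gb_total is an
# ILLUSTRATIVE rate (~$0.01/GB each direction, $0.02/GB combined),
# not a quoted price from any provider or from the source.
def monthly_cross_az_cost(gbps_cross_zone: float,
                          usd_per_gb_total: float = 0.02) -> float:
    """USD cost of sustained cross-zone traffic over a 30-day month."""
    seconds = 30 * 24 * 3600
    gigabytes = gbps_cross_zone / 8 * seconds  # gigabits/s -> GB
    return gigabytes * usd_per_gb_total

# 1 Gbps of sustained cross-zone gRPC traffic:
print(round(monthly_cross_az_cost(1.0)))  # 6480 (USD/month)
```

Even a single sustained gigabit per second of cross-zone gRPC adds up to thousands of dollars a month, which is why keeping traffic in-zone is an economic lever and not just a latency one.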

Core shape

  1. Client discovers endpoints with zone labels (see concepts/xds-protocol — xDS carries locality metadata that DNS cannot).
  2. LB algorithm sorts endpoints by zone: same-zone first, other-zones after.
  3. Zone health / capacity check:
     • If the same-zone subset has enough healthy capacity → route P2C (or whatever base strategy) within the same zone.
     • If the same-zone subset is under-provisioned or unhealthy → spill over: include remote-zone endpoints (possibly weighted lower) in the selection pool.
  4. Continuously re-evaluate as topology / health changes (EDS pushes updates).
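The steps above can be sketched in a few lines. All names here (`Endpoint`, `pick_endpoint`, `min_local_healthy`) are illustrative, not from any real library:

```python
import random
from dataclasses import dataclass

@dataclass
class Endpoint:
    addr: str
    zone: str
    healthy: bool
    load: int  # outstanding requests

def pick_endpoint(endpoints, local_zone, min_local_healthy=2):
    """Prefer healthy same-zone endpoints; spill over to all healthy
    endpoints when the local pool is too small, then pick with P2C."""
    healthy = [e for e in endpoints if e.healthy]
    local = [e for e in healthy if e.zone == local_zone]
    # Spillover check: a crude capacity signal (healthy endpoint count);
    # real systems would also look at saturation / error rate.
    pool = local if len(local) >= min_local_healthy else healthy
    # Power-of-two-choices within whichever pool was selected.
    a, b = random.sample(pool, 2) if len(pool) >= 2 else (pool[0], pool[0])
    return a if a.load <= b.load else b
```

The key property: spillover widens the candidate pool rather than replacing the base strategy — P2C (or whatever the base algorithm is) still runs inside whichever pool survives the capacity check.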

Envoy implements this via locality-weighted load balancing and priority levels; gRPC's xDS implementation has similar primitives. Client-side libraries usually implement a simpler version directly.
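Envoy's priority mechanic is worth spelling out: P0 (e.g. the local zone) receives min(100%, healthy% × overprovisioning factor), and whatever remains spills to P1 (remote zones). Envoy's default overprovisioning factor is 1.4; the function below is a sketch of that arithmetic, not Envoy code:

```python
def priority_load(p0_healthy_pct: float, overprovisioning: float = 1.4):
    """Split traffic across two priority levels, Envoy-style:
    P0 keeps min(100, healthy% * overprovisioning factor);
    the remainder spills to P1."""
    p0 = min(100.0, p0_healthy_pct * overprovisioning)
    return p0, 100.0 - p0

print(priority_load(100))  # (100.0, 0.0)  - all traffic stays local
print(priority_load(50))   # (70.0, 30.0)  - 30% spills to remote zones
```

The overprovisioning factor gives the local pool headroom: it must lose more than ~28% of its capacity before any traffic crosses zones, which provides built-in hysteresis against minor health blips.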

Why spillover is mandatory

Without it, zone-affinity turns into a partition amplifier: if one zone is short on replicas (e.g., after an AZ event), all local clients in that zone keep pounding the few remaining local pods, while well-provisioned remote zones sit idle. Spillover converts zone-affinity from a hard constraint into a preference: prefer local when the local pool is healthy, otherwise balance across zones.

Design considerations for the spillover policy:

  • Threshold. At what "not enough local capacity" signal do you spill? Request-error rate, saturation of local pool, missing zones, etc.
  • Partial vs. total. Spill only the fraction of load local can't serve, vs. send everything cross-zone until local recovers.
  • Hysteresis. Avoid flapping between local and cross-zone routing under borderline conditions.
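The hysteresis point can be made concrete with a two-threshold gate: spill over once local saturation crosses a high-water mark, and return to local-only routing only after it drops below a lower one. The thresholds here are illustrative assumptions:

```python
class SpilloverGate:
    """Two-threshold hysteresis: enter spillover at enter_at,
    exit only at the lower exit_at, so borderline saturation
    doesn't flap between local and cross-zone routing.
    Thresholds are illustrative, not from the source."""
    def __init__(self, enter_at=0.9, exit_at=0.7):
        assert exit_at < enter_at
        self.enter_at, self.exit_at = enter_at, exit_at
        self.spilling = False

    def update(self, local_saturation: float) -> bool:
        if not self.spilling and local_saturation >= self.enter_at:
            self.spilling = True
        elif self.spilling and local_saturation <= self.exit_at:
            self.spilling = False
        return self.spilling

gate = SpilloverGate()
print([gate.update(s) for s in (0.5, 0.92, 0.8, 0.92, 0.6)])
# [False, True, True, True, False] - 0.8 stays in spillover mode
```

A single threshold at 0.9 would toggle spillover on and off as saturation oscillates around it; the gap between the two thresholds is what absorbs that noise.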

Seen in

  • sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks notes zone-affinity is "vital for minimizing cross-zone network hops, which can significantly reduce network latency and associated data transfer costs, especially in geographically distributed Kubernetes clusters." They explicitly implement spillover: "the routing algorithm intelligently spills traffic over to other healthy zones, balancing load while still preferring local affinity whenever possible. This ensures high availability and consistent performance, even under uneven capacity distribution across zones."
  • sources/2026-04-21-figma-figcache-next-generation-data-caching-platform — Figma FigCache applies the pattern to a caching proxy tier. Routing-level configuration probabilistically prefers zonal traffic colocation across three hops — client service → FigCache load balancer → FigCache service instance. The latency penalty of a cross-AZ hop is named ("as much as a few milliseconds"); probabilistic zonal preference makes this penalty "much more stable and predictable" rather than eliminating it. Distinctive: zonal colocation here is latency-control on the proxy hop, not just the client-to-service hop. Sibling application of the same pattern at a different architectural layer.