VIP address decoupling¶
VIP address decoupling is the property that clients address a logical use case (e.g. Objective) rather than a concrete cluster VIP (Virtual IP). The backend mapping of logical use case → physical cluster VIP changes frequently — as traffic shifts, SLAs evolve, model architectures change, or CPU/memory availability moves models between clusters — and those VIP changes are absorbed by the routing layer, not propagated to clients.
Canonicalised on the wiki by the 2026-05-01 Netflix Model-Serving post, where "clients shielded from VIP address changes while ensuring high availability" is named as one of the three load-bearing platform principles.
Why VIPs churn in a model-serving platform¶
From the 2026-05-01 post:
"Models are distributed across multiple serving compute cluster shards, each with its own Virtual IP (VIP) Address. Various factors, such as traffic patterns, SLAs, model architecture, and CPU/Memory availability, affect model-to-cluster mapping, and changes to this mapping result in changes to the VIP address at which a model is reachable."
Concrete drivers of VIP changes for a given model:
- Traffic growth or decline. A model gaining traffic may be moved to a larger cluster with a different VIP.
- SLA-driven isolation. A latency-sensitive model may be pulled to a dedicated cluster.
- Architecture fit. A new CPU- or GPU-bound model family needs different hardware, which lives on a different VIP.
- Capacity balancing. CPU/memory pressure causes shard rebalances, moving the model to another cluster.
Without decoupling, every VIP change would require a coordinated redeploy of every client service, which is impractical with 30+ clients.
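To make the indirection concrete, here is a minimal sketch of a Switchboard-style table lookup; all names, tables, and addresses are hypothetical, since the post does not publish schemas. The point is that a VIP move is a single control-plane update, not a redeploy of every client.

```python
# Minimal sketch of VIP address decoupling. All names are hypothetical;
# the 2026-05-01 post does not publish Netflix's actual table schemas.

# Control-plane state: objective -> model, model -> current cluster VIP.
# Only the routing layer reads these tables; clients never do.
OBJECTIVE_TO_MODEL = {"homepage-ranking": "ranker-v42"}
MODEL_TO_VIP = {"ranker-v42": "10.20.30.40:7001"}  # shard A

def resolve(objective: str) -> str:
    """Routing-layer lookup: logical use case -> physical cluster VIP."""
    model = OBJECTIVE_TO_MODEL[objective]
    return MODEL_TO_VIP[model]

# A capacity rebalance moves the model to another shard. One control-plane
# update absorbs the VIP change; the 30+ client services are untouched.
MODEL_TO_VIP["ranker-v42"] = "10.50.60.70:7001"  # shard B

assert resolve("homepage-ranking") == "10.50.60.70:7001"
```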
How Netflix implements it¶
The 2026-05-01 post describes two implementations:
- Switchboard (pre-2026). Clients hit Switchboard with an Objective + context; Switchboard resolves the Objective to a concrete model, looks up the model's current VIP in its model→shard assignment table, and proxies the request. Clients never see the VIP. The table is built by the control plane and refreshed as mappings change.
- Lightbulb + Envoy (2026 onward). Clients call Lightbulb (minimal context in) and receive a `routingKey` header + `ObjectiveConfig` body. They send the request through Envoy with the headers set. Envoy consumes the `routingKey` and maps it to the current cluster VIP via routing rules from the control plane. Clients still never see the VIP.
Both implementations preserve decoupling; the difference is that Lightbulb+Envoy removes the proxy from the payload's critical path. See patterns/separate-routing-from-model-selection.
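A minimal sketch of the Lightbulb + Envoy flow from the client's perspective, assuming plain HTTP; the URLs, paths, and field names other than the `routingKey` header and `ObjectiveConfig` body are hypothetical:

```python
# Sketch of the Lightbulb + Envoy request flow described in the 2026-05-01
# post. URLs, paths, and field names other than routingKey/ObjectiveConfig
# are hypothetical.
import requests

LIGHTBULB_URL = "http://lightbulb.internal"  # hypothetical service address
ENVOY_LISTENER = "http://localhost:15001"    # hypothetical local Envoy listener

def infer(objective: str, inputs: dict) -> dict:
    # Step 1: resolve the Objective with Lightbulb (minimal context in).
    resolved = requests.post(
        f"{LIGHTBULB_URL}/resolve", json={"objective": objective}
    ).json()
    routing_key = resolved["routingKey"]            # header value to forward
    objective_config = resolved["objectiveConfig"]  # model/config details

    # Step 2: send the payload through Envoy with the routingKey header set.
    # Envoy, not the client, maps routingKey -> current cluster VIP using
    # routing rules pushed by the control plane; the client never sees a VIP.
    resp = requests.post(
        f"{ENVOY_LISTENER}/predict",
        headers={"routingKey": routing_key},
        json={"config": objective_config, "inputs": inputs},
    )
    return resp.json()
```

Note that Lightbulb sits off the payload's critical path: it is consulted to obtain the routing key, while the payload itself travels only through Envoy.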
Generalised role on the wiki¶
VIP address decoupling is structurally similar to several existing wiki concepts:
- concepts/latency-based-dns-routing — DNS-level decoupling of a service name from a specific IP pool; VIP decoupling is the tighter-loop version where the routing decision is per-request, not per-DNS-TTL.
- concepts/rib-fib-routing — at the network layer the equivalent decoupling is RIB (routing info base) → FIB (forwarding info base) — logical routes resolved to physical next hops by a routing daemon.
- concepts/proxy-transparent-failover — the failure-case payoff: when a VIP goes down, the routing layer updates the mapping transparently.
- Envoy's xDS architecture — exactly the control-plane contract that enables per-request VIP decoupling; see concepts/xds-protocol and the schematic after this list.
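For the xDS analogy, the schematic below (plain Python, not Envoy's actual config language or API) shows the contract in question: the control plane pushes versioned route snapshots, and the data plane swaps them atomically, so routingKey-to-VIP mappings change without restarting proxies or touching clients.

```python
# Schematic of the xDS-style contract (not Envoy's real API): the control
# plane pushes versioned route snapshots; the data plane swaps atomically.
import threading

class DataPlaneRoutes:
    """Holds the current routingKey -> cluster VIP table for one proxy."""
    def __init__(self):
        self._lock = threading.Lock()
        self._routes: dict[str, str] = {}
        self.version = 0

    def apply_snapshot(self, version: int, routes: dict[str, str]) -> None:
        # Each push is a full, versioned snapshot, applied atomically:
        # in-flight requests see the old table or the new, never a mix.
        with self._lock:
            if version > self.version:
                self._routes = dict(routes)
                self.version = version

    def route(self, routing_key: str) -> str:
        with self._lock:
            return self._routes[routing_key]

dp = DataPlaneRoutes()
dp.apply_snapshot(1, {"rk-homepage": "10.20.30.40:7001"})
dp.apply_snapshot(2, {"rk-homepage": "10.50.60.70:7001"})  # VIP moved
assert dp.route("rk-homepage") == "10.50.60.70:7001"
```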
When it matters most¶
- High VIP churn. Platforms where model-to-cluster or service-to-shard mappings change daily/hourly (model retraining, capacity rebalances, A/B promotions).
- Many clients. 30+ clients at Netflix means 30+ redeploys per VIP change without decoupling.
- Critical-path VIP lookups. If VIPs are resolved on every request, the routing service needs low overhead (Envoy with header-based mapping) or an in-path proxy (Switchboard), which runs into concepts/serialization-tax-in-proxy-path at scale; see the sketch after this list.
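A sketch of the cost difference the last bullet points at, with hypothetical names: an in-path proxy must parse the payload to decide where to route it, while header-based routing reads a fixed-size header and streams the body through opaquely.

```python
# Sketch of the serialization tax, with hypothetical names. An in-path
# proxy must parse the request body to resolve the target before
# re-encoding and forwarding it:
import json

def proxy_style_route(raw_body: bytes, table: dict[str, str]) -> tuple[str, bytes]:
    request = json.loads(raw_body)            # pay deserialization per request
    vip = table[request["objective"]]
    return vip, json.dumps(request).encode()  # pay re-serialization too

# Header-based routing (Envoy style) touches only the routing header and
# never parses the payload:
def header_style_route(routing_key: str, raw_body: bytes,
                       table: dict[str, str]) -> tuple[str, bytes]:
    return table[routing_key], raw_body       # body passes through opaquely
```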
Seen in¶
- sources/2026-05-01-netflix-state-of-routing-in-model-serving — first canonical wiki articulation of VIP address decoupling as a model-serving platform principle. It enumerates the driver list (traffic, SLAs, model architecture, CPU/memory) and names client opacity as the payoff.
Related¶
- concepts/objective-abstraction — the addressing primitive on top of VIP decoupling
- systems/netflix-switchboard
- systems/netflix-lightbulb
- systems/netflix-model-serving-platform
- systems/envoy
- concepts/latency-based-dns-routing
- concepts/rib-fib-routing
- concepts/proxy-transparent-failover
- concepts/xds-protocol
- patterns/separate-routing-from-model-selection