PATTERN Cited by 1 source
Separate routing from model selection¶
Separate routing from model selection is the architectural move of splitting the Objective → model decision (model selection) from the model → cluster VIP decision (routing), and executing them in different services with different latency profiles.
- Model selection needs research-facing context: A/B cell allocation, Switchboard rule evaluation, ObjectiveConfig derivation. A relatively expensive but infrequent computation; produces small metadata.
- Routing needs only a low-overhead header-to-VIP mapping, executed per request at payload scale. Mature off-the-shelf proxies (Envoy) do it natively.
Canonicalised on the wiki by the 2026-05-01 Netflix Model-Serving post, where Netflix evolves from "Switchboard owns both" (in-path gRPC proxy) to "Lightbulb owns model selection, Envoy owns routing" (metadata resolver + data-plane proxy).
The split¶
The pattern's load-bearing moves:
- Lift model selection out of the payload path. A metadata-only service (Lightbulb) takes minimal request context, resolves the Objective → model, and attaches a `routingKey` (headers) + `ObjectiveConfig` (body).
- Let the data-plane proxy do routing. The client assembles the real payload with the headers + config and hits the backend via an already-deployed service-mesh proxy (Envoy in Netflix's case; Istio / AWS App Mesh equivalents exist). The proxy maps `routingKey` → cluster VIP from its routing-rules config.
- Keep the researcher-facing config contract. Switchboard Rules (JavaScript → JSON via pub/sub) survive as the rule-authoring surface. Both Lightbulb and the proxy's control plane consume the same rules.
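The three moves can be sketched end to end. This is a minimal, self-contained simulation of the split, not Netflix code: the function names, header names (`x-routing-key`), and the shape of the rule/routing tables are all illustrative assumptions.

```python
import json

# Illustrative stand-ins for the two planes; names and shapes are
# assumptions, not Netflix's actual APIs.

SWITCHBOARD_RULES = {
    # Objective -> (model id, routing key), derived from the same
    # researcher-authored rules both planes consume.
    "video-ranking": {"model": "ranker_v42", "routingKey": "ranker-v42"},
}

ROUTING_TABLE = {
    # routingKey -> cluster VIP: the only mapping the proxy needs.
    "ranker-v42": "vip://ranker-cluster-7",
}

def lightbulb_resolve(objective: str, context: dict) -> dict:
    """Model selection: research-facing, metadata-only, off the payload path."""
    rule = SWITCHBOARD_RULES[objective]
    return {"routingKey": rule["routingKey"],
            "objectiveConfig": {"model": rule["model"], "cell": context.get("abCell")}}

def envoy_route(headers: dict) -> str:
    """Routing: a cheap header-to-VIP lookup; the payload is never parsed."""
    return ROUTING_TABLE[headers["x-routing-key"]]

def score(objective: str, context: dict, payload: bytes) -> tuple[str, bytes]:
    meta = lightbulb_resolve(objective, context)   # step 1: context-only call
    headers = {"x-routing-key": meta["routingKey"],
               "x-objective-config": json.dumps(meta["objectiveConfig"])}
    vip = envoy_route(headers)                     # step 2: data-plane proxy hop
    return vip, payload                            # payload passes through untouched

vip, _ = score("video-ranking", {"abCell": 3}, b"<large feature vector>")
```

Note the asymmetry the pattern exploits: `lightbulb_resolve` sees only small context, while the large payload travels exactly once, alongside the headers, and is routed without being deserialized.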
Why it works¶
Three named benefits from the 2026-05-01 post:
- Remove the routing service from the direct request path. Eliminates the concepts/serialization-tax-in-proxy-path and the routing-proxy single-point-of-failure.
- Separate model inputs from request metadata. Large payloads (feature vectors, title lists, transaction objects) don't need deserialize+re-serialize at the routing layer; they pass through the data-plane proxy untouched.
- Provide better isolation for the routing layer. Lightbulb's metadata work can be sharded per tenant; Envoy provides natural cluster-level isolation. Fixes concepts/tenant-isolation-in-routing-layer.
ASCII architecture¶
```
BEFORE (Switchboard):

  Client --payload--> Switchboard (deserialize/re-serialize) --payload--> Cluster VIP N
                      ^ deserialize tax
                      ^ SPOF
                      ^ shared tenant

AFTER (Lightbulb + Envoy):

  Client --context-only--> Lightbulb --(routingKey, ObjectiveConfig)--> Client
  Client --payload + headers--> Envoy (route on routingKey header) --payload--> Cluster VIP N
                                ^ payload unparsed
                                ^ per-tenant routing rules
                                ^ already deployed
```
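The AFTER half hinges on the proxy matching a header against static route rules. The sketch below models an abbreviated Envoy-style route fragment as a Python dict (real xDS `RouteConfiguration` objects carry many more fields; the cluster and header names here are made up) plus a tiny first-match lookup mimicking what the proxy does per request.

```python
# Abbreviated Envoy-style route fragment; field names follow the v3
# route API, but this is an illustrative subset, not a runnable config.
route_config = {
    "virtual_hosts": [{
        "name": "model-serving",
        "domains": ["*"],
        "routes": [
            {"match": {"prefix": "/", "headers": [
                {"name": "x-routing-key", "string_match": {"exact": "ranker-v42"}}]},
             "route": {"cluster": "ranker-cluster-7"}},
            {"match": {"prefix": "/", "headers": [
                {"name": "x-routing-key", "string_match": {"exact": "ranker-v43"}}]},
             "route": {"cluster": "ranker-cluster-8"}},
        ],
    }]
}

def pick_cluster(headers: dict) -> str:
    """Mimics the proxy's first-match route selection over header rules.
    No body ever enters this function: routing sees headers only."""
    for vh in route_config["virtual_hosts"]:
        for r in vh["routes"]:
            m = r["match"]["headers"][0]
            if headers.get(m["name"]) == m["string_match"]["exact"]:
                return r["route"]["cluster"]
    raise LookupError("no route matched")
```

Updating which model an Objective resolves to touches only Lightbulb's rules; updating VIPs touches only this table. That is the decoupling the diagram shows.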
When this pattern applies¶
- ML-serving platforms with researcher-driven A/B cadence where the model-selection logic has to evolve frequently (rule changes per experiment, per model candidate), but the underlying routing topology (VIPs, cluster shards) is stable day-to-day.
- Platforms with large payloads (tensor inputs, long feature vectors, rich domain objects) where the serialization tax is material.
- Multi-tenant serving platforms where multiple use cases share routing infrastructure and need to be isolated without deploying N independent proxies.
- Platforms with an existing sidecar / egress proxy already deployed (Envoy, Istio, AWS App Mesh, ECS Service Connect). Adding a Lightbulb-class metadata resolver is cheaper than building and operating a new in-path proxy fleet.
When this pattern does not apply¶
- Small payloads + light routing logic. If payload is small (headers-only, tiny JSON) and routing logic is simple (hash, round-robin), an in-path proxy has negligible tax. Keep it simple.
- Research config is not the hot spot. If rule-authoring velocity is low and model selection is simple, Lightbulb's complexity doesn't pay off.
- No pre-existing service-mesh proxy. If you don't already have an Envoy-class data plane at every hop, you'd be introducing it to use this pattern — a bigger lift than the Switchboard-style in-path proxy.
Substrate + substrate requirements¶
- Data plane proxy capable of header-based routing with dynamic rules (Envoy xDS, Istio Envoy sidecars, AWS ECS Service Connect). See systems/envoy.
- Pub/sub substrate for rule publication, versioning, and dynamic loading. See patterns/config-separated-from-code-via-pubsub + systems/netflix-gutenberg.
- Experimentation platform for per-user A/B cell resolution. Netflix's is internal; analogues include LaunchDarkly, Statsig, Split.
- Logical use-case identifier (Objective or equivalent) that scopes rules without pinning concrete models.
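The last requirement — rules scoped to an Objective that splits cleanly into the two configuration sets — can be made concrete. A hedged sketch: the field names and the model-to-VIP derivation below are placeholders, not the post's actual schema.

```python
# One researcher-authored rule for an Objective, and how it splits into
# the two configuration sets the pattern needs. Illustrative fields only.
objective_rule = {
    "objective": "video-ranking",
    "allocations": [                        # A/B cells -> model candidates
        {"cell": "control",   "model": "ranker_v42"},
        {"cell": "treatment", "model": "ranker_v43"},
    ],
}

def split_rule(rule: dict) -> tuple[dict, dict]:
    """Derives Model Serving Configuration (consumed by the metadata
    resolver and serving hosts) and Routing Rules (consumed by the
    proxy's control plane) from the same published rule."""
    model_serving = {a["cell"]: a["model"] for a in rule["allocations"]}
    routing_rules = {a["model"]: f"vip://{a['model'].replace('_', '-')}"  # placeholder model -> VIP derivation
                     for a in rule["allocations"]}
    return model_serving, routing_rules
```

Because both outputs come from one published rule, a single pub/sub update keeps model selection and routing consistent — the dual-subscriber discipline the failure modes below depend on.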
Failure modes¶
- Lightbulb becomes the new SPOF. If Lightbulb is down, clients can't get `routingKey`s. Needs a fallback (stale `routingKey` from the last response, or default Objective → default model). The 2026-05-01 post names "fallback and client-side caching in case of failures" as a design goal retained from Switchboard.
- Envoy routing rules go stale. If the rule control plane is slow, new models or migrated VIPs don't reach the data plane promptly. The same race-condition risk Switchboard had, now at the proxy control plane.
- Metadata/payload mismatch. If Lightbulb attaches an `ObjectiveConfig` that doesn't match the payload (e.g. a stale model ID for a payload schema that has moved on), serving hosts receive inconsistent requests. The dual-subscriber discipline (both Lightbulb and serving hosts subscribe to the same rule stream) is meant to prevent this.
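The first failure mode's mitigation — serve the last good answer, then a default — is a small client-side cache. A sketch under stated assumptions: the TTL, class name, and default-Objective shape are invented for illustration; the "fallback and client-side caching" design goal is the only part taken from the post.

```python
import time

class RoutingKeyCache:
    """Client-side fallback for a Lightbulb-class resolver outage:
    reuse the last good routingKey within a TTL, else fall back to a
    default Objective -> default model mapping. Illustrative sketch."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[dict, float]] = {}

    def resolve(self, objective: str, do_resolve) -> dict:
        try:
            meta = do_resolve(objective)              # normal path
            self._entries[objective] = (meta, time.monotonic())
            return meta
        except Exception:
            cached = self._entries.get(objective)
            if cached and time.monotonic() - cached[1] < self.ttl:
                return cached[0]                      # stale but recent: serve it
            # last resort: default Objective -> default model
            return {"routingKey": "default", "objectiveConfig": {}}
```

The trade-off is the second failure mode in miniature: a cached `routingKey` can outlive a VIP migration, so the TTL bounds how stale a fallback route may be.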
Related patterns on the wiki¶
- patterns/centralized-routing-proxy-for-ml-serving — the pattern this one supersedes (Switchboard shape).
- patterns/config-separated-from-code-via-pubsub — co-pattern; both work together in Netflix's architecture.
- patterns/control-plane-data-plane-separation — the generic framing. Separate-routing-from-model-selection is the ML-serving specialisation.
- patterns/ai-gateway-provider-abstraction — sibling in the AI/LLM gateway space; different altitude (provider selection vs model selection) but same spirit.
Seen in¶
- sources/2026-05-01-netflix-state-of-routing-in-model-serving — first canonical wiki instance. Netflix explicitly names the split: "we now take the rules for an Objective and break them into distinct sets of configuration: Model Serving Configuration [which model to use] [and] Routing Rules [which VIP the request should be routed to]." Load-bearing for the 1M req/sec platform's evolution away from in-path proxying.
Related¶
- patterns/centralized-routing-proxy-for-ml-serving
- patterns/config-separated-from-code-via-pubsub
- patterns/control-plane-data-plane-separation
- concepts/objective-abstraction
- concepts/serialization-tax-in-proxy-path
- concepts/tenant-isolation-in-routing-layer
- concepts/vip-address-decoupling
- systems/netflix-switchboard
- systems/netflix-lightbulb
- systems/envoy
- companies/netflix