
PATTERN

Separate routing from model selection

Separate routing from model selection is the architectural move of splitting the Objective → model decision (model selection) from the model → cluster VIP decision (routing), and executing them in different services with different latency profiles.

  • Model selection needs research-facing context: A/B cell allocation, Switchboard rule evaluation, ObjectiveConfig derivation. A relatively expensive but infrequent computation that produces only small metadata.
  • Routing needs a low-overhead header-to-VIP mapping, executed on every request at payload scale. Mature off-the-shelf proxies (Envoy) do it natively. The two contracts are sketched below.
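
A minimal sketch of the two contracts in TypeScript; every name below is illustrative, not from the post:

  // Model selection (Lightbulb-class resolver): control-plane call.
  // Expensive, research-facing, metadata-only -- it never sees the payload.
  interface SelectionContext {
    objective: string; // logical use case, not a concrete model
    userId: string;    // input to A/B cell allocation
  }
  interface SelectionResult {
    routingKey: string;                       // travels as request headers
    objectiveConfig: Record<string, unknown>; // travels in the request body
  }

  // Routing (Envoy-class data-plane proxy): per-request, payload-scale.
  // The payload passes through untouched; only one header is consulted.
  type RouteTable = Map<string, string>; // routingKey -> cluster VIP

  function route(routingKey: string, routes: RouteTable): string {
    const vip = routes.get(routingKey);
    if (vip === undefined) throw new Error(`no route for key ${routingKey}`);
    return vip;
  }

The asymmetry is the point: the resolver can afford rule evaluation and experiment lookups because it sits off the payload path, while the routing side stays a constant-time header lookup.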

Canonicalised on the wiki by the 2026-05-01 Netflix Model-Serving post, where Netflix evolves from "Switchboard owns both" (in-path gRPC proxy) to "Lightbulb owns model selection, Envoy owns routing" (metadata resolver + data-plane proxy).

The split

The pattern's load-bearing moves:

  1. Lift model selection out of the payload path. A metadata-only service (Lightbulb) takes minimal request context, resolves the Objective → model, attaches a routingKey (headers) + ObjectiveConfig (body).
  2. Let the data-plane proxy do routing. The client assembles the real payload with the headers+config and hits the backend via an already-deployed service-mesh proxy (Envoy in Netflix's case; Istio / AWS App Mesh equivalents exist). The proxy maps routingKey → cluster VIP from its routing-rules config (see the end-to-end sketch after this list).
  3. Keep the researcher-facing config contract. Switchboard Rules (JavaScript → JSON via pub/sub) survive as the rule-authoring surface. Both Lightbulb and the proxy's control plane consume the same rules.
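
A sketch of the resulting two-hop client flow. The endpoints (lightbulb.internal, model-serving.internal) and the x-routing-key header are assumptions for illustration; the post names none of them:

  // Hop 1: metadata-only call to the resolver -- no payload is sent.
  async function resolveObjective(objective: string, userId: string) {
    const res = await fetch("https://lightbulb.internal/resolve", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ objective, userId }), // minimal request context
    });
    if (!res.ok) throw new Error(`resolve failed: ${res.status}`);
    return (await res.json()) as { routingKey: string; objectiveConfig: object };
  }

  // Hop 2: the real payload goes through the sidecar proxy, which routes on
  // the header alone -- the body is never parsed at the routing layer.
  async function scoreWithModel(objective: string, userId: string, features: number[]) {
    const { routingKey, objectiveConfig } = await resolveObjective(objective, userId);
    return fetch("http://model-serving.internal/score", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-routing-key": routingKey, // proxy maps this header -> cluster VIP
      },
      body: JSON.stringify({ objectiveConfig, features }), // payload + config
    });
  }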

Why it works

Three named benefits from the 2026-05-01 post:

  • Remove the routing service from the direct request path. Eliminates concepts/serialization-tax-in-proxy-path and the routing proxy as a single point of failure.
  • Separate model inputs from request metadata. Large payloads (feature vectors, title lists, transaction objects) don't need deserialize+re-serialize at the routing layer; they pass through the data-plane proxy untouched.
  • Provide better isolation for the routing layer. Lightbulb's metadata work can be sharded per tenant; Envoy provides natural cluster-level isolation. Fixes concepts/tenant-isolation-in-routing-layer.

ASCII architecture

BEFORE (Switchboard):
  Client --payload--> Switchboard (deserialize/re-serialize) --payload--> Cluster VIP N
                      ^ deserialize tax
                      ^ SPOF
                      ^ shared tenant

AFTER (Lightbulb + Envoy):
  Client --context-only--> Lightbulb
  Lightbulb --(routingKey, ObjectiveConfig)--> Client
  Client --payload + headers--> Envoy (route on routingKey header) --payload--> Cluster VIP N
                                      ^ payload unparsed
                                      ^ per-tenant routing rules
                                      ^ already deployed

When this pattern applies

  • ML-serving platforms with researcher-driven A/B cadence where the model-selection logic has to evolve frequently (rule changes per experiment, per model candidate), but the underlying routing topology (VIPs, cluster shards) is stable day-to-day.
  • Platforms with large payloads (tensor inputs, long feature vectors, rich domain objects) where the serialization tax is material.
  • Multi-tenant serving platforms where multiple use cases share routing infrastructure and need to be isolated without deploying N independent proxies.
  • Platforms with an existing sidecar / egress proxy already deployed (Envoy, Istio, AWS App Mesh, ECS Service Connect). Adding a Lightbulb-class metadata resolver is cheaper than building and operating a new in-path proxy fleet.

When this pattern does not apply

  • Small payloads + light routing logic. If the payload is small (headers-only, tiny JSON) and the routing logic is simple (hash, round-robin), an in-path proxy has negligible tax. Keep it simple.
  • Research config is not the hot spot. If rule-authoring velocity isn't high and model selection is simple, Lightbulb's complexity doesn't pay off.
  • No pre-existing service-mesh proxy. If you don't already have an Envoy-class data plane at every hop, you'd be introducing it to use this pattern — a bigger lift than the Switchboard-style in-path proxy.

Substrate requirements

  • Data-plane proxy capable of header-based routing with dynamic rules (Envoy xDS, Istio Envoy sidecars, AWS ECS Service Connect). See systems/envoy and the route-config sketch after this list.
  • Pub/sub substrate for rule publication, versioning, and dynamic loading. See patterns/config-separated-from-code-via-pubsub + systems/netflix-gutenberg.
  • Experimentation platform for per-user A/B cell resolution. Netflix's is internal; analogues include LaunchDarkly, Statsig, Split.
  • Logical use-case identifier (Objective or equivalent) that scopes rules without pinning concrete models.
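
For the first bullet, a sketch of what the header-based rule looks like: a TypeScript literal mirroring the shape of an Envoy route.v3 RouteConfiguration as delivered over xDS. The header name, routingKey value, and cluster names are assumptions:

  // One exact-match route per (routingKey -> cluster VIP), plus a default.
  const routeConfig = {
    name: "model-serving-routes",
    virtual_hosts: [{
      name: "model-serving",
      domains: ["*"],
      routes: [
        {
          match: {
            prefix: "/",
            headers: [{ name: "x-routing-key", string_match: { exact: "objectiveA:model-v3" } }],
          },
          route: { cluster: "model-v3-vip" },
        },
        // Fallback when no routingKey header matches.
        { match: { prefix: "/" }, route: { cluster: "default-model-vip" } },
      ],
    }],
  };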

Failure modes

  • Lightbulb becomes the new SPOF. If Lightbulb is down, clients can't get routingKeys. Needs a fallback: stale routingKey from the last response, or default Objective → default model (see the sketch after this list). The 2026-05-01 post names "fallback and client-side caching in case of failures" as a design goal retained from Switchboard.
  • Envoy routing rules go stale. If the rule control plane is slow, new models or migrated VIPs don't reach the data plane promptly. Same race-condition risk Switchboard had, now shifted to the proxy control plane.
  • Metadata/payload mismatch. If Lightbulb attaches an ObjectiveConfig that doesn't match the payload (e.g. stale model ID for a payload schema that has moved on), serving hosts get inconsistent requests. The dual-subscriber discipline (both Lightbulb and serving hosts subscribe to the same rule stream) is meant to prevent this.
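
A sketch of the client-side fallback discipline for the first failure mode, reusing resolveObjective from the earlier flow sketch; the caching policy and the default-key convention are assumptions:

  // Keep the last good resolution per Objective; serve it stale when the
  // resolver is down, and fall back to a default model as a last resort.
  const lastGood = new Map<string, { routingKey: string; objectiveConfig: object }>();

  async function resolveWithFallback(objective: string, userId: string) {
    try {
      const result = await resolveObjective(objective, userId);
      lastGood.set(objective, result);
      return result;
    } catch {
      const cached = lastGood.get(objective);
      if (cached) return cached; // a stale routingKey beats no routingKey
      return { routingKey: `${objective}:default`, objectiveConfig: {} };
    }
  }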

Seen in

  • sources/2026-05-01-netflix-state-of-routing-in-model-serving — first canonical wiki instance. Netflix explicitly names the split: "we now take the rules for an Objective and break them into distinct sets of configuration: Model Serving Configuration [which model to use] [and] Routing Rules [which VIP the request should be routed to]." Load-bearing for the 1M req/sec platform's evolution away from in-path proxying.