
PATTERN

Separate routing from model selection

Separate routing from model selection is the architectural move of splitting the Objective → model decision (model selection) from the model → cluster VIP decision (routing), and executing them in different services with different latency profiles.

  • Model selection needs research-facing context: A/B cell allocation, Switchboard rule evaluation, ObjectiveConfig derivation. A relatively expensive but infrequent computation that produces only small metadata.
  • Routing needs a low-overhead header-to-VIP mapping, executed on every request at payload scale. Mature off-the-shelf proxies (Envoy) do it natively. The two contracts are sketched below.
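
A minimal sketch of the two contracts in TypeScript; every name below is illustrative, not from the post:

  // Model selection (Lightbulb-class resolver): control-plane call.
  // Expensive, research-facing, metadata-only -- it never sees the payload.
  interface SelectionContext {
    objective: string; // logical use case, not a concrete model
    userId: string;    // input to A/B cell allocation
  }
  interface SelectionResult {
    routingKey: string;                       // travels as request headers
    objectiveConfig: Record<string, unknown>; // travels in the request body
  }

  // Routing (Envoy-class data-plane proxy): per-request, payload-scale.
  // The payload passes through untouched; only one header is consulted.
  type RouteTable = Map<string, string>; // routingKey -> cluster VIP

  function route(routingKey: string, routes: RouteTable): string {
    const vip = routes.get(routingKey);
    if (vip === undefined) throw new Error(`no route for key ${routingKey}`);
    return vip;
  }

The asymmetry is the point: the resolver can afford rule evaluation and experiment lookups because it sits off the payload path, while the routing side stays a constant-time header lookup.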

Canonicalised on the wiki by the 2026-05-01 Netflix Model-Serving post, where Netflix evolves from "Switchboard owns both" (in-path gRPC proxy) to "Lightbulb owns model selection, Envoy owns routing" (metadata resolver + data-plane proxy).

The split

The pattern's load-bearing moves:

  1. Lift model selection out of the payload path. A metadata-only service (Lightbulb) takes minimal request context, resolves the Objective → model, attaches a routingKey (headers) + ObjectiveConfig (body).
  2. Let the data-plane proxy do routing. The client assembles the real payload with the headers+config and hits the backend via an already-deployed service-mesh proxy (Envoy in Netflix's case; Istio / AWS App Mesh equivalents exist). The proxy maps routingKey → cluster VIP from its routing-rules config (see the end-to-end sketch after this list).
  3. Keep the researcher-facing config contract. Switchboard Rules (JavaScript → JSON via pub/sub) survive as the rule-authoring surface. Both Lightbulb and the proxy's control plane consume the same rules.
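
A sketch of the resulting two-hop client flow. The endpoints (lightbulb.internal, model-serving.internal) and the x-routing-key header are assumptions for illustration; the post names none of them:

  // Hop 1: metadata-only call to the resolver -- no payload is sent.
  async function resolveObjective(objective: string, userId: string) {
    const res = await fetch("https://lightbulb.internal/resolve", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ objective, userId }), // minimal request context
    });
    if (!res.ok) throw new Error(`resolve failed: ${res.status}`);
    return (await res.json()) as { routingKey: string; objectiveConfig: object };
  }

  // Hop 2: the real payload goes through the sidecar proxy, which routes on
  // the header alone -- the body is never parsed at the routing layer.
  async function scoreWithModel(objective: string, userId: string, features: number[]) {
    const { routingKey, objectiveConfig } = await resolveObjective(objective, userId);
    return fetch("http://model-serving.internal/score", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-routing-key": routingKey, // proxy maps this header -> cluster VIP
      },
      body: JSON.stringify({ objectiveConfig, features }), // payload + config
    });
  }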

Why it works

Three named benefits from the 2026-05-01 post:

  • Remove the routing service from the direct request path. Eliminates concepts/serialization-tax-in-proxy-path and the routing proxy as a single point of failure.
  • Separate model inputs from request metadata. Large payloads (feature vectors, title lists, transaction objects) don't need deserialize+re-serialize at the routing layer; they pass through the data-plane proxy untouched.
  • Provide better isolation for the routing layer. Lightbulb's metadata work can be sharded per tenant; Envoy provides natural cluster-level isolation. Fixes concepts/tenant-isolation-in-routing-layer.

ASCII architecture

BEFORE (Switchboard):
  Client --payload--> Switchboard (deserialize/re-serialize) --payload--> Cluster VIP N
                      ^ deserialize tax
                      ^ SPOF
                      ^ shared tenant

AFTER (Lightbulb + Envoy):
  Client --context-only--> Lightbulb
  Lightbulb --(routingKey, ObjectiveConfig)--> Client
  Client --payload + headers--> Envoy (route on routingKey header) --payload--> Cluster VIP N
                                      ^ payload unparsed
                                      ^ per-tenant routing rules
                                      ^ already deployed

When this pattern applies

  • ML-serving platforms with researcher-driven A/B cadence where the model-selection logic has to evolve frequently (rule changes per experiment, per model candidate), but the underlying routing topology (VIPs, cluster shards) is stable day-to-day.
  • Platforms with large payloads (tensor inputs, long feature vectors, rich domain objects) where the serialization tax is material.
  • Multi-tenant serving platforms where multiple use cases share routing infrastructure and need to be isolated without deploying N independent proxies.
  • Platforms with an existing sidecar / egress proxy already deployed (Envoy, Istio, AWS App Mesh, ECS Service Connect). Adding a Lightbulb-class metadata resolver is cheaper than building and operating a new in-path proxy fleet.

When this pattern does not apply

  • Small payloads + light routing logic. If the payload is small (headers-only, tiny JSON) and the routing logic is simple (hash, round-robin), an in-path proxy has negligible tax. Keep it simple.
  • Research config is not the hot spot. If rule-authoring velocity isn't high and model selection is simple, Lightbulb's complexity doesn't pay off.
  • No pre-existing service-mesh proxy. If you don't already have an Envoy-class data plane at every hop, you'd be introducing it to use this pattern — a bigger lift than the Switchboard-style in-path proxy.

Substrate requirements

  • Data-plane proxy capable of header-based routing with dynamic rules (Envoy xDS, Istio Envoy sidecars, AWS ECS Service Connect). See systems/envoy and the route-config sketch after this list.
  • Pub/sub substrate for rule publication, versioning, and dynamic loading. See patterns/config-separated-from-code-via-pubsub + systems/netflix-gutenberg.
  • Experimentation platform for per-user A/B cell resolution. Netflix's is internal; analogues include LaunchDarkly, Statsig, Split.
  • Logical use-case identifier (Objective or equivalent) that scopes rules without pinning concrete models.
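
For the first bullet, a sketch of what the header-based rule looks like: a TypeScript literal mirroring the shape of an Envoy route.v3 RouteConfiguration as delivered over xDS. The header name, routingKey value, and cluster names are assumptions:

  // One exact-match route per (routingKey -> cluster VIP), plus a default.
  const routeConfig = {
    name: "model-serving-routes",
    virtual_hosts: [{
      name: "model-serving",
      domains: ["*"],
      routes: [
        {
          match: {
            prefix: "/",
            headers: [{ name: "x-routing-key", string_match: { exact: "objectiveA:model-v3" } }],
          },
          route: { cluster: "model-v3-vip" },
        },
        // Fallback when no routingKey header matches.
        { match: { prefix: "/" }, route: { cluster: "default-model-vip" } },
      ],
    }],
  };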

Failure modes

  • Lightbulb becomes the new SPOF. If Lightbulb is down, clients can't get routingKeys. Needs a fallback: stale routingKey from the last response, or default Objective → default model (see the sketch after this list). The 2026-05-01 post names "fallback and client-side caching in case of failures" as a design goal retained from Switchboard.
  • Envoy routing rules go stale. If the rule control plane is slow, new models or migrated VIPs don't reach the data plane promptly. Same race-condition risk Switchboard had, now shifted to the proxy control plane.
  • Metadata/payload mismatch. If Lightbulb attaches an ObjectiveConfig that doesn't match the payload (e.g. stale model ID for a payload schema that has moved on), serving hosts get inconsistent requests. The dual-subscriber discipline (both Lightbulb and serving hosts subscribe to the same rule stream) is meant to prevent this.
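
A sketch of the client-side fallback discipline for the first failure mode, reusing resolveObjective from the earlier flow sketch; the caching policy and the default-key convention are assumptions:

  // Keep the last good resolution per Objective; serve it stale when the
  // resolver is down, and fall back to a default model as a last resort.
  const lastGood = new Map<string, { routingKey: string; objectiveConfig: object }>();

  async function resolveWithFallback(objective: string, userId: string) {
    try {
      const result = await resolveObjective(objective, userId);
      lastGood.set(objective, result);
      return result;
    } catch {
      const cached = lastGood.get(objective);
      if (cached) return cached; // a stale routingKey beats no routingKey
      return { routingKey: `${objective}:default`, objectiveConfig: {} };
    }
  }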

Seen in

  • sources/2026-05-01-netflix-state-of-routing-in-model-serving — first canonical wiki instance. Netflix explicitly names the split: "we now take the rules for an Objective and break them into distinct sets of configuration: Model Serving Configuration [which model to use] [and] Routing Rules [which VIP the request should be routed to]." Load-bearing for the 1M req/sec platform's evolution away from in-path proxying.