Netflix Switchboard
Switchboard is Netflix's pre-2026 custom gRPC routing proxy that sits in the critical path of every ML-inference request across Netflix's centralized Model Serving Platform. It is the mandatory interface: the single integration point that all >30 client services use to reach whichever model is currently configured for their use case.
Switchboard handled 1 million requests per second at peak and bundled four responsibilities: (1) Objective-addressable routing, (2) A/B-test-aware model selection (integrated with Netflix's Experimentation Platform), (3) shadow-mode + canary + rollback lifecycle management, and (4) context enrichment of model inputs. It was built because off-the-shelf proxies (AWS API Gateway, standalone service-mesh proxies) did not meet Netflix's specific needs: notably, first-class experimentation-platform integration, gRPC endpoint exposure, rich domain-context routing, and model-lifecycle awareness.
Status: the 2026-05-01 post describes Switchboard's succession by the Lightbulb + Envoy split. Switchboard is not retired outright; its responsibilities are re-organised: Lightbulb takes over metadata resolution, Envoy takes over the actual connection routing, and Switchboard Rules survive in refined form as the config surface.
Key capabilities (2026-05-01 post)
- Common client abstraction. Single point of contact for >30 client services. ML-Ops benefits: central rate limits across model versions, central concurrency limits to absorb bad clients.
- Context-aware routing. Routes based on user device, locale, ranking surface (home vs search), active A/B test cell, and other rich context features.
- Dynamic traffic splitting. Canary deployments and experimentation in real time: a new model version gets a small percentage of traffic before a full launch (see the sketch after this list).
- Model versioning + lifecycle. Concurrent traffic to multiple model versions enables:
  - Shadow mode — a new model version gets real production traffic but its responses are discarded (used for performance comparison without UX impact).
  - Instant rollback — traffic moves away from a problematic version to a stable one atomically.
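A minimal sketch of how such a split could be evaluated, purely illustrative: the function, field names (canaryPercent, canaryModel, stableModel), and hashing scheme are assumptions, since the post does not show Switchboard internals.

// Illustrative only: a deterministic percentage-based traffic split.
// All names here are assumed, not Netflix's actual implementation.
function pickVersion(rule, userId) {
  // Hash the userId so a given user consistently lands in one bucket 0..99.
  let hash = 0;
  for (const ch of String(userId)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  const bucket = hash % 100;
  // canaryPercent = 5 sends roughly 5% of users to the canary version.
  return bucket < rule.canaryPercent ? rule.canaryModel : rule.stableModel;
}

const splitRule = {
  canaryPercent: 5,
  canaryModel: {name: "netflix-continue-watching-model-canary"},
  stableModel: {name: "netflix-continue-watching-model-default"},
};
console.log(pickVersion(splitRule, "user-42").name);

Hashing (rather than random sampling) keeps a given user on one side of the split, which matters for shadow-mode comparisons and UX consistency.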
The Objective abstraction
Switchboard exposes a single API; every request carries an Objective — a platform-defined enumeration like ContinueWatchingRanking or PaymentFraudDetection. Clients never learn the concrete model ID. Switchboard uses the Objective + request context + the user's A/B cell to select the model and route the request to the appropriate serving cluster shard's VIP.
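As a concrete illustration, Objective-keyed selection could look like the sketch below. The Objective names come from this page and the rule shape mirrors the defineAB12345Rule() example in the next section; the lookup logic itself is an assumption, not the disclosed implementation.

// Sketch only: the lookup logic is assumed, not Netflix's actual code.
const Objectives = Object.freeze({
  ContinueWatchingRanking: "ContinueWatchingRanking",
  PaymentFraudDetection: "PaymentFraudDetection",
});

function selectModel(objective, abCell, rules) {
  // Find the rule bound to this Objective...
  const rule = rules.find(r => r.targetObjectives.includes(objective));
  if (!rule) throw new Error(`no rule for objective ${objective}`);
  // ...then map the user's A/B cell to a concrete model, falling back to
  // cell 1, the default cell in the example rule below.
  return rule.cellToModel[abCell] ?? rule.cellToModel[1];
}

The client only ever supplies the Objective; the model name that comes back stays inside Switchboard.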
Switchboard Rules (the config surface)
Researchers author traffic-routing rules as JavaScript configuration, compiled to JSON rule sets. Rules bind Objectives to models, define A/B-test-cell → model maps, and express gradual traffic shifts. Example from the post (abbreviated):
function defineAB12345Rule() {
  const abTestId = 12345;
  const objectives = Objectives.ContinueWatchingRanking;
  const abTestCellToModel = {
    1: {name: "netflix-continue-watching-model-default"},
    2: {name: "netflix-continue-watching-model-cell-2"},
    3: {name: "netflix-continue-watching-model-cell-3"}
  };
  return {
    cellToModel: abTestCellToModel,
    abTestId: abTestId,
    targetObjectives: [objectives],
    modelInputType: constants.TITLE_INPUT_TYPE,
    modelType: "SCORER"
  };
}
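The post does not show the compiled form, so the following JSON is a guess extrapolated from the returned object above (in particular, the serialization of constants.TITLE_INPUT_TYPE is assumed):

{
  "cellToModel": {
    "1": {"name": "netflix-continue-watching-model-default"},
    "2": {"name": "netflix-continue-watching-model-cell-2"},
    "3": {"name": "netflix-continue-watching-model-cell-3"}
  },
  "abTestId": 12345,
  "targetObjectives": ["ContinueWatchingRanking"],
  "modelInputType": "TITLE_INPUT_TYPE",
  "modelType": "SCORER"
}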
Rules are published via Gutenberg (Netflix's dataset pub/sub) and subscribed to by both Switchboard and the serving cluster hosts — an independent release cycle for experiments, decoupled from platform code deploys. See patterns/config-separated-from-code-via-pubsub.
Control plane + data plane responsibilities
Control plane flow (Switchboard-era):
- Assignment — rules produce a model-to-cluster-shard assignment.
- Validation — all specified models are loaded into the serving cluster shard and their dependencies validated for successful execution.
- Mapping — the model-to-shard VIP mapping is provided to Switchboard (sketched below).
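A sketch of what these three steps might produce, with assumed shapes throughout (assignments, loadedModels, and vip are illustrative names, not the platform's schema):

// Sketch of the control-plane flow; every shape and name here is assumed.
function buildMapping(assignments) {
  const mapping = {};
  // Assignment: the input pairs each model with its target shard.
  for (const {model, shard} of assignments) {
    // Validation: only map a model once its shard reports it loaded.
    if (!shard.loadedModels.includes(model.name)) {
      throw new Error(`${model.name} not loaded on ${shard.vip}`);
    }
    // Mapping: model name -> shard VIP, later handed to Switchboard.
    mapping[model.name] = shard.vip;
  }
  return mapping;
}

const modelToShardVip = buildMapping([
  {
    model: {name: "netflix-continue-watching-model-default"},
    shard: {
      vip: "vip://serving-shard-a",
      loadedModels: ["netflix-continue-watching-model-default"],
    },
  },
]);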
Data plane flow (Switchboard-era, per request):
- Allocation — for an Objective like ContinueWatchingRanking, Switchboard queries the Experimentation Platform for the userId's A/B cell allocation.
- Model selection — use the allocation + A/B test rule to pick the model.
- Request routing — route to the serving cluster shard hosting the selected model, with context.
- Model execution (on the serving host) — run the workflow steps and return the response (see the sketch after this list).
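Stitched together, the four steps could be sketched as below, reusing selectModel and modelToShardVip from the earlier sketches; experimentationPlatform, servingClient, and activeRules are hypothetical stand-ins, not real APIs.

// Hypothetical stand-ins so the sketch is self-contained.
const experimentationPlatform = {
  getCellAllocation: async (userId) => 2, // pretend every user is in cell 2
};
const servingClient = {
  execute: async (vip, request) => ({vip, scored: request.model}),
};
const activeRules = []; // e.g. [defineAB12345Rule()] once rules are loaded

// Sketch of the per-request data plane, mirroring the four steps above.
async function handleRequest(objective, userId, context) {
  // 1. Allocation: ask the Experimentation Platform for the user's A/B cell.
  const abCell = await experimentationPlatform.getCellAllocation(userId);
  // 2. Model selection: allocation + A/B test rule -> concrete model.
  const model = selectModel(objective, abCell, activeRules);
  // 3. Request routing: find the shard VIP hosting the selected model.
  const vip = modelToShardVip[model.name];
  // 4. Model execution happens on the serving host behind that VIP.
  return servingClient.execute(vip, {model: model.name, context});
}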
Why Netflix built it (explicit build-vs-buy)
From the post:
"Standard out-of-the-box API Gateway solutions (such as AWS API Gateway, a standalone Service Mesh proxy) did not meet all our requirements. In particular, we needed first-class integration with Netflix's experimentation platform, the ability to expose gRPC endpoints to clients, and the ability to use rich domain-specific context for routing customizations, which generic proxies were not designed to handle. Furthermore, the platform required customizations to model-specific lifecycle stages (shadow mode, canaries, rollbacks) to enable safe rollouts and migrations."
This is the wiki's canonical example of the tradeoff captured in patterns/centralized-routing-proxy-for-ml-serving.
Why Netflix outgrew it (three named pains)
The post names three load-bearing problems that motivated the Lightbulb + Envoy split:
- Single point of failure. "Switchboard became a shared dependency whose failure would degrade or disable multiple ML-powered experiences at Netflix." With >30 client services depending on it, its blast radius was unacceptable.
- Added latency due to additional network hop. "Switchboard in the request path adds between 10–20ms of latency due to serialization-deserialization operations, depending on payload size. Additionally, it further exposes a request to tail latency amplification." — canonicalised as concepts/serialization-tax-in-proxy-path. Unacceptable for latency-sensitive clients.
- Reduced client flexibility. "Switchboard obscures visibility into client request origins from the serving clusters. Consequently, distinguishing data logged for real vs artificial traffic, which is essential for model training, is difficult and requires ongoing customization and increased MLOps overhead." — the concepts/tenant-isolation-in-routing-layer problem.
Succession: Lightbulb + Envoy
See systems/netflix-lightbulb. The short form:
- Switchboard's in-path proxy role → Envoy (already used for all Netflix egress), which routes on routingKey headers with minimal overhead.
- Switchboard's Objective → model resolution role → Lightbulb, a metadata-only resolver that produces routingKey + ObjectiveConfig, off the payload's critical path (see the sketch after this list).
- Switchboard Rules survive as the config surface — the rule grammar carries over, now consumed by Lightbulb and the serving hosts (not by an in-path proxy).
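From a client's point of view, the split might look like the sketch below; lightbulbClient, grpcClient, and the routing-key header name are assumptions, not the disclosed API.

// Every API name here is assumed; the point is the resolve-then-route shape.
const lightbulbClient = {
  // Metadata-only resolver: maps Objective -> routingKey + ObjectiveConfig
  // without ever seeing the inference payload. Static value for illustration.
  resolve: async ({objective, userId}) =>
      ({routingKey: `${objective}-v2`, objectiveConfig: {}}),
};
const grpcClient = {
  call: async (method, payload, opts) => ({method, routedBy: opts.metadata}),
};

async function callModel(objective, userId, payload) {
  // Off the payload path: resolve once (and cache) via Lightbulb.
  const {routingKey} = await lightbulbClient.resolve({objective, userId});
  // On the payload path: Envoy routes on the routingKey header alone, so
  // no ML-specific proxy deserializes the request body.
  return grpcClient.call("Score", payload, {
    metadata: {"routing-key": routingKey},
  });
}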
Seen in
- sources/2026-05-01-netflix-state-of-routing-in-model-serving — canonical wiki disclosure of Switchboard's shape, capabilities, and the three pains that drove the Lightbulb architecture. 1M req/sec operating scale + >30 client services + 10–20ms serialization tax numbers disclosed here.
Related
- systems/netflix-lightbulb — metadata-only successor
- systems/netflix-model-serving-platform — parent platform
- systems/netflix-gutenberg — rule-publishing substrate
- systems/envoy — data-plane successor for in-path routing
- concepts/objective-abstraction
- concepts/serialization-tax-in-proxy-path
- concepts/tenant-isolation-in-routing-layer
- patterns/centralized-routing-proxy-for-ml-serving
- patterns/separate-routing-from-model-selection
- patterns/config-separated-from-code-via-pubsub