Netflix Switchboard
Switchboard is Netflix's pre-2026 custom gRPC routing proxy that sits in the critical path of every ML-inference request across Netflix's centralized Model Serving Platform. It is the mandatory interface: the single integration point that all >30 client services use to reach whichever model is currently configured for their use case.
Switchboard handled 1 million requests per second at peak and bundled four responsibilities: (1) Objective-addressable routing, (2) A/B-test-aware model selection (integrated with Netflix's Experimentation Platform), (3) shadow-mode + canary + rollback lifecycle management, and (4) context enrichment of model inputs. It was built because off-the-shelf proxies (AWS API Gateway, standalone service-mesh proxies) did not meet Netflix's specific needs: notably, first-class experimentation-platform integration, gRPC endpoint exposure, rich domain-context routing, and model-lifecycle awareness.
Status: the 2026-05-01 post describes Switchboard's succession by the Lightbulb + Envoy split. Switchboard is not retired outright; its responsibilities are re-organised: Lightbulb takes over metadata resolution, Envoy takes over the actual connection routing, and Switchboard Rules survive in refined form as the config surface.
Key capabilities (2026-05-01 post)
- Common client abstraction. Single point of contact for >30 client services. ML-Ops benefits: central rate limits across model versions, central concurrency limits to absorb bad clients.
- Context-aware routing. Routes based on user device, locale, ranking surface (home vs search), active A/B test cell, and other rich context features.
- Dynamic traffic splitting. Canary deployments and experimentation in real time: a new model version gets a small percentage of traffic before a full launch (see the sketch after this list).
- Model versioning + lifecycle. Concurrent traffic to multiple model versions enables:
  - Shadow mode — a new model version gets real production traffic but its responses are discarded (used for performance comparison without UX impact).
  - Instant rollback — traffic moves away from a problematic version to a stable one atomically.
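A minimal sketch of how such a split could be evaluated, purely illustrative: the function, field names (canaryPercent, canaryModel, stableModel), and hashing scheme are assumptions, since the post does not show Switchboard internals.

// Illustrative only: a deterministic percentage-based traffic split.
// All names here are assumed, not Netflix's actual implementation.
function pickVersion(rule, userId) {
  // Hash the userId so a given user consistently lands in one bucket 0..99.
  let hash = 0;
  for (const ch of String(userId)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  const bucket = hash % 100;
  // canaryPercent = 5 sends roughly 5% of users to the canary version.
  return bucket < rule.canaryPercent ? rule.canaryModel : rule.stableModel;
}

const splitRule = {
  canaryPercent: 5,
  canaryModel: {name: "netflix-continue-watching-model-canary"},
  stableModel: {name: "netflix-continue-watching-model-default"},
};
console.log(pickVersion(splitRule, "user-42").name);

Hashing (rather than random sampling) keeps a given user on one side of the split, which matters for shadow-mode comparisons and UX consistency.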
The Objective abstraction
Switchboard exposes a single API; every request carries an Objective — a platform-defined enumeration like ContinueWatchingRanking or PaymentFraudDetection. Clients never learn the concrete model ID. Switchboard uses the Objective + request context + the user's A/B cell to select the model and route the request to the appropriate serving cluster shard's VIP.
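As a concrete illustration, Objective-keyed selection could look like the sketch below. The Objective names come from this page and the rule shape mirrors the defineAB12345Rule() example in the next section; the lookup logic itself is an assumption, not the disclosed implementation.

// Sketch only: the lookup logic is assumed, not Netflix's actual code.
const Objectives = Object.freeze({
  ContinueWatchingRanking: "ContinueWatchingRanking",
  PaymentFraudDetection: "PaymentFraudDetection",
});

function selectModel(objective, abCell, rules) {
  // Find the rule bound to this Objective...
  const rule = rules.find(r => r.targetObjectives.includes(objective));
  if (!rule) throw new Error(`no rule for objective ${objective}`);
  // ...then map the user's A/B cell to a concrete model, falling back to
  // cell 1, the default cell in the example rule below.
  return rule.cellToModel[abCell] ?? rule.cellToModel[1];
}

The client only ever supplies the Objective; the model name that comes back stays inside Switchboard.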
Switchboard Rules (the config surface)
Researchers author traffic-routing rules as JavaScript configuration, compiled to JSON rule sets. Rules bind Objectives to models, define A/B-test-cell → model maps, and express gradual traffic shifts. Example from the post (abbreviated):
function defineAB12345Rule() {
  const abTestId = 12345;
  const objectives = Objectives.ContinueWatchingRanking;
  const abTestCellToModel = {
    1: {name: "netflix-continue-watching-model-default"},
    2: {name: "netflix-continue-watching-model-cell-2"},
    3: {name: "netflix-continue-watching-model-cell-3"}
  };
  return {
    cellToModel: abTestCellToModel,
    abTestId: abTestId,
    targetObjectives: [objectives],
    modelInputType: constants.TITLE_INPUT_TYPE,
    modelType: "SCORER"
  };
}
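The post does not show the compiled form, so the following JSON is a guess extrapolated from the returned object above (in particular, the serialization of constants.TITLE_INPUT_TYPE is assumed):

{
  "cellToModel": {
    "1": {"name": "netflix-continue-watching-model-default"},
    "2": {"name": "netflix-continue-watching-model-cell-2"},
    "3": {"name": "netflix-continue-watching-model-cell-3"}
  },
  "abTestId": 12345,
  "targetObjectives": ["ContinueWatchingRanking"],
  "modelInputType": "TITLE_INPUT_TYPE",
  "modelType": "SCORER"
}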
Rules are published via Gutenberg (Netflix's dataset pub/sub) and subscribed to by both Switchboard and the serving cluster hosts — an independent release cycle for experiments, decoupled from platform code deploys. See patterns/config-separated-from-code-via-pubsub.
Control plane + data plane responsibilities
Control plane flow (Switchboard-era):
- Assignment — rules produce a model-to-cluster-shard assignment.
- Validation — all specified models are loaded into the serving cluster shard and their dependencies validated for successful execution.
- Mapping — the model-to-shard VIP mapping is provided to Switchboard (sketched below).
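A sketch of what these three steps might produce, with assumed shapes throughout (assignments, loadedModels, and vip are illustrative names, not the platform's schema):

// Sketch of the control-plane flow; every shape and name here is assumed.
function buildMapping(assignments) {
  const mapping = {};
  // Assignment: the input pairs each model with its target shard.
  for (const {model, shard} of assignments) {
    // Validation: only map a model once its shard reports it loaded.
    if (!shard.loadedModels.includes(model.name)) {
      throw new Error(`${model.name} not loaded on ${shard.vip}`);
    }
    // Mapping: model name -> shard VIP, later handed to Switchboard.
    mapping[model.name] = shard.vip;
  }
  return mapping;
}

const modelToShardVip = buildMapping([
  {
    model: {name: "netflix-continue-watching-model-default"},
    shard: {
      vip: "vip://serving-shard-a",
      loadedModels: ["netflix-continue-watching-model-default"],
    },
  },
]);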
Data plane flow (Switchboard-era, per request):
- Allocation — for an Objective like ContinueWatchingRanking, Switchboard queries the Experimentation Platform for the userId's A/B cell allocation.
- Model selection — use the allocation + A/B test rule to pick the model.
- Request routing — route to the serving cluster shard hosting the selected model, with context.
- Model execution (on the serving host) — run the workflow steps and return the response (see the sketch after this list).
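Stitched together, the four steps could be sketched as below, reusing selectModel and modelToShardVip from the earlier sketches; experimentationPlatform, servingClient, and activeRules are hypothetical stand-ins, not real APIs.

// Hypothetical stand-ins so the sketch is self-contained.
const experimentationPlatform = {
  getCellAllocation: async (userId) => 2, // pretend every user is in cell 2
};
const servingClient = {
  execute: async (vip, request) => ({vip, scored: request.model}),
};
const activeRules = []; // e.g. [defineAB12345Rule()] once rules are loaded

// Sketch of the per-request data plane, mirroring the four steps above.
async function handleRequest(objective, userId, context) {
  // 1. Allocation: ask the Experimentation Platform for the user's A/B cell.
  const abCell = await experimentationPlatform.getCellAllocation(userId);
  // 2. Model selection: allocation + A/B test rule -> concrete model.
  const model = selectModel(objective, abCell, activeRules);
  // 3. Request routing: find the shard VIP hosting the selected model.
  const vip = modelToShardVip[model.name];
  // 4. Model execution happens on the serving host behind that VIP.
  return servingClient.execute(vip, {model: model.name, context});
}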
Why Netflix built it (explicit build-vs-buy)
From the post:
"Standard out-of-the-box API Gateway solutions (such as AWS API Gateway, a standalone Service Mesh proxy) did not meet all our requirements. In particular, we needed first-class integration with Netflix's experimentation platform, the ability to expose gRPC endpoints to clients, and the ability to use rich domain-specific context for routing customizations, which generic proxies were not designed to handle. Furthermore, the platform required customizations to model-specific lifecycle stages (shadow mode, canaries, rollbacks) to enable safe rollouts and migrations."
This is the wiki's canonical example of the tradeoff captured in patterns/centralized-routing-proxy-for-ml-serving.
Why Netflix outgrew it (three named pains)
The post names three load-bearing problems that motivated the Lightbulb + Envoy split:
- Single point of failure. "Switchboard became a shared dependency whose failure would degrade or disable multiple ML-powered experiences at Netflix." With >30 client services depending on it, its blast radius was unacceptable.
- Added latency due to additional network hop. "Switchboard in the request path adds between 10–20ms of latency due to serialization-deserialization operations, depending on payload size. Additionally, it further exposes a request to tail latency amplification." — canonicalised as concepts/serialization-tax-in-proxy-path. Unacceptable for latency-sensitive clients.
- Reduced client flexibility. "Switchboard obscures visibility into client request origins from the serving clusters. Consequently, distinguishing data logged for real vs artificial traffic, which is essential for model training, is difficult and requires ongoing customization and increased MLOps overhead." — the concepts/tenant-isolation-in-routing-layer problem.
Succession: Lightbulb + Envoy
See systems/netflix-lightbulb. The short form:
- Switchboard's in-path proxy role → Envoy (already used for all Netflix egress), which routes on routingKey headers with minimal overhead.
- Switchboard's Objective → model resolution role → Lightbulb, a metadata-only resolver that produces routingKey + ObjectiveConfig, off the payload's critical path (see the sketch after this list).
- Switchboard Rules survive as the config surface — the rule grammar carries over, now consumed by Lightbulb and the serving hosts (not by an in-path proxy).
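From a client's point of view, the split might look like the sketch below; lightbulbClient, grpcClient, and the routing-key header name are assumptions, not the disclosed API.

// Every API name here is assumed; the point is the resolve-then-route shape.
const lightbulbClient = {
  // Metadata-only resolver: maps Objective -> routingKey + ObjectiveConfig
  // without ever seeing the inference payload. Static value for illustration.
  resolve: async ({objective, userId}) =>
      ({routingKey: `${objective}-v2`, objectiveConfig: {}}),
};
const grpcClient = {
  call: async (method, payload, opts) => ({method, routedBy: opts.metadata}),
};

async function callModel(objective, userId, payload) {
  // Off the payload path: resolve once (and cache) via Lightbulb.
  const {routingKey} = await lightbulbClient.resolve({objective, userId});
  // On the payload path: Envoy routes on the routingKey header alone, so
  // no ML-specific proxy deserializes the request body.
  return grpcClient.call("Score", payload, {
    metadata: {"routing-key": routingKey},
  });
}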
Seen in
- sources/2026-05-01-netflix-state-of-routing-in-model-serving — canonical wiki disclosure of Switchboard's shape, capabilities, and the three pains that drove the Lightbulb architecture. 1M req/sec operating scale + >30 client services + 10–20ms serialization tax numbers disclosed here.
Related
- systems/netflix-lightbulb — metadata-only successor
- systems/netflix-model-serving-platform — parent platform
- systems/netflix-gutenberg — rule-publishing substrate
- systems/envoy — data-plane successor for in-path routing
- concepts/objective-abstraction
- concepts/serialization-tax-in-proxy-path
- concepts/tenant-isolation-in-routing-layer
- patterns/centralized-routing-proxy-for-ml-serving
- patterns/separate-routing-from-model-selection
- patterns/config-separated-from-code-via-pubsub