Tenant isolation in routing layer

Tenant isolation in routing layer is the requirement that multiple use cases (tenants) sharing a routing proxy be isolated from the impact of each other's failures and load shapes. When a single routing cluster carries many tenants with different traffic shapes, SLAs, and error behaviors, two cross-tenant failure modes emerge:

  1. Cross-tenant error cascade. A surge of problematic requests from one tenant can cascade errors back to the routing service, impacting other tenants sharing that same routing cluster.
  2. SLA heterogeneity. Tenants with diverse latency requirements (millisecond-sensitive fraud detection vs. latency-tolerant, batch-like recommendation) are forced through the same proxy, whose tuning is necessarily a compromise.

Canonicalised on the wiki by Netflix's 2026-05-01 model-serving post, which names tenant isolation as one of three Switchboard pains ("reduced client flexibility" in the post's language) and as one of the three design principles motivating Lightbulb.

Netflix's framing

From the 2026-05-01 post (Lightbulb motivation):

"Consolidating multiple use cases (tenants) into a single routing cluster poses two main challenges. First, error propagation posed a risk, as a surge of problematic requests from one tenant could cascade errors back to Switchboard, potentially impacting other users. Second, the cluster had to accommodate diverse latency requirements because the requests from different use cases varied significantly in complexity."

This is the multi-tenant routing cluster anti-pattern stated precisely.

Dimensions of the problem

  • Load-shape diversity. One tenant's traffic spike (flash sale, system event, bug) must not consume routing-cluster capacity that belongs to another tenant.
  • Error-shape diversity. One tenant's bad requests (4xx flood, malformed payloads) must not trip circuit-breakers or exhaust thread pools that other tenants depend on (see the bulkhead sketch after this list).
  • Latency-requirement diversity. A routing service tuned for the slowest tenant over-commits resources; tuned for the fastest, its aggressive timeouts fail the slower tenants. See concepts/tail-latency-at-scale.
  • Observability-shape diversity. A shared routing cluster makes logging, tracing, and metrics harder to attribute per tenant. Netflix specifically notes that Switchboard "obscures visibility into client request origins from the serving clusters. Consequently, distinguishing data logged for real vs artificial traffic, which is essential for model training, is difficult."
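
To make the blast-radius bound concrete, here is a minimal Python sketch of per-tenant bulkheading (the mechanisms section below lists the production-grade variants): each tenant gets its own concurrency slots and its own error-rate breaker, so a 4xx flood or a traffic spike from one tenant cannot starve or trip another. All class names and thresholds here are illustrative assumptions, not Netflix's implementation.

```python
import threading
from collections import defaultdict

# Illustrative per-tenant bulkhead: each tenant gets its own concurrency
# slot pool and its own error-rate circuit breaker, so neither a traffic
# spike nor a 4xx flood from tenant A can starve or trip tenant B.
# Names and thresholds are assumptions, not Netflix's values.

MAX_CONCURRENT_PER_TENANT = 50   # assumed quota; tune per tenant in practice
ERROR_THRESHOLD = 0.5            # breaker opens at >= 50% errors ...
WINDOW = 100                     # ... measured over the last 100 requests

class TenantBulkhead:
    def __init__(self):
        self.slots = threading.BoundedSemaphore(MAX_CONCURRENT_PER_TENANT)
        self.results = []            # sliding window of recent outcomes
        self.lock = threading.Lock()

    def breaker_open(self):
        with self.lock:
            window = self.results[-WINDOW:]
            return (len(window) == WINDOW and
                    sum(window) / WINDOW >= ERROR_THRESHOLD)

    def record(self, failed):
        with self.lock:
            self.results.append(1 if failed else 0)
            del self.results[:-WINDOW]   # keep only the last WINDOW results

bulkheads = defaultdict(TenantBulkhead)

def route(tenant_id, handler):
    """Route one request through tenant_id's own bulkhead.

    Note: this sketch has no half-open recovery; a real breaker would
    probe and close again once the tenant's errors subside.
    """
    bh = bulkheads[tenant_id]
    if bh.breaker_open():
        raise RuntimeError(f"circuit open for tenant {tenant_id}")
    if not bh.slots.acquire(blocking=False):
        raise RuntimeError(f"concurrency quota exhausted for {tenant_id}")
    try:
        response = handler()
        bh.record(failed=False)
        return response
    except Exception:
        bh.record(failed=True)
        raise
    finally:
        bh.slots.release()
```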

Mechanisms for isolation

  • Per-tenant routing cluster shards. Dedicate physical routing resources per tenant (or per tenant group). Scales the routing service N-fold but bounds blast radius.
  • Per-tenant resource quotas. Rate limits, concurrency limits, and thread-pool partitions at the routing service enforce fair sharing.
  • Move routing decisions to the data plane. Envoy as a sidecar or edge proxy provides natural cluster-level isolation because each tenant's traffic is routed by its own Envoy instance (if sidecars) or its own Envoy cluster config (if edge). Netflix's Lightbulb + Envoy architecture takes this approach.
  • Share the metadata resolver, isolate the data plane. Lightbulb can be a shared metadata-only service (small request, fast response) while the actual payload routing through Envoy carries tenant-specific cluster mappings (see the sketch after this list).
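
A minimal sketch of that last split, assuming a dict-backed tenant-to-cluster mapping and a token-bucket quota enforced at the cheap metadata call. All names (resolve, TENANT_CLUSTERS, TokenBucket) and numbers are illustrative assumptions; the post does not disclose Lightbulb's actual API.

```python
import threading
import time

# Hypothetical sketch of "share the resolver, isolate the data plane":
# the resolver is a small shared service that maps (tenant, model) to a
# tenant-specific Envoy cluster name; the payload never passes through it.

# Static tenant -> Envoy cluster mapping; in practice this would be pushed
# to Envoy (e.g. via xDS) and refreshed as models move between clusters.
TENANT_CLUSTERS = {
    "fraud-detection": "envoy-cluster-fraud-low-latency",
    "recommendations": "envoy-cluster-recs-batch",
}

class TokenBucket:
    """Per-tenant rate limit enforced at the (cheap) metadata call."""
    def __init__(self, rate_per_s, burst):
        self.rate, self.capacity = rate_per_s, burst
        self.tokens, self.stamp = burst, time.monotonic()
        self.lock = threading.Lock()

    def allow(self):
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.stamp) * self.rate)
            self.stamp = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

BUCKETS = {t: TokenBucket(rate_per_s=1000, burst=200) for t in TENANT_CLUSTERS}

def resolve(tenant_id, model_id):
    """Shared metadata resolver: small request, fast response.

    Returns the tenant-specific Envoy cluster the client should send its
    payload to; raises if the tenant is unknown or over quota.
    """
    if tenant_id not in TENANT_CLUSTERS:
        raise KeyError(f"unknown tenant {tenant_id!r}")
    if not BUCKETS[tenant_id].allow():
        raise RuntimeError(f"tenant {tenant_id} over quota")
    # The routing decision is metadata-only; the heavy payload flows through
    # the tenant's own Envoy cluster, so data planes stay isolated.
    return {"model": model_id, "envoy_cluster": TENANT_CLUSTERS[tenant_id]}
```

The design point: the resolver answers a tiny metadata question per request, so it can be shared safely, while heavy payloads flow only through each tenant's own Envoy cluster.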

Seen in

  • sources/2026-05-01-netflix-state-of-routing-in-model-serving — canonical wiki disclosure of the multi-tenant routing cluster anti-pattern at the ML-serving layer. Switchboard's shared routing cluster across 30+ client services + hundreds of model types motivated the shift to Lightbulb (metadata resolver shareable; data plane isolated via Envoy). Real-vs-artificial traffic separation for training-data logging called out as a specific MLOps pain induced by shared routing.