Serialization tax in proxy path¶
Serialization tax in proxy path is the latency cost imposed on every request by an in-path proxy that must deserialize + re-serialize the payload to make a routing decision. The tax is proportional to payload size, is paid per-hop, and is amplified at the tail: a proxy's p99 deserialize latency becomes a contributor to the overall p99 latency of every downstream service.
Canonicalised on the wiki by Netflix's 2026-05-01 Model-Serving post, where Switchboard's "10–20ms of latency due to serialization-deserialization operations, depending on payload size" is one of three named pains that drove the architecture shift to Lightbulb + Envoy.
The mechanism¶
A routing proxy typically needs to make a decision based on some subset of the request. If the routing-relevant data is in the payload (not headers), the proxy must:
1. Receive bytes from the client
2. Deserialize the payload into the proxy's in-memory representation (gRPC protobuf decode, JSON parse, etc.)
3. Inspect the relevant fields to decide where to route
4. Re-serialize the payload to forward to the backend
5. Send bytes to the backend
Steps (2) and (4) are the tax. Both scale with payload size. For small payloads the tax is small; for large payloads (images, feature vectors, long JSON blobs) it becomes the dominant in-proxy latency.
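The steps above can be sketched as a toy payload-routing proxy. This is a minimal illustration, not Netflix's implementation: JSON stands in for the protobuf encoding, and the `abCell` field and backend addresses are invented.

```python
import json

# Hypothetical routing table: A/B cell -> backend address.
BACKENDS = {"cell-a": "backend-a:8080", "cell-b": "backend-b:8080"}

def route_by_payload(raw: bytes) -> tuple[str, bytes]:
    """Toy payload-routing proxy: pays the serialization tax on every request."""
    request = json.loads(raw)                 # step 2: deserialize (cost scales with payload size)
    backend = BACKENDS[request["abCell"]]     # step 3: inspect the routing-relevant field
    forwarded = json.dumps(request).encode()  # step 4: re-serialize (cost scales with payload size)
    return backend, forwarded                 # step 5: ready to send to the backend

# A payload where the routing field is a tiny fraction of the bytes parsed.
payload = json.dumps({"abCell": "cell-b", "features": list(range(1000))}).encode()
backend, forwarded = route_by_payload(payload)
```

Steps 2 and 4 touch every byte of the payload even though the decision needed only `abCell`.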
Netflix numbers¶
From the 2026-05-01 post:
- 10–20ms per request at Switchboard, depending on payload size.
- "[Switchboard] further exposes a request to tail latency amplification" — the proxy's tail latency compounds with the backend's tail latency.
- "[Payloads] could be quite large. Needing to deserialize and then re-serialize the payload as it flowed through Switchboard to make a routing decision was a significant contributor to latency and increased serving costs."
The architectural remedy¶
If the routing decision can be made from a small piece of metadata (e.g. a header, a cookie, a URL path), the serialization tax disappears. Two architectural moves enable this:
- Route on headers, not payload. HTTP headers can be read without touching the payload; modern proxies (Envoy, Nginx, HAProxy) have zero-copy forwarding paths for header-routed requests.
- Move routing-relevant decisions out of the proxy. A separate resolver service computes the routing key from request context + experiment state, then the client sets the header and hits the proxy directly with the payload.
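The first move can be sketched against the same invented routing table. The `x-routing-key` header name is an assumption standing in for a Lightbulb-style routing header; the point is that the body is forwarded as opaque bytes and never parsed.

```python
# Hypothetical routing table; "x-routing-key" stands in for a routing-key header.
BACKENDS = {"cell-a": "backend-a:8080", "cell-b": "backend-b:8080"}
DEFAULT_BACKEND = "backend-a:8080"

def route_by_header(headers: dict, body: bytes) -> tuple:
    """Header-routing proxy: reads only metadata; the body passes through untouched."""
    backend = BACKENDS.get(headers.get("x-routing-key", ""), DEFAULT_BACKEND)
    return backend, body  # no deserialize, no re-serialize: the tax disappears

backend, body = route_by_header({"x-routing-key": "cell-b"}, b'{"features": [1, 2, 3]}')
```

Because the body is never decoded, the proxy's cost is independent of payload size.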
Netflix's Lightbulb + Envoy architecture applies both moves:
Lightbulb produces a routingKey (header) and an ObjectiveConfig (body), the client assembles both into the real request, and Envoy routes on the header without parsing the body. See systems/netflix-lightbulb and patterns/separate-routing-from-model-selection.
Generalised to other patterns¶
The serialization tax manifests in several other places on the wiki:
- concepts/connection-multiplexing-over-http — if every HTTP/2 frame is re-framed at a proxy, the tax applies per-frame.
- concepts/ssl-handshake-as-per-request-tax — sibling tax at the TLS layer; different mechanism, same "per-request overhead limited by connection reuse" framing.
- concepts/zero-copy-data-sharing-protocol — the opposite end of the design space: formats designed to eliminate serde costs.
- patterns/zero-copy-sendfile-broker — Kafka broker's optimisation to avoid the equivalent tax at the storage layer.
When the tax is acceptable¶
Not every in-path proxy suffers it. Cases where the serialization tax is small or amortised:
- Small payloads. A proxy routing on 1 KB requests pays microseconds per hop.
- Sticky connection reuse. Long-lived gRPC / HTTP/2 streams amortise per-connection serialization state across many requests, though per-request payload serde is unaffected.
- Routing decisions derivable from metadata only (URL path, headers, query string). If the proxy doesn't need to parse the body, the tax doesn't apply.
Netflix hit the tax because model-serving payloads can be large (feature vectors, lists of titles, full transaction objects) and the A/B-cell and context-aware routing decisions needed to inspect the body.
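A rough way to see the payload-size scaling is a toy micro-benchmark of one decode+encode round trip. This uses JSON rather than Netflix's protobuf path, and the payload shapes are invented; only the relative scaling is the point.

```python
import json
import time

def serde_tax(payload: dict, reps: int = 5) -> float:
    """Best-of-N wall time for one deserialize + re-serialize round trip, in seconds."""
    raw = json.dumps(payload).encode()
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        json.dumps(json.loads(raw)).encode()  # the per-hop tax: decode then re-encode
        best = min(best, time.perf_counter() - t0)
    return best

small = {"abCell": "cell-a", "note": "x" * 1024}           # ~1 KB request
large = {"abCell": "cell-a", "features": [0.5] * 200_000}  # ~1 MB of JSON
```

On typical hardware the ~1 KB payload costs microseconds per hop while the ~1 MB payload costs orders of magnitude more, which is the "proportional to payload size" claim in miniature.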
Seen in¶
- sources/2026-05-01-netflix-state-of-routing-in-model-serving — canonical quantified instance at 10–20ms per request at Switchboard. One of the three named pains that drove the Lightbulb + Envoy architecture. The "separate model inputs from request metadata" design principle is directly motivated by this tax.
Related¶
- concepts/tail-latency-at-scale
- concepts/connection-multiplexing-over-http
- concepts/ssl-handshake-as-per-request-tax
- concepts/zero-copy-data-sharing-protocol
- systems/netflix-switchboard
- systems/netflix-lightbulb
- systems/envoy
- patterns/separate-routing-from-model-selection
- patterns/zero-copy-sendfile-broker