Proxy tier latency tax

The proxy tier latency tax is the additional per-query network hop incurred when the application talks to a proxy (e.g. Vitess VTGate, PgBouncer, ProxySQL) that in turn talks to the sharded backend, instead of connecting directly to a single database. It is the canonical cost of the database-proxy-tier pattern: one extra round-trip per query.

The tax itself

Direct application → database: 1 network hop (round-trip). App → proxy → shard: 2 network hops in the critical path.

Ben Dicken's framing (Source: sources/2026-04-21-planetscale-database-sharding):

"Adding a proxy layer does come with a downside: added latency. By introducing the proxy, there is an additional network hop for requests coming in to our database … it takes longer!"

Co-location mitigation — ~1 ms in-DC, larger cross-region

The canonical mitigation is physical co-location — proxy and shards in the same data center:

"However, this problem can be minimized with proper consideration for server location. If the proxy and shards all live in the same data center, the added latency can be brought down to 1ms or less. For the vast majority of applications, adding 1ms is worth the scalability achieved with the sharded architecture." (Source: sources/2026-04-21-planetscale-database-sharding)

  • In-DC proxy-to-shard: sub-millisecond (sub-100 µs in the best case with kernel-bypass networking).
  • Cross-AZ: 1–2 ms typical.
  • Cross-region: 20–100+ ms — usually unacceptable for OLTP and structurally requires a different topology (proxy replicas per region, sticky routing).

The latency tax is a function of where the proxy is placed relative to the backends, not a property of the proxy tier itself.
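A back-of-the-envelope model makes that placement dependence concrete. This is a sketch with illustrative round-trip figures drawn from the ranges above, not measurements; the function and constant names are hypothetical:

```python
# Illustrative model: end-to-end query latency as a function of proxy placement.
# The round-trip figures are the rough ranges quoted above, not measurements.

RTT_MS = {
    "same_dc": 0.5,      # in-DC round trip (sub-millisecond)
    "cross_az": 1.5,     # cross-AZ: 1-2 ms typical
    "cross_region": 60,  # cross-region: 20-100+ ms
}

def query_latency_ms(app_to_proxy: str, proxy_to_shard: str,
                     shard_work_ms: float = 0.5) -> float:
    """Two hops in the critical path: app -> proxy and proxy -> shard,
    plus the shard's own query work."""
    return RTT_MS[app_to_proxy] + RTT_MS[proxy_to_shard] + shard_work_ms

# Co-located proxy: the tax stays well under 1 ms on top of the shard work.
print(query_latency_ms("same_dc", "same_dc"))       # 1.5
# Proxy in the wrong region: the extra hop dominates everything else.
print(query_latency_ms("same_dc", "cross_region"))  # 61.0
```

The same two-hop structure produces a ~1 ms total or a ~60 ms total depending solely on where the second hop lands.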

Production reference — Slack @ 2 ms average query latency

Dicken's production reference point:

"For example, Slack runs massive sharded database cluster with Vitess, and reports an average query latency of only 2ms." (Source: sources/2026-04-21-planetscale-database-sharding; citing Slack's scaling-datastores-with-Vitess)

At 2 ms end-to-end average, the proxy latency tax plus the actual query work plus the response round-trip all fit inside 2 ms — evidence that well-engineered co-located proxy-tier deployments have a nearly negligible tax at the mean. Tail latency (p99, p999) is a harder case and depends on cross-shard-query rates, queue depths, and reparenting events.

The proxy is also a scaling axis

The proxy tier isn't a singleton — it's horizontally scaled alongside the shards. Dicken's framing:

"The proxy server hit the capacity for simultaneous queries it could process, and had to queue up other inserts. This added latency would be unacceptable for a production database system. To get around this, we can add more proxy servers." (Source: sources/2026-04-21-planetscale-database-sharding)

Proxy saturation is a distinct failure mode from shard saturation; capacity planning covers both. A 1-proxy, 100-shard deployment bottlenecks at the proxy long before the shards get warm; the operator plans a proxy fleet whose aggregate query capacity matches the shard fleet's aggregate query capacity, with headroom for peaks.
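That sizing rule can be sketched as simple arithmetic. The per-proxy and per-shard throughput numbers below are invented for illustration; only the structure (aggregate proxy capacity ≥ aggregate shard capacity, plus headroom) comes from the text:

```python
import math

def proxies_needed(num_shards: int, shard_qps: int,
                   proxy_qps: int, headroom: float = 0.3) -> int:
    """Size the proxy fleet so its aggregate query capacity matches the
    shard fleet's aggregate capacity, with headroom for peaks.
    All throughput numbers here are hypothetical."""
    aggregate_shard_qps = num_shards * shard_qps
    return math.ceil(aggregate_shard_qps * (1 + headroom) / proxy_qps)

# A 100-shard fleet at 5k QPS/shard needs far more than one 50k-QPS proxy:
print(proxies_needed(num_shards=100, shard_qps=5_000, proxy_qps=50_000))  # 13
```

With these assumed numbers, a single proxy would saturate at 10% of the shard fleet's capacity, which is the 1-proxy, 100-shard bottleneck described above.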

Distinct from cross-shard latency cost

The proxy latency tax is one extra hop per query — constant, ~1 ms, paid once per query. The cross-shard query cost is N extra hops for N shards involved plus the gather work — variable, potentially tens of milliseconds, paid only on cross-shard queries.

Both are proxy-tier costs; they add independently. A single-shard query through a co-located proxy is ~1 ms; a scatter-gather to all 128 shards is ~1 ms + max(per-shard query latency) across the 128 shards, so every fan-out query is gated by its slowest shard, and the typical fan-out latency tracks the per-shard tail rather than the per-shard median (tail amplification).
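A tiny simulation shows the tail-amplification effect. The latency distribution is synthetic (the lognormal parameters are arbitrary assumptions, chosen only to give a ~1 ms median with a tail):

```python
import random

random.seed(0)

def shard_latency_ms() -> float:
    # Synthetic per-shard latency: ~1 ms median, heavy-ish tail. Assumed, not measured.
    return random.lognormvariate(0, 0.5)

PROXY_HOP_MS = 1.0  # the co-located proxy tax

# Single-shard query: proxy hop + one shard's latency.
single = [PROXY_HOP_MS + shard_latency_ms() for _ in range(10_000)]

# Scatter-gather: proxy hop + the slowest of 128 shards.
fanout = [PROXY_HOP_MS + max(shard_latency_ms() for _ in range(128))
          for _ in range(1_000)]

def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]

# The *median* fan-out query is several times slower than the median
# single-shard query, because every fan-out waits for its slowest shard.
print(round(median(single), 2))
print(round(median(fanout), 2))
```

Under these assumptions the median single-shard query sits near 2 ms while the median 128-way fan-out lands several milliseconds higher: the fan-out's median is effectively a high percentile of the per-shard distribution.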
