CONCEPT Cited by 1 source

Consistent caching for horizontal-scale read APIs

Definition

A pattern for taking a singleton, leader-elected source-of-truth service, where the data fits in memory and latency between the leader and its gateways is low, and scaling its read-only API tier horizontally by fronting it with a consistent cache on many gateway instances. Documented canonically in Netflix's Titus Gateway rebuild, summarized in the High Scalability Dec-2022 roundup.

When it applies

Three preconditions (per Netflix):

A singleton leader-elected component serves as the source of truth for some managed data.
The data fits in memory on each gateway instance.
Latency is low between the leader and the gateway tier (same DC / same region).

Common targets: orchestration control planes (Titus, Kubernetes API server read path), config services, service-discovery backends, feature-flag evaluation.

The mechanism

clients ──► gateway instance 1 ─┐
clients ──► gateway instance 2 ─┼──► source of truth (leader)
clients ──► gateway instance 3 ─┘
           (each has full state cache + change stream)

Each gateway instance holds an in-memory copy of the full state snapshot.
Updates propagate from the leader to all gateway instances via a change stream (leader → cache coherence).
Reads are served from the gateway instance's local cache, so the common read path does not touch the leader.
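
The three steps above can be sketched in a few lines of Python. This is an illustration only, not Titus Gateway's actual code: the `Change` event shape, the `GatewayCache` class, and the per-event `version` counter are all assumptions made for the example. It shows a gateway bootstrapping from a snapshot, applying change-stream events in order, and answering reads from local memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Change:
    """One event on the leader's change stream (hypothetical shape)."""
    version: int            # assumed: monotonically increasing, set by the leader
    key: str
    value: Optional[str]    # None means the key was deleted


@dataclass
class GatewayCache:
    """In-memory copy of the leader's full state, held by one gateway instance."""
    state: Dict[str, str] = field(default_factory=dict)
    version: int = 0        # last leader version applied to this cache

    def load_snapshot(self, snapshot: Dict[str, str], version: int) -> None:
        # Bootstrap: replace local state with a full snapshot from the leader.
        self.state = dict(snapshot)
        self.version = version

    def apply(self, change: Change) -> None:
        # Skip events already covered by the snapshot or delivered twice.
        if change.version <= self.version:
            return
        if change.value is None:
            self.state.pop(change.key, None)
        else:
            self.state[change.key] = change.value
        self.version = change.version

    def get(self, key: str) -> Optional[str]:
        # Reads touch only local memory; there is no call to the leader.
        return self.state.get(key)


cache = GatewayCache()
cache.load_snapshot({"job-1": "RUNNING"}, version=10)
for event in [
    Change(version=10, key="job-1", value="STALE"),     # duplicate, skipped
    Change(version=11, key="job-2", value="ACCEPTED"),
    Change(version=12, key="job-1", value=None),        # deletion
]:
    cache.apply(event)
```

Tracking the last applied version is one simple way to make replayed events harmless; a real change stream would also need reconnect and resnapshot handling, which this sketch leaves out.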

Trade-offs

Read scale grows with the gateway tier: adding gateway instances is cheap.
Better tail latency: reads have no hot path through the leader.
Slight median-latency penalty at low traffic: an update has to reach every instance before reads there reflect it, and at low traffic this propagation cost is visible.
Cache-coherence lag is the new consistency model: the API becomes eventually consistent within a bounded lag instead of serving strong reads from the leader. Clients that need strong reads can still query the leader; most clients tolerate sub-second lag.
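
One way to picture the last trade-off is a read path that lets the caller state how fresh the answer must be. The sketch below is an assumption-laden illustration, not the mechanism Netflix describes: `VersionedCache`, `read`, and the `min_version` parameter are hypothetical names, and the leader is stood in for by a plain dictionary lookup. A caller that just wrote at version 8 can pass `min_version=8`; the gateway answers from its cache only if it has caught up, and otherwise falls back to the leader.

```python
from typing import Callable, Dict, Optional, Tuple


class VersionedCache:
    """Gateway-local cache that tracks the last leader version it applied."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.version = 0

    def apply(self, version: int, key: str, value: str) -> None:
        self.state[key] = value
        self.version = version


def read(
    cache: VersionedCache,
    leader_get: Callable[[str], Optional[str]],
    key: str,
    min_version: int = 0,
) -> Tuple[Optional[str], str]:
    """Serve from the cache unless it is behind the version the caller requires."""
    if cache.version >= min_version:
        return cache.state.get(key), "cache"
    # The cache is lagging for this caller's needs: pay for a strong read.
    return leader_get(key), "leader"


leader_state = {"job-1": "FINISHED"}              # leader is at version 8
gateway_cache = VersionedCache()
gateway_cache.apply(7, "job-1", "RUNNING")        # gateway has only seen version 7

relaxed = read(gateway_cache, leader_state.get, "job-1")
strict = read(gateway_cache, leader_state.get, "job-1", min_version=8)
```

Here `relaxed` is answered from the lagging cache and `strict` from the leader. An alternative design would block the read until the cache reaches `min_version`, which keeps load off the leader at the cost of added latency.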

Why it's on this wiki

The vertical-scaling limit of a singleton source of truth is a recurring problem in orchestrators, discovery services, flag evaluators, and similar systems.
Consistent caching is cheaper than sharding the source of truth: it preserves the strong-consistency model inside the leader while moving the expensive read fan-out to a replicated read tier.
The pattern generalizes beyond Titus Gateway: any singleton source-of-truth system that meets the three preconditions can use it.