CONCEPT Cited by 1 source
Consistent caching for horizontal-scale read APIs
Definition
A pattern for scaling the read-only API of a singleton, leader-elected source-of-truth service horizontally: many gateway instances front the leader, each holding a consistent cache of its state. It applies when the data fits in memory and latency between the leader and the gateways is low. Documented canonically in Netflix's Titus Gateway rebuild, summarized in the High Scalability Dec-2022 roundup.
When it applies
Three preconditions (per Netflix):
- A singleton leader-elected component serves as the source of truth for some managed data.
- The data fits in memory on each gateway instance.
- Latency is low between the leader and the gateway tier (same DC / same region).
Common targets: orchestration control planes (Titus, Kubernetes API server read path), config services, service-discovery backends, feature-flag evaluation.
The mechanism
clients ──► gateway instance 1 ─┐
clients ──► gateway instance 2 ─┼──► source of truth (leader)
clients ──► gateway instance 3 ─┘
(each has full state cache + change stream)
- Each gateway instance holds an in-memory copy of the full state snapshot.
- Updates propagate from the leader to all gateway instances via a change stream (leader → cache coherence).
- Reads are served from the gateway instance's local cache; on this path they never reach the leader.
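The three steps above can be sketched as a small in-memory cache. This is an illustrative sketch only, not Titus Gateway's implementation: the `Change` event shape, the per-event `version` counter, and the "reload the snapshot on a gap" rule are assumptions introduced here to make the example concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Change:
    """One change-stream event (hypothetical shape, not from the source)."""
    version: int           # assumed: leader assigns consecutive versions
    key: str
    value: Optional[str]   # None means the key was deleted


@dataclass
class GatewayCache:
    """Full in-memory copy of the leader's state on one gateway instance."""
    state: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def load_snapshot(self, snapshot: Dict[str, str], version: int) -> None:
        # Step 1: start from a full state snapshot taken at `version`.
        self.state = dict(snapshot)
        self.version = version

    def apply(self, change: Change) -> None:
        # Step 2: fold change-stream events into the local copy, in order.
        if change.version <= self.version:
            return  # duplicate or already-applied event; ignore it
        if change.version != self.version + 1:
            # A missed event means the copy can no longer be trusted.
            raise RuntimeError("gap in change stream; reload the snapshot")
        if change.value is None:
            self.state.pop(change.key, None)
        else:
            self.state[change.key] = change.value
        self.version = change.version

    def get(self, key: str) -> Optional[str]:
        # Step 3: reads touch only local memory, never the leader.
        return self.state.get(key)
```

A gateway would call `load_snapshot` once at startup, then feed every streamed event to `apply`. Tracking a version alongside the state is one way to detect dropped events; other coherence schemes are possible.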
Trade-offs
- Read capacity scales with gateway count: adding gateway instances is cheap, though each instance is one more change-stream subscriber on the leader.
- Better tail latency: no hot path through the leader for reads.
- Slight median-latency penalty at low traffic: updates have to propagate to all instances before reads see them, and at low traffic this propagation cost is visible.
- Cache-coherence lag is the new truth model: the API becomes eventually consistent within a bounded lag rather than strongly consistent against the leader. Clients that need strong reads can still query the leader; most clients tolerate sub-second lag.
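The last trade-off implies a routing decision on every read. The sketch below shows one way a gateway might make it, under assumptions that are not in the source: the leader sends a stream message (possibly an empty keep-alive) often enough that "time since last message" is a usable proxy for lag, and the names `LaggingCache`, `read`, `leader_read`, and `max_lag_s` are hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class LaggingCache:
    """Local cache that remembers when it last heard from the leader."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._state: Dict[str, str] = {}
        self._last_sync = clock()

    def on_stream_message(self, updates: Dict[str, str]) -> None:
        # Assumed: called for every stream message, including empty keep-alives.
        self._state.update(updates)
        self._last_sync = self._clock()

    def lag_seconds(self) -> float:
        # Proxy for staleness: seconds since the last stream message.
        return self._clock() - self._last_sync

    def get(self, key: str) -> Optional[str]:
        return self._state.get(key)


def read(
    key: str,
    cache: LaggingCache,
    leader_read: Callable[[str], Optional[str]],
    strong: bool = False,
    max_lag_s: float = 1.0,
) -> Optional[str]:
    """Serve from the cache unless the caller needs a strong read
    or the cache has fallen outside the allowed lag bound."""
    if strong or cache.lag_seconds() > max_lag_s:
        return leader_read(key)
    return cache.get(key)
```

Falling back to the leader when the lag bound is exceeded keeps the staleness guarantee but pushes load onto the leader exactly when the stream is unhealthy; a real deployment might prefer to fail the read or mark the instance unready instead.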
Why it's on this wiki
- The vertical-scaling limit of a singleton source of truth is a recurring problem: orchestrators, discovery services, flag evaluators, etc.
- Consistent caching is cheaper than sharding the source of truth: it preserves the strong-consistency model inside the leader while moving the expensive read fan-out to a replicated read tier.
- The pattern generalizes beyond Titus Gateway: any singleton source-of-truth system that meets the three preconditions can use it.
Seen in
- sources/2022-12-02-highscalability-stuff-the-internet-says-on-scalability-for-december-2nd-2022
- systems/titus-gateway