
SYSTEM Cited by 1 source

Netflix Titus Gateway

Titus is Netflix's container-management platform (AWS EC2 underneath). Titus Gateway is its API gateway — the tier that Netflix-internal clients talk to for reading job / container / node state.

The 2022 consistent-caching rebuild

Summarized from Netflix's engineering post "Consistent caching mechanism in Titus Gateway", surfaced in the High Scalability Dec-2022 roundup.

Problem: the previous Titus Gateway read-only API layer could only be scaled vertically, and it had hit the limit. More load meant bigger instances, and eventually even the largest available instance type was not big enough.

Solution: put a horizontally scalable consistent-caching layer in front of the singleton, leader-elected source of truth.

                ┌──────────────────────┐
 clients ──►    │  Titus Gateway tier  │  (horizontally scalable)
 (internal)     │  with consistent     │
                │  cache of snapshot   │
                └──────────┬───────────┘
                           │ (replicated state + change stream)
                ┌──────────────────────┐
                │   Titus Master       │  (leader-elected,
                │  (source of truth)   │   singleton)
                └──────────────────────┘

Cache-coherence model: each Gateway instance maintains an in-memory copy of the full state snapshot (fits in memory, low latency). Updates propagate from the leader to all Gateway instances via a change stream; reads are served from the instance's local cache.
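
The replica described above can be sketched in a few lines. This is a minimal illustration, not Titus code: the names `ChangeEvent`, `GatewayCache`, `load_snapshot` and `apply`, the per-event `version` number, and the job-state strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in the leader's change stream (hypothetical shape)."""
    version: int              # position in the stream, strictly increasing
    job_id: str
    state: Optional[str]      # None means the job was removed


@dataclass
class GatewayCache:
    """In-memory replica held by a single Gateway instance (hypothetical)."""
    version: int = 0
    jobs: Dict[str, str] = field(default_factory=dict)

    def load_snapshot(self, version: int, jobs: Dict[str, str]) -> None:
        # Replace local state wholesale with the leader's full snapshot.
        self.version = version
        self.jobs = dict(jobs)

    def apply(self, event: ChangeEvent) -> bool:
        # Skip events already covered by the snapshot or applied earlier.
        if event.version <= self.version:
            return False
        if event.state is None:
            self.jobs.pop(event.job_id, None)
        else:
            self.jobs[event.job_id] = event.state
        self.version = event.version
        return True

    def get(self, job_id: str) -> Optional[str]:
        # Reads never leave the process: they hit the local copy only.
        return self.jobs.get(job_id)


cache = GatewayCache()
cache.load_snapshot(version=10, jobs={"job-1": "RUNNING"})
cache.apply(ChangeEvent(version=11, job_id="job-2", state="ACCEPTED"))
cache.apply(ChangeEvent(version=11, job_id="job-2", state="STALE"))  # duplicate, ignored
cache.apply(ChangeEvent(version=12, job_id="job-1", state=None))     # removal
```

Tracking a version alongside the data is what lets an instance discard duplicate or out-of-order events and tell how far behind the leader it is.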

Outcome (as disclosed)

  • Unlimited horizontal scalability of the read-only API tier.
  • Better tail latencies vs. the vertically-scaled predecessor.
  • Minor sacrifice in median latency at low traffic (because a read has to wait until the serving instance's cache has caught up with the leader).
  • No client-side changes required — the Gateway API contract is identical.
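
The median-latency cost in the list above comes from reads waiting on update propagation. One way to picture that wait, assuming the cache carries a monotonically increasing version and each read knows the minimum version it must observe, is the sketch below. `VersionedCache`, `min_version` and the timeout are hypothetical and are not taken from the Netflix post.

```python
import threading
from typing import Dict, Optional


class VersionedCache:
    """Local cache whose readers can wait for a minimum applied version."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._jobs: Dict[str, str] = {}

    def apply(self, version: int, job_id: str, state: str) -> None:
        # Called by the thread that consumes the leader's change stream.
        with self._cond:
            self._jobs[job_id] = state
            self._version = max(self._version, version)
            self._cond.notify_all()

    def read(self, job_id: str, min_version: int, timeout: float = 1.0) -> Optional[str]:
        # Block until the cache is at least as fresh as the caller requires.
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._version >= min_version, timeout=timeout
            )
            if not caught_up:
                raise TimeoutError(f"cache stuck at version {self._version}")
            return self._jobs.get(job_id)


cache = VersionedCache()
# Simulate the change stream delivering version 5 after a 50 ms delay.
threading.Timer(0.05, cache.apply, args=(5, "job-1", "RUNNING")).start()
state = cache.read("job-1", min_version=5)  # waits for the update, then reads
```

In this sketch a read that arrives just before an update pays the propagation delay, which is consistent with a slightly higher median at low traffic. The extra Gateway instances help under load because each one serves reads from its own copy.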

Why it generalizes

Netflix's post is explicit that the pattern applies to any system relying on a singleton leader-elected component as the source of truth for managed data, where the data fits in memory and latency is low. That's a broad class: orchestration systems, config services, service-discovery tiers, feature-flag evaluation APIs, etc.

See concepts/consistent-caching-horizontal-scale for the pattern distilled.
