Read-aside cache with dual invalidation

Read-aside cache with dual invalidation is a caching pattern for storage workloads with high read amplification. The system maintains a read-aside cache (populated from the durable store on a miss, with writes going around the cache) and lets each namespace pick one of two invalidation strategies according to its consistency and staleness budget:

  1. Invalidate on write — strong eventual consistency at the cost of higher cache write throughput.
  2. TTL-driven invalidation — bounded staleness in exchange for lower cache pressure.

The choice is per-namespace, not system-wide.

Disclosure

Verbatim from Netflix Graph Abstraction Part I:

"To reduce read amplification on the durable store, the Graph Abstraction leverages KV's integration with EVCache. Multiple KV namespaces can share the same caching clusters for cost efficiency. The Abstraction first fetches data from durable storage, while subsequent reads are served from the cache. Caching is applied at both the record and item levels, benefiting all graph objects.

Graph Abstraction employs two invalidation strategies, selected based on write throughput and consistency requirements:

  • Invalidation on write: Both record and item caches are invalidated with every write, ensuring consistency across regions. This strategy is ideal for graphs that change infrequently and cannot tolerate data staleness, but comes with the tradeoff of pushing a higher throughput on the cache.
  • TTL-driven invalidation: Cache entries are invalidated only when their TTL expires. This approach works best for frequently modified objects that can tolerate some staleness."

Structure

                          Reader
                    ┌────────────────┐
                    │ EVCache lookup │  hit  →  return value
                    └────────┬───────┘
                             │ miss
              ┌───────────────────────────┐
              │ Durable store (KV layer)  │
              └────────────┬──────────────┘
                    Populate cache
                    Return to reader

Write path (per namespace strategy):

  invalidate-on-write:    durable.put(...) → cache.invalidate(key)
  TTL-driven:             durable.put(...) → no cache action; entry
                          ages out at TTL
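
The read and write paths above can be sketched in a few lines of Python. This is a minimal illustration, not the Graph Abstraction's implementation: `ReadAsideStore`, `Invalidation`, the in-memory dicts standing in for EVCache and the durable store, and the choice to give entries a TTL under both strategies are all assumptions made for the example.

```python
import time
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Invalidation(Enum):
    ON_WRITE = "invalidate-on-write"
    TTL = "ttl-driven"


class ReadAsideStore:
    """Toy read-aside cache; plain dicts stand in for EVCache and the durable store."""

    def __init__(self, strategy: Invalidation, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.strategy = strategy
        self.ttl = ttl_seconds
        self.clock = clock
        self.durable: Dict[str, str] = {}
        self.cache: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.durable_reads = 0

    def get(self, key: str) -> Optional[str]:
        entry = self.cache.get(key)
        if entry is not None:
            value, expires_at = entry
            if self.clock() < expires_at:
                return value              # cache hit
            del self.cache[key]           # entry aged out
        # Cache miss: read the durable store, then populate the cache.
        self.durable_reads += 1
        value = self.durable.get(key)
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key: str, value: str) -> None:
        self.durable[key] = value         # writes go around the cache
        if self.strategy is Invalidation.ON_WRITE:
            self.cache.pop(key, None)     # drop the cached copy on every write
        # TTL-driven: no cache action; the entry expires on its own.


if __name__ == "__main__":
    now = [0.0]
    store = ReadAsideStore(Invalidation.TTL, ttl_seconds=60.0, clock=lambda: now[0])
    store.put("node:1", "v1")
    print(store.get("node:1"))   # miss, then cached
    store.put("node:1", "v2")
    print(store.get("node:1"))   # still the cached value until the TTL passes
    now[0] = 61.0
    print(store.get("node:1"))   # expired, re-read from the durable store
```

Injecting the clock keeps the TTL behaviour deterministic in tests; with `Invalidation.ON_WRITE` the second read would already see the new value.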

Two strategies + when each fits

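Strategy            | Best for                                                      | Cost                                          | Consistency property
--------------------|---------------------------------------------------------------|-----------------------------------------------|----------------------------------------------------
Invalidate on write | Graphs that change infrequently AND cannot tolerate staleness | Higher cache RPS (every write hits cache too) | Strong eventual consistency across regions on cache
TTL-driven          | Graphs with high write throughput AND staleness tolerance     | Some stale reads up to TTL                    | Bounded staleness ≤ TTL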

Decision criteria from the post: "selected based on write throughput and consistency requirements." The exact threshold (which RPS, which staleness budget) is not disclosed.

Two cache levels — record + item

"Caching is applied at both the record and item levels, benefiting all graph objects."

In the KV Abstraction data model (HashMap<String, SortedMap<Bytes, Bytes>>):

  • Record-level — the entire sorted map for a partition (one graph node + all properties; one source node + all forward links).
  • Item-level — a single key-value pair within a record (one property of an edge; one specific link).

Different access patterns benefit from different cache levels: property-bag fetches benefit from record-level caching; targeted property reads benefit from item-level caching.
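
A rough sketch of the two cache levels over that data model, under invalidate-on-write. The class and method names (`TwoLevelCache`, `get_record`, `get_item`, `put_item`) are hypothetical, and Python dicts stand in for both the durable store and EVCache; the post does not describe the real cache key layout.

```python
from typing import Dict, List, Optional, Tuple

Record = List[Tuple[bytes, bytes]]  # items of one record, sorted by item key


class TwoLevelCache:
    """Record-level and item-level read-aside caches over a map of sorted maps."""

    def __init__(self) -> None:
        # Stand-in for HashMap<String, SortedMap<Bytes, Bytes>>.
        self.durable: Dict[str, Dict[bytes, bytes]] = {}
        self.record_cache: Dict[str, Record] = {}
        self.item_cache: Dict[Tuple[str, bytes], bytes] = {}

    def get_record(self, record_id: str) -> Record:
        # Property-bag style fetch: the whole sorted map for one record.
        if record_id in self.record_cache:
            return self.record_cache[record_id]
        items = sorted(self.durable.get(record_id, {}).items())
        self.record_cache[record_id] = items
        return items

    def get_item(self, record_id: str, item_key: bytes) -> Optional[bytes]:
        # Targeted read: a single key-value pair within a record.
        cache_key = (record_id, item_key)
        if cache_key in self.item_cache:
            return self.item_cache[cache_key]
        value = self.durable.get(record_id, {}).get(item_key)
        if value is not None:
            self.item_cache[cache_key] = value
        return value

    def put_item(self, record_id: str, item_key: bytes, value: bytes) -> None:
        self.durable.setdefault(record_id, {})[item_key] = value
        # Invalidate-on-write: both cache levels are dropped for the written data.
        self.record_cache.pop(record_id, None)
        self.item_cache.pop((record_id, item_key), None)
```

A write to one item has to invalidate the enclosing record entry as well, because the cached record embeds a copy of that item.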

Cost-sharing across namespaces

"Multiple KV namespaces can share the same caching clusters for cost efficiency." Namespace isolation therefore lives at the storage layer, not the cache layer: a single EVCache cluster can front many graph namespaces. The trade-off is noisy-neighbour risk on the shared cache cluster, which has to be mitigated by how EVCache itself is sized for multiple tenants.
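
One way to picture several namespaces sharing a cluster is to scope every cache key by namespace. The sketch below is an assumption for illustration only: the post does not say how keys from different namespaces are kept apart, and `SharedCacheCluster` and `NamespaceView` are hypothetical names, not EVCache APIs.

```python
from typing import Dict, Optional, Tuple


class SharedCacheCluster:
    """One in-memory 'cluster' shared by many namespaces (stand-in for EVCache)."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], str] = {}

    def view(self, namespace: str) -> "NamespaceView":
        return NamespaceView(self, namespace)


class NamespaceView:
    """Per-namespace handle; keys are stored as (namespace, key) so they cannot collide."""

    def __init__(self, cluster: SharedCacheCluster, namespace: str) -> None:
        self.cluster = cluster
        self.namespace = namespace

    def get(self, key: str) -> Optional[str]:
        return self.cluster.entries.get((self.namespace, key))

    def set(self, key: str, value: str) -> None:
        self.cluster.entries[(self.namespace, key)] = value

    def invalidate(self, key: str) -> None:
        self.cluster.entries.pop((self.namespace, key), None)
```

All views draw on the same `entries` map, which is also where the noisy-neighbour risk comes from: one namespace's traffic competes with every other namespace's for the same capacity.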

Composition

The read-aside pattern composes with:

  • patterns/write-aside-cache-for-edge-links: the write-aside analogue for edge link existence. The two caches sit at different points: write-aside for "does this link exist?" suppresses redundant writes; read-aside for property bags suppresses redundant reads.
  • Multi-region async replication of EVCache itself: the graph layer inherits eventual consistency on the cache from the substrate.
  • LWW (last-write-wins) + idempotency tokens: invalidation on write has to agree with LWW ordering, so that a reader never gets a cached value older than the write that LWW has already accepted as current.

Trade-offs

  • Per-namespace decision rule is implicit. The post does not disclose the threshold for choosing one strategy over the other. Operators need to know their write throughput + staleness tolerance to pick.
  • TTL-driven cache entries drift across regions. Each region ages its cache independently, so the same hot key can expire at different times in different regions. For most workloads this is fine; for cross-region read consistency it requires care.
  • Invalidate-on-write requires cache write capacity. The cache must absorb an invalidation for every durable write, not just a fraction of them. For high-write graphs this can dominate the cache cluster's throughput budget.
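
The regional TTL drift described in the list above can be reproduced with a small simulation. It is a sketch under simplifying assumptions: both regions read one shared dict as the durable store (so durable replication lag is ignored), time is a manually advanced counter, and `RegionalTtlCache` and `simulate_drift` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple


class RegionalTtlCache:
    """TTL-driven read-aside cache for one region."""

    def __init__(self, durable: Dict[str, str], ttl: float,
                 clock: Callable[[], float]) -> None:
        self.durable = durable
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]                      # still within its TTL
        value = self.durable.get(key)            # miss or expired: re-read
        if value is not None:
            self.entries[key] = (value, self.clock() + self.ttl)
        return value


def simulate_drift() -> Tuple[Optional[str], Optional[str]]:
    now = [0.0]
    clock = lambda: now[0]
    durable = {"node:42": "v1"}
    region_a = RegionalTtlCache(durable, ttl=30.0, clock=clock)
    region_b = RegionalTtlCache(durable, ttl=30.0, clock=clock)

    region_a.get("node:42")       # t=0:  region A caches v1 until t=30
    now[0] = 20.0
    region_b.get("node:42")       # t=20: region B caches v1 until t=50
    now[0] = 25.0
    durable["node:42"] = "v2"     # TTL-driven write: no cache action
    now[0] = 35.0
    # Region A's entry has expired and is refreshed; region B's has not.
    return region_a.get("node:42"), region_b.get("node:42")
```

At t=35 the two regions return different values for the same key, even though each individually honours the "stale for at most one TTL" bound.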

Work in progress (per the post)

A write-through cache is being developed: "organize indexes by different sort orders (e.g., sorting data by last-write timestamp), at the cost of increased memory consumption." This is a third caching shape, not a third invalidation strategy on the read-aside cache.
