

Stateless invalidator

Problem

An invalidation-based cache fed by a CDC stream needs to translate row mutations into invalidation messages for the affected queries. The natural-but-wrong approach is to keep an active-query subscription table (query_id → subscribers) and scan it on every mutation. That table grows linearly with the number of active subscriptions, becomes a memory bottleneck, and forces the invalidator into a stateful-service operational model (replication, failover, rebalancing).

Shape

Instead of tracking live queries, make the invalidator schema-aware only:

  1. The invalidator only knows the schema's query shapes, not the set of currently-subscribed queries.
  2. On a row mutation (from CDC / WAL), it iterates the (finite) set of query shapes, substitutes the pre/post-image column values into each shape's parameters, and emits an invalidation for the resulting (shape_id, arg_values) key.
  3. Invalidations are routed to the cache shard responsible for that hash, which evicts the entry if present or no-ops if absent. Upstream notification is handled by cache→edge fan-out.

Key property: the invalidator carries no per-query state. Every decision is a pure function of (schema, row_mutation). Therefore it's stateless compute.
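The three steps above can be sketched as a pure function. This is a minimal sketch, not LiveGraph's actual API — all type and field names here are illustrative:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// RowMutation is one CDC event: the table plus pre- and post-image
// column values (Pre is nil for inserts, Post is nil for deletes).
type RowMutation struct {
	Table string
	Pre   map[string]string
	Post  map[string]string
}

// QueryShape is one parameterized query template from the schema,
// e.g. "comments WHERE file_id = ?" parameterized by file_id.
type QueryShape struct {
	ID        string
	Table     string
	ParamCols []string
}

// Invalidation names one cached query instance: (shape_id, arg_values).
type Invalidation struct {
	ShapeID string
	Args    []string
}

// Invalidate is a pure function of (shapes, mutation) — no per-query
// state. Both images are substituted so that an update moving a row
// between result sets invalidates both the old and the new instance.
func Invalidate(shapes []QueryShape, m RowMutation) []Invalidation {
	seen := map[string]bool{}
	var out []Invalidation
	for _, s := range shapes {
		if s.Table != m.Table {
			continue
		}
		for _, img := range []map[string]string{m.Pre, m.Post} {
			if img == nil {
				continue
			}
			args := make([]string, 0, len(s.ParamCols))
			complete := true
			for _, c := range s.ParamCols {
				v, ok := img[c]
				if !ok {
					complete = false
					break
				}
				args = append(args, v)
			}
			key := fmt.Sprintf("%s|%v", s.ID, args)
			if complete && !seen[key] {
				seen[key] = true
				out = append(out, Invalidation{ShapeID: s.ID, Args: args})
			}
		}
	}
	return out
}

// Shard routes an invalidation to the cache shard owning its hash;
// that shard evicts the entry if present or no-ops if absent.
func Shard(inv Invalidation, nShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(inv.ShapeID))
	for _, a := range inv.Args {
		h.Write([]byte(a))
	}
	return h.Sum32() % nShards
}
```

Note that an update changing `file_id` from `a` to `b` yields two invalidations — one per image — which is exactly why CDC pre-images matter here.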

Consequences

  • Horizontal scaling is trivial. Invalidators are partitioned the same way as the upstream DB (one invalidator shard per DB shard); each only processes its own WAL; no cross-node coordination.
  • No failover complexity. Restart re-tails WAL from the last acknowledged position; no subscription state to reconstruct.
  • Operational model is "Lambda-shaped" even if deployed as a long-running service — durable input = WAL LSN; durable output = emitted invalidations; no state in the middle.
  • Cache topology is agnostic to DB topology. The invalidator is the single service bridging the two; edges + caches don't have to know how many DB shards there are.
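The failover story reduces to a replay loop: durable input is the WAL position, durable output is the emitted invalidations, nothing in between. A minimal sketch, assuming the WAL can be re-tailed from an arbitrary LSN (all names hypothetical):

```go
package main

// Change is one WAL record tagged with its log sequence number (LSN).
type Change struct {
	LSN      uint64
	Mutation string // simplified stand-in for a decoded row mutation
}

// Run is the invalidator's entire loop. A restart just calls Run again
// with the last acknowledged LSN; records at or below it are skipped
// on replay, so there is no subscription state to reconstruct.
func Run(wal []Change, fromLSN uint64, emit func(string), ack func(uint64)) {
	for _, c := range wal {
		if c.LSN <= fromLSN {
			continue // already processed before the restart
		}
		emit(c.Mutation) // deliver invalidations for this change
		ack(c.LSN)       // checkpoint only after output is durable
	}
}
```

Acking only after emitting gives at-least-once delivery of invalidations, which is safe here because eviction is idempotent.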

Pre-conditions

This pattern works when:

  • Query shapes are enumerable — the schema defines a small, finite set of shapes (as in Figma's ~700). Ad-hoc SQL breaks this.
  • Mutation → affected shapes is computable from the schema alone. Most easy shapes (equality predicates) are; hard shapes (ranges, inequalities) require a sidecar such as patterns/nonce-bulk-eviction to stay tractable.
  • Invalidations for non-subscribed queries are cheap — caches can no-op on unknown keys at negligible cost. This holds for typical hash-sharded caches.
  • Schema evolves more slowly than the invalidation rate — Figma describes schema updates on "a day-to-day basis" versus sub-second invalidations. This gap allows pre-distributing shape info to services before the queries that need it arrive.
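The cheap-no-op precondition is what a hash-sharded cache gives almost for free. A minimal sketch (types hypothetical):

```go
package main

import "sync"

// CacheShard holds materialized query results keyed by (shape_id, args).
type CacheShard struct {
	mu      sync.Mutex
	entries map[string]string
}

// Invalidate evicts the entry if present and no-ops if absent — in Go,
// delete on a missing map key does nothing, which is what makes
// invalidating never-subscribed query instances cheap.
func (c *CacheShard) Invalidate(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, key)
}
```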

Figma's LiveGraph 100x as the canonical instance

(Source: sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale)

  • Service: invalidator, Go, sharded identically to physical Postgres shards.
  • Input: logical replication stream per shard (WAL-based CDC).
  • Processing: for each row change, iterate the ~700 schema shapes; substitute column values; emit invalidation.
  • Output: invalidation messages to relevant cache shards (the ones whose hash(easy-expr) range covers this invalidation).
  • State: none beyond WAL LSN checkpoint.
  • Pairs with: patterns/nonce-bulk-eviction (hard-query handling), patterns/independent-scaling-tiers (edge / cache / invalidator scale independently).

The architectural win called out in the post:

"Stateless invalidators could be aware of both database topology and the cache sharding strategy to deliver invalidations only to relevant caches, removing the excessive fan-in and fan-out of all database updates."

Anti-patterns it replaces

  • Subscription registry in the invalidator — maintain the list of active queries and scan it on every mutation. Fan-in (every server processes every shard's updates) and fan-out (every update delivered everywhere) are both O(shards × servers).
  • Push-to-every-server — the pre-100x LiveGraph architecture: mutations broadcast to every LiveGraph server for local cache mutation. Scales poorly with both fleet size and update rate.

When it doesn't apply

  • Free-form SQL query engines — query shapes aren't enumerable; schema inspection can't predict affected queries.
  • Workloads where mutation → affected queries requires query execution itself (complex aggregates, joins with dynamic join keys). Asana's Worldstore is the post's named counterexample.
  • Schema change rate ≈ invalidation rate — no pre-distribution window; effectively the shape set is unbounded.

Precedents / neighbors

  • GraphQL persisted queries — the protocol-level prerequisite: fix the query set, then invalidation from schema becomes mechanical.
  • Materialized-view incremental maintenance in databases (IVM) — classical version: given a base-table update and a registered view definition, compute affected rows. Same idea, system-internal.
  • Kafka Streams repartitioning-by-key — the DB-topology-aware routing cousin: route updates to the partition that owns the key, don't broadcast.
  • AWS Lambda with an SQS/Kinesis source — Lambda-as-stateless-event-processor is the serverless deployment shape of this pattern, and Lambda's PR/FAQ frames stateless-by-contract as an enabling constraint for exactly this kind of workload.
