CONCEPT Cited by 1 source

Query shape

Definition

A query shape is a query with its argument values abstracted into placeholders ($1, $2, ...), assigned a stable unique ID. A live query is then identified by (shape_id, argument_values). The schema defines a finite set of query shapes; each running query is an instance of one shape with concrete arguments substituted.

Example (from Figma's LiveGraph):

  • Shape file_comments:
    SELECT * FROM comments WHERE file_id = $1 AND deleted_at IS NULL
    
  • Live query: file_comments("live-graph-love") — shape file_comments + argument $1 = "live-graph-love".
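
The shape/instance split can be sketched as two small value types. This is only an illustration: the class names QueryShape and LiveQuery and the cache_key format are hypothetical, not LiveGraph's actual representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QueryShape:
    """A schema-defined query template (hypothetical representation)."""
    shape_id: str  # stable unique ID
    sql: str       # SQL text with positional placeholders ($1, $2, ...)


@dataclass(frozen=True)
class LiveQuery:
    """A running query: a shape ID plus concrete argument values."""
    shape_id: str
    args: Tuple[object, ...]

    def cache_key(self) -> str:
        # Illustrative key format only; any injective encoding would do.
        return f"{self.shape_id}({', '.join(repr(a) for a in self.args)})"


FILE_COMMENTS = QueryShape(
    shape_id="file_comments",
    sql="SELECT * FROM comments WHERE file_id = $1 AND deleted_at IS NULL",
)

# Two clients asking for the same file produce equal, hashable identities.
q1 = LiveQuery(FILE_COMMENTS.shape_id, ("live-graph-love",))
q2 = LiveQuery("file_comments", ("live-graph-love",))
```

Because both types are frozen dataclasses, equal (shape_id, args) pairs compare equal and hash the same, so they can serve directly as dictionary or cache keys.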

Why it matters for invalidation-based caching

In an invalidation-based cache, on a row mutation you need to answer: which queries should I invalidate?

The naïve solution is a subscription registry: maintain a map of (shape + args) → set of subscribers and scan it on every change. This doesn't scale — the registry is memory-heavy, and fan-out discovery is expensive under churn.

Query-shape trick: if every query is an instance of a schema-defined shape, then on a row mutation you can:

  1. Walk the (small, fixed) set of schema shapes.
  2. For each shape, substitute the mutation's row values into the shape's parameters.
  3. Emit invalidations for those parameterized queries — regardless of whether anyone's subscribed.

The invalidator never needs to know which queries are active. Every (shape, arg) tuple derived from a mutation either has no subscribers (invalidation is a no-op at the cache) or has subscribers (invalidation evicts the cache entry). The invalidator stays stateless.
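
The three steps above can be sketched as a function that holds no subscription state. Everything here is assumed for illustration: the SHAPES table, the column names, and the dict-based cache are hypothetical stand-ins, and the sketch handles a single row image (an update would presumably need both the old and the new row values).

```python
from typing import Dict, Iterator, List, Tuple

Key = Tuple[str, Tuple[object, ...]]

# Hypothetical shape table: shape_id -> (table, columns bound to $1, $2, ...).
# Only easy (equality) shapes are modeled here.
SHAPES: Dict[str, Tuple[str, List[str]]] = {
    "file_comments": ("comments", ["file_id"]),
    "user_comments": ("comments", ["author_id"]),
    "file_by_id": ("files", ["id"]),
}


def invalidations_for(table: str, row: Dict[str, object]) -> Iterator[Key]:
    """Yield every (shape_id, args) a mutated row could affect."""
    for shape_id, (shape_table, columns) in SHAPES.items():  # step 1: walk shapes
        if shape_table != table:
            continue
        # Step 2: substitute the row's values into the shape's parameters.
        yield shape_id, tuple(row[c] for c in columns)


def apply_invalidations(cache: Dict[Key, object], table: str,
                        row: Dict[str, object]) -> int:
    """Step 3: evict matching entries; keys nobody cached are no-ops."""
    evicted = 0
    for key in invalidations_for(table, row):
        if key in cache:
            del cache[key]
            evicted += 1
    return evicted
```

Note that invalidations_for never consults the cache or any subscriber list; whether a key is live is discovered only at eviction time.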

Scale at Figma (Source: sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale): the schema defines ~700 query shapes in total.

Easy vs hard shapes

Not every query shape admits one-arg-per-mutation invalidation. Figma's post partitions schema shapes into:

  • Easy shapes — equality predicates (=, IN (const), table membership). Substitute mutation row values → finite affected parameterizations (typically exactly one). Canonical: SELECT * FROM comments WHERE file_id = $1.

  • Hard shapes — range or open-ended predicates (>, <, BETWEEN, LIKE 'foo%', date ranges). A mutation affects potentially infinite parameterizations (all queries with a bound before the new value). Canonical:

    SELECT * FROM comments WHERE file_id = $1 AND created_at > $2
    
    A new comment at time T invalidates every query with $2 < T — unbounded.

At Figma, ~11 of ~700 shapes are hard (≈1.6%). Small enough to handle specially, too fundamental to drop.
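
A minimal sketch of the easy/hard split, assuming each shape's WHERE clause has already been reduced to a list of (column, operator) pairs. The operator sets and the classify helper are hypothetical; ">=" and "<=" are added here as an assumption and are not listed in the source.

```python
from typing import Iterable, Tuple

# Operator sets follow the partition described above (">=", "<=" are assumed).
EASY_OPS = {"=", "IN"}
HARD_OPS = {">", "<", ">=", "<=", "BETWEEN", "LIKE"}


def classify(predicates: Iterable[Tuple[str, str]]) -> str:
    """Return "easy" if every predicate is an equality test, else "hard"."""
    ops = {op.upper() for _column, op in predicates}
    unknown = ops - EASY_OPS - HARD_OPS
    if unknown:
        raise ValueError(f"unclassified operators: {sorted(unknown)}")
    return "hard" if ops & HARD_OPS else "easy"


# The two canonical shapes from the text.
easy_shape = [("file_id", "=")]
hard_shape = [("file_id", "="), ("created_at", ">")]
```

Here classify(easy_shape) returns "easy" and classify(hard_shape) returns "hard": a single range predicate is enough to make the set of affected parameterizations unbounded.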

Normalization rule

Figma enforces a schema discipline:

All queries must normalize to (easy-expr) AND (hard-expr).

Queries without a hard part just ignore the second conjunct. This lets Figma:

  • Invalidate via easy expressions only. Hard expressions are never directly invalidated.
  • Shard caches by hash(easy-expr) — all hard queries with the same easy-expr colocate on one cache instance.
  • Evict all hard queries sharing an easy-expr in one op via nonce indirection.
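
One way nonce indirection could work, as a sketch under stated assumptions: each easy-expr key maps to a random nonce, and hard-query results are stored under (easy key, nonce, hard key). Dropping the nonce orphans every hard entry at once. The NonceCache class, the string key formats, and shard_for are hypothetical and not Figma's implementation.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def shard_for(easy_key: str, num_shards: int) -> int:
    """Pick a shard from the easy expression only, so all hard queries
    sharing that easy expression land on the same cache instance."""
    digest = hashlib.sha256(easy_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class NonceCache:
    def __init__(self) -> None:
        self._nonces: Dict[str, str] = {}                       # easy_key -> nonce
        self._entries: Dict[Tuple[str, str, str], object] = {}  # keyed by nonce

    def put(self, easy_key: str, hard_key: str, result: object) -> None:
        nonce = self._nonces.get(easy_key)
        if nonce is None:
            nonce = secrets.token_hex(8)
            self._nonces[easy_key] = nonce
        self._entries[(easy_key, nonce, hard_key)] = result

    def get(self, easy_key: str, hard_key: str) -> Optional[object]:
        nonce = self._nonces.get(easy_key)
        if nonce is None:
            return None
        return self._entries.get((easy_key, nonce, hard_key))

    def invalidate(self, easy_key: str) -> None:
        # One operation: old entries become unreachable. A real cache would
        # reclaim them later (for example via TTL or LRU); this sketch does not.
        self._nonces.pop(easy_key, None)
```

After invalidate("file_comments:f1"), every cached hard query for that file misses, regardless of its range bound, and the next put starts a fresh nonce generation.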

When query-shape-based invalidation works

  • Schema evolves slowly relative to query rate — Figma's schema changes "on a day-to-day basis with code updates" while invalidations happen sub-second. Shape information can therefore be precomputed and distributed to services before users' queries arrive.
  • Small, enumerable set of shapes — ~700, not 700M. A DSL or GraphQL-like front-end typically produces this naturally. Ad-hoc SQL analytics does not.
  • Mutation → affected-shape computability is tractable — equality predicates are the common case; a schema-inspection tool can validate this before you rely on it.

When it doesn't

  • Ad-hoc SQL — the query space is the entire SQL grammar; shapes aren't enumerable.
  • Compute-heavy queries where the predicate structure is arbitrary — can't reduce mutation → affected-shape statically (e.g. aggregates, window functions with complex partitions). Asana's Worldstore is the post's named counterexample — "designed quite differently".
  • Dynamically generated queries — query structure emerges at runtime per-request; pre-enumerating shapes misses coverage.

Precedents / neighbors

  • Prepared statement hashing — SQL prepared statements canonicalize SQL text into a statement ID + bound parameters. Same idea, scoped to the DB planner's cache.
  • GraphQL persisted queries — only pre-registered query documents are accepted. Gives you a finite enumerable shape set at the protocol level.
  • Request-signature cache keys in CDNs — normalize URL + query + headers to a canonical key so two semantically-equivalent requests share a cache entry.
  • Kafka Streams' per-topology stores — precompute dependencies between topics/operators to know where a change propagates.
