PATTERN Cited by 1 source

Partial materialized views (push-on-write, not pull-on-read)¶

The idea¶

Instead of planning how to fetch the data on every read (the default relational-DB model), plan how to process the data on every write, so that reading becomes essentially free.

Classic materialized views precompute the whole query; partial materialized views maintain only the rows being actively read, which keeps the write-side cost bounded even for high-cardinality sources.

Canonical system¶

PlanetScale Boost (MySQL / Vitess), from the 2022 blog post summarized in the High Scalability Dec-2022 roundup:

"Instead of planning how to fetch the data every time you read from your database, we want to plan how to process the data every time you write to your database. We want to process your data, so that reading from it afterward becomes essentially free. We want a very complex plan to apply to your writes, so there's no planning to be done for your reads; a plan where we push data on writes instead of pulling data on reads."

PlanetScale Boost claims up to 1000× query-performance improvement by inverting push vs. pull.

The design is an instance of Jonhoo's (Jon Gjengset's) academic work on dynamic, partially-stateful data-flow for incremental materialized-view maintenance (see Noria), now appearing in a commercial product.

The mechanism (generic)¶

writes ──► event bus ──► view-maintenance plan
                              │
                              ▼
                     ┌─────────────────────┐
                     │ partial materialized│
                     │ view (only the      │
                     │  rows being read)   │
                     └──────────┬──────────┘
                                │
  reads  ─────────────────────────► hit on the view (O(log n))
                                │
                                │ miss?
                                ▼
                         fall back to base tables,
                         promote the row into the view

Writes drive the view-maintenance data-flow graph.
Reads hit the view directly.
On miss, the engine promotes the missing rows into the view — "smarter" than a full materialization because it doesn't need to precompute everything.

When partial beats full materialization¶

High cardinality — a query with millions of possible result rows, of which only thousands are actually read. Full materialization wastes write-time on the unread rows; partial doesn't.
Skewed access — 80/20 reads. Partial maintains the 20% and falls back for the rest.
Mutable schemas / evolving queries — adding a new query just creates a new partial view; full materialization would require an expensive backfill.

Trade-offs¶

Write amplification: every write triggers view maintenance for every matching partial view. Bounded, but real.
Cache-miss latency spike: the first read for a cold row pays the fall-back-to-base-table cost plus the promotion cost.
Operational complexity: the data-flow graph is a new thing to reason about, debug, and monitor.

Contrast with upstream patterns¶

Clustering — cheaper to set up, doesn't restructure the query model, but single-digit-× improvement not 1000×.
Read replicas — scales read capacity but doesn't change the per-query cost model.
In-application caches — pushes cache invalidation to the app layer; partial materialized views keep it in the database.