Skip to content

CONCEPT Cited by 1 source

Non-deterministic MV maintenance

Definition

Non-deterministic MV maintenance is the IVM problem of keeping a materialized view correctly up to date when the view definition includes non-deterministic functions — functions whose output depends on when they are evaluated, not just on their inputs.

The two named industrial examples (from Databricks at SIGMOD 2026):

  1. current_date() — evaluates to a different value on each call as the calendar advances. Using it inside an MV definition (e.g. WHERE order_date > current_date() - 30) means the set of rows the MV should contain changes over time even when no input data changes.

  2. AI / LLM functionsai_query, ai_classify, embedding models, etc. The same input string may return different outputs on different runs because the underlying model version, sampling parameters, or retrieval context have changed.

Why this is hard

Standard IVM relies on the deterministic-MV invariant:

Given an input delta ΔT, computing f(ΔT) to produce the output delta ΔV yields the same ΔV regardless of when the computation runs.

This invariant is the algebraic underpinning that lets the engine (a) compute ΔV over the input delta alone, and (b) merge it into the persisted MV without revisiting unchanged rows. Non-deterministic functions break this invariant in two distinct ways:

Way 1: existing rows in the MV become stale even without input changes

If the MV definition includes WHERE order_date > current_date() - 30 and a day passes with no new orders, the MV is still wrong — rows that were within 30 days yesterday may not be today. There is no input delta to drive the IVM, but the MV needs to evict rows.

Way 2: incremental result is path-dependent

If an AI function ai_classify(text) produced category A for row r1 on the first refresh, and the model is updated before the second refresh, then a re-evaluation of ai_classify(r1.text) on the second refresh would produce category B — but the cached contribution from r1 to the MV still says A. Recomputing the delta over ΔT alone does not surface this issue.

Industrial responses (taxonomy)

Most prior industrial IVM systems handle non-determinism by rejecting the MV (the engine refuses to incrementally maintain it) or falling back to full refresh on every change (no real incrementalism). Both responses are coverage failures: the IVM thesis collapses for a meaningful slice of production MV shapes.

Enzyme's claim is correctness under incremental maintenance for both current_date() and AI functions. The blog post does not disclose the mechanism; plausible techniques include:

Technique What it does Applies to
Snapshot-timestamp pinning Bind current_date() (and similar wall-clock functions) to a snapshot timestamp captured per refresh; re-evaluating the MV at the same snapshot yields the same result. current_date(), current_timestamp(), now()
Time-window-aware incremental rewrite Detect that the MV depends on a moving time window and emit deletions / re-evaluations for rows that crossed the window boundary since the last refresh. Time-windowed MVs that filter by current_date()
Per-row AI-function result caching Persist ai_function(input) → output as a side table keyed on input hash + model version; an MV refresh consults the cache rather than re-invoking the model. AI functions where the same input recurs across runs
Model-version invalidation When the AI function's model version changes, invalidate cached results and recompute the affected MV regions. AI functions
Opt-out taint analysis Detect non-deterministic-function call sites; mark dependent rows / partitions as "must recompute"; recompute only the marked region rather than the full MV. Mixed deterministic + non-deterministic MVs
Compose with LLM-as-judge caching The cache layer treats AI-function output as data with explicit invalidation semantics. AI functions in pipelines

Enzyme's actual mechanism is deferred to the arXiv paper.

What this unlocks

If current_date() and AI functions are first-class citizens in the IVM grammar, three previously hand-coded ETL patterns collapse into declarative MVs:

  1. Rolling-window aggregations"orders in the last 30 days by region" — declared as a single MV, kept fresh as the calendar advances and as orders arrive.
  2. AI-classified data lakes — a derived table whose rows include an ai_classify(text) column, kept in sync as new rows arrive and the model evolves.
  3. AI-extracted entity tablesai_extract output materialised into a derived table; refresh policy follows the AI-function cache invalidation rules.

These patterns previously required hand-written delete-then-insert logic, model-version tracking columns, and bespoke backfill code — all collapsed into the MV definition by Enzyme.

Seen in

Last updated · 542 distilled / 1,571 read