CONCEPT Cited by 1 source

Non-deterministic MV maintenance¶

Definition¶

Non-deterministic MV maintenance is the IVM problem of keeping a materialized view correctly up to date when the view definition includes non-deterministic functions — functions whose output depends on when they are evaluated, not just on their inputs.

The two named industrial examples (from Databricks at SIGMOD 2026):

current_date() — evaluates to a different value on each call as the calendar advances. Using it inside an MV definition (e.g. WHERE order_date > current_date() - 30) means the set of rows the MV should contain changes over time even when no input data changes.
AI / LLM functions — ai_query, ai_classify, embedding models, etc. The same input string may return different outputs on different runs because the underlying model version, sampling parameters, or retrieval context have changed.

Why this is hard¶

Standard IVM relies on the deterministic-MV invariant:

Given an input delta ΔT, computing f(ΔT) to produce the output delta ΔV yields the same ΔV regardless of when the computation runs.

This invariant is the algebraic underpinning that lets the engine (a) compute ΔV over the input delta alone, and (b) merge it into the persisted MV without revisiting unchanged rows. Non-deterministic functions break this invariant in two distinct ways:

Way 1: existing rows in the MV become stale even without input changes¶

If the MV definition includes WHERE order_date > current_date() - 30 and a day passes with no new orders, the MV is still wrong — rows that were within 30 days yesterday may not be today. There is no input delta to drive the IVM, but the MV needs to evict rows.

Way 2: incremental result is path-dependent¶

If an AI function ai_classify(text) produced category A for row r1 on the first refresh, and the model is updated before the second refresh, then a re-evaluation of ai_classify(r1.text) on the second refresh would produce category B — but the cached contribution from r1 to the MV still says A. Recomputing the delta over ΔT alone does not surface this issue.

Industrial responses (taxonomy)¶

Most prior industrial IVM systems handle non-determinism by rejecting the MV (the engine refuses to incrementally maintain it) or falling back to full refresh on every change (no real incrementalism). Both responses are coverage failures: the IVM thesis collapses for a meaningful slice of production MV shapes.

Enzyme's claim is correctness under incremental maintenance for both current_date() and AI functions. The blog post does not disclose the mechanism; plausible techniques include:

Technique	What it does	Applies to
Snapshot-timestamp pinning	Bind `current_date()` (and similar wall-clock functions) to a snapshot timestamp captured per refresh; re-evaluating the MV at the same snapshot yields the same result.	`current_date()`, `current_timestamp()`, `now()`
Time-window-aware incremental rewrite	Detect that the MV depends on a moving time window and emit deletions / re-evaluations for rows that crossed the window boundary since the last refresh.	Time-windowed MVs that filter by `current_date()`
Per-row AI-function result caching	Persist `ai_function(input) → output` as a side table keyed on input hash + model version; an MV refresh consults the cache rather than re-invoking the model.	AI functions where the same input recurs across runs
Model-version invalidation	When the AI function's model version changes, invalidate cached results and recompute the affected MV regions.	AI functions
Opt-out taint analysis	Detect non-deterministic-function call sites; mark dependent rows / partitions as "must recompute"; recompute only the marked region rather than the full MV.	Mixed deterministic + non-deterministic MVs
Compose with LLM-as-judge caching	The cache layer treats AI-function output as data with explicit invalidation semantics.	AI functions in pipelines

Enzyme's actual mechanism is deferred to the arXiv paper.

What this unlocks¶

If current_date() and AI functions are first-class citizens in the IVM grammar, three previously hand-coded ETL patterns collapse into declarative MVs:

Rolling-window aggregations — "orders in the last 30 days by region" — declared as a single MV, kept fresh as the calendar advances and as orders arrive.
AI-classified data lakes — a derived table whose rows include an ai_classify(text) column, kept in sync as new rows arrive and the model evolves.
AI-extracted entity tables — ai_extract output materialised into a derived table; refresh policy follows the AI-function cache invalidation rules.

These patterns previously required hand-written delete-then-insert logic, model-version tracking columns, and bespoke backfill code — all collapsed into the MV definition by Enzyme.

Seen in¶

sources/2026-05-29-databricks-databricks-at-sigmod-2026 — first wiki disclosure of the non-deterministic-IVM claim; Enzyme is named as the engine that handles current_date() and AI functions correctly under incremental maintenance, distinguishing it from prior industrial IVM.

concepts/incremental-view-maintenance — the parent concept.
concepts/materialized-view — what is being maintained.
concepts/multi-language-materialized-view — the other named Enzyme novel claim.
systems/enzyme-ivm — the engine making the claim.
systems/lakeflow-spark-declarative-pipelines — the surface where the non-deterministic MV is authored.