CONCEPT Cited by 1 source
Non-deterministic MV maintenance¶
Definition¶
Non-deterministic MV maintenance is the IVM problem of keeping a materialized view correctly up to date when the view definition includes non-deterministic functions — functions whose output depends on when they are evaluated, not just on their inputs.
The two named industrial examples (from Databricks at SIGMOD 2026):
-
current_date()— evaluates to a different value on each call as the calendar advances. Using it inside an MV definition (e.g.WHERE order_date > current_date() - 30) means the set of rows the MV should contain changes over time even when no input data changes. -
AI / LLM functions —
ai_query,ai_classify, embedding models, etc. The same input string may return different outputs on different runs because the underlying model version, sampling parameters, or retrieval context have changed.
Why this is hard¶
Standard IVM relies on the deterministic-MV invariant:
Given an input delta
ΔT, computingf(ΔT)to produce the output deltaΔVyields the sameΔVregardless of when the computation runs.
This invariant is the algebraic underpinning that lets the engine
(a) compute ΔV over the input delta alone, and (b) merge it into
the persisted MV without revisiting unchanged rows. Non-deterministic
functions break this invariant in two distinct ways:
Way 1: existing rows in the MV become stale even without input changes¶
If the MV definition includes WHERE order_date > current_date() - 30
and a day passes with no new orders, the MV is still wrong — rows
that were within 30 days yesterday may not be today. There is no
input delta to drive the IVM, but the MV needs to evict rows.
Way 2: incremental result is path-dependent¶
If an AI function ai_classify(text) produced category A for row
r1 on the first refresh, and the model is updated before the
second refresh, then a re-evaluation of ai_classify(r1.text) on
the second refresh would produce category B — but the cached
contribution from r1 to the MV still says A. Recomputing the
delta over ΔT alone does not surface this issue.
Industrial responses (taxonomy)¶
Most prior industrial IVM systems handle non-determinism by rejecting the MV (the engine refuses to incrementally maintain it) or falling back to full refresh on every change (no real incrementalism). Both responses are coverage failures: the IVM thesis collapses for a meaningful slice of production MV shapes.
Enzyme's claim is correctness under
incremental maintenance for both current_date() and AI
functions. The blog post does not disclose the mechanism; plausible
techniques include:
| Technique | What it does | Applies to |
|---|---|---|
| Snapshot-timestamp pinning | Bind current_date() (and similar wall-clock functions) to a snapshot timestamp captured per refresh; re-evaluating the MV at the same snapshot yields the same result. |
current_date(), current_timestamp(), now() |
| Time-window-aware incremental rewrite | Detect that the MV depends on a moving time window and emit deletions / re-evaluations for rows that crossed the window boundary since the last refresh. | Time-windowed MVs that filter by current_date() |
| Per-row AI-function result caching | Persist ai_function(input) → output as a side table keyed on input hash + model version; an MV refresh consults the cache rather than re-invoking the model. |
AI functions where the same input recurs across runs |
| Model-version invalidation | When the AI function's model version changes, invalidate cached results and recompute the affected MV regions. | AI functions |
| Opt-out taint analysis | Detect non-deterministic-function call sites; mark dependent rows / partitions as "must recompute"; recompute only the marked region rather than the full MV. | Mixed deterministic + non-deterministic MVs |
| Compose with LLM-as-judge caching | The cache layer treats AI-function output as data with explicit invalidation semantics. | AI functions in pipelines |
Enzyme's actual mechanism is deferred to the arXiv paper.
What this unlocks¶
If current_date() and AI functions are first-class citizens in the
IVM grammar, three previously hand-coded ETL patterns collapse into
declarative MVs:
- Rolling-window aggregations — "orders in the last 30 days by region" — declared as a single MV, kept fresh as the calendar advances and as orders arrive.
- AI-classified data lakes — a derived table whose rows include
an
ai_classify(text)column, kept in sync as new rows arrive and the model evolves. - AI-extracted entity tables —
ai_extractoutput materialised into a derived table; refresh policy follows the AI-function cache invalidation rules.
These patterns previously required hand-written delete-then-insert logic, model-version tracking columns, and bespoke backfill code — all collapsed into the MV definition by Enzyme.
Seen in¶
- sources/2026-05-29-databricks-databricks-at-sigmod-2026 —
first wiki disclosure of the non-deterministic-IVM claim;
Enzyme is named as the engine that handles
current_date()and AI functions correctly under incremental maintenance, distinguishing it from prior industrial IVM.
Related¶
- concepts/incremental-view-maintenance — the parent concept.
- concepts/materialized-view — what is being maintained.
- concepts/multi-language-materialized-view — the other named Enzyme novel claim.
- systems/enzyme-ivm — the engine making the claim.
- systems/lakeflow-spark-declarative-pipelines — the surface where the non-deterministic MV is authored.