Skip to content

PATTERN Cited by 1 source

Query rewrite to pre-aggregated materialization

Pattern

When a consumer issues a query against a semantic-layer metric (e.g. MEASURE(revenue) over a date range), the query engine transparently rewrites the query to read from a pre-aggregated materialization when one is available that can satisfy the query — falling back to base-table computation when no materialization matches. The consumer is unaware of the rewriting; the metric query never changes whether or not a materialization is hit.

Forces

  • Aggregate-table maintenance burden when consumers explicitly query aggregate tables — every aggregation logic change forces consumer-side query updates.
  • Inconsistent performance when some queries hit aggregates and others scan base tables, with no platform-level coverage guarantees.
  • Consumer-side awareness of materialization details — which metric has which aggregate, at which grain, refreshed how recently — is operationally fragile.
  • Hand-routing logic in BI tools when each tool implements its own aggregate-detection rules.

Solution shape (verbatim from the source)

"Intelligent query rewriting: The engine routes queries to the best available materialization."

"Transparent routing: Users query metrics the same way — the system serves the fastest path."

"The dashboard and Genie examples above both queried the same Metric View, and both had their queries transparently routed to a materialization."

The pattern's defining property: the consumer's query and the materialization's existence are decoupled. Adding, removing, or re-grain-ing a materialization changes the query plan but not the query.

How the rewriter works (conceptual)

When a consumer issues MEASURE(metric_name) ... WHERE filter:

  1. Inspect available materializations for metric_name in the semantic-layer catalog.
  2. Match by coverage: identify the materialization whose grain (time-bucket size, dimension breakdown, applied filters) matches the consumer's query.
  3. Choose the best match: usually the finest-grain materialization that still covers the query — too coarse produces wrong answers; too fine wastes materialization-build work.
  4. Rewrite the SQL to read from the materialization instead of the base tables.
  5. Fall back to base-table computation if no materialization matches.

The architectural value: the materialization decisions ("materialize daily revenue", "materialize hourly revenue by region") are made by the platform engineer based on observed traffic, and the BI tool / agent / notebook never has to know.

Concrete deployment

The pattern is largely a substrate property — the engine (Databricks SQL warehouse + Unity Catalog metric resolution) does the rewriting. The platform engineer's role:

  1. Define the metric as a Metric View (patterns/governed-metric-as-headless-bi-substrate).
  2. Enable materialization at one or more grains (patterns/auto-materialized-aggregation-via-semantic-layer).
  3. Verify rewriting in query plans — the source's evidence uses Genie's query plan view to confirm transparent routing to a materialization. This becomes the validation surface for the rewriter's coverage envelope.
  4. Add or remove materializations based on observed hit rates from system.query.history or equivalent telemetry.

The consumer side stays unchanged through this entire process.

Trade-offs

Choice Implication
Aggressive rewriter (matches loosely) Higher hit rate; risk of approximation errors when grain doesn't quite line up.
Conservative rewriter (matches strictly) Lower hit rate; some queries that "should" hit silently fall back to base-table compute.
Multiple grains per metric Higher storage cost; better hit rate across query patterns.
Single grain per metric Lower storage cost; queries at other grains fall back to base-table.

Failure modes

  • Silent fallback. When the rewriter can't match a query to any materialization, the query falls back to base-table compute — slow but correct. The user perceives this as inconsistent performance without an obvious cause. Mitigation: expose materialization-hit-rate telemetry per metric.
  • Approximation when grains don't align. If the rewriter matches a query whose grain doesn't quite line up with the materialization, the result can be approximate. Most rewriters reject this rather than approximate, but the failure mode is worth checking — the source doesn't document the exact contract.
  • Stale materialization served as fresh. The rewriter must consider the freshness contract — if the materialization is stale beyond what the consumer accepts, it should fall back to base-table compute. Stale-result-served-as-fresh is the worst failure mode because it's a correctness issue, not a performance issue.
  • Plan thrashing. A workload that flips between cache-hit and cache-miss patterns can thrash between materialization-served and base-table-served plans, producing high latency variance.

Where this shows up on the wiki

Sibling patterns

  • Optimiser-side rewriting elsewhere on the wiki: Spark AQE re-plans during execution using runtime statistics; same principle (re-route based on what the engine learns at runtime), at a different layer (whole-query plan vs aggregate routing).
  • patterns/aggregation-pushdown-under-join — pushing aggregation under joins is a related rewriter optimisation that operates on base-table plans.
  • patterns/query-consolidation — consolidating in-flight identical queries; same architectural shape (work happens somewhere the consumer doesn't see).

Open questions / coverage envelope

  • What does the rewriter match on? The source says "the best available materialization" but doesn't document the matching algorithm — exact grain match? superset coverage with on-the-fly aggregation? approximate-grain detection?
  • Freshness contract. Under what staleness threshold does the rewriter prefer base-table compute over a materialization?
  • Cost-based vs heuristic. Does the rewriter choose between materializations using cost-based optimisation or fixed heuristics?

These are reserved for future ingests.

Seen in

  • sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tcofirst wiki canonicalisation as a pattern. Names the "intelligent query rewriting" and "transparent routing" properties verbatim and supplies the cross-consumer evidence (the dashboard and Genie examples both routed to the same materialization). Reserved for future ingests: rewriter matching algorithm, cost-based vs heuristic routing, freshness- threshold contract, multi-materialization selection logic.
Last updated · 542 distilled / 1,571 read