PATTERN Cited by 1 source
Query rewrite to pre-aggregated materialization¶
Pattern¶
When a consumer issues a query against a semantic-layer
metric (e.g. MEASURE(revenue) over a date range), the query
engine transparently rewrites the query to read from a
pre-aggregated materialization when one is available that can
satisfy the query — falling back to base-table computation when
no materialization matches. The consumer is unaware of the
rewriting; the metric query never changes whether or not a
materialization is hit.
Forces¶
- Aggregate-table maintenance burden when consumers explicitly query aggregate tables — every aggregation logic change forces consumer-side query updates.
- Inconsistent performance when some queries hit aggregates and others scan base tables, with no platform-level coverage guarantees.
- Consumer-side awareness of materialization details — which metric has which aggregate, at which grain, refreshed how recently — is operationally fragile.
- Hand-routing logic in BI tools when each tool implements its own aggregate-detection rules.
Solution shape (verbatim from the source)¶
"Intelligent query rewriting: The engine routes queries to the best available materialization."
"Transparent routing: Users query metrics the same way — the system serves the fastest path."
"The dashboard and Genie examples above both queried the same Metric View, and both had their queries transparently routed to a materialization."
The pattern's defining property: the consumer's query and the materialization's existence are decoupled. Adding, removing, or re-grain-ing a materialization changes the query plan but not the query.
How the rewriter works (conceptual)¶
When a consumer issues MEASURE(metric_name) ... WHERE filter:
- Inspect available materializations for
metric_namein the semantic-layer catalog. - Match by coverage: identify the materialization whose grain (time-bucket size, dimension breakdown, applied filters) matches the consumer's query.
- Choose the best match: usually the finest-grain materialization that still covers the query — too coarse produces wrong answers; too fine wastes materialization-build work.
- Rewrite the SQL to read from the materialization instead of the base tables.
- Fall back to base-table computation if no materialization matches.
The architectural value: the materialization decisions ("materialize daily revenue", "materialize hourly revenue by region") are made by the platform engineer based on observed traffic, and the BI tool / agent / notebook never has to know.
Concrete deployment¶
The pattern is largely a substrate property — the engine (Databricks SQL warehouse + Unity Catalog metric resolution) does the rewriting. The platform engineer's role:
- Define the metric as a Metric View (patterns/governed-metric-as-headless-bi-substrate).
- Enable materialization at one or more grains (patterns/auto-materialized-aggregation-via-semantic-layer).
- Verify rewriting in query plans — the source's evidence uses Genie's query plan view to confirm transparent routing to a materialization. This becomes the validation surface for the rewriter's coverage envelope.
- Add or remove materializations based on observed
hit rates from
system.query.historyor equivalent telemetry.
The consumer side stays unchanged through this entire process.
Trade-offs¶
| Choice | Implication |
|---|---|
| Aggressive rewriter (matches loosely) | Higher hit rate; risk of approximation errors when grain doesn't quite line up. |
| Conservative rewriter (matches strictly) | Lower hit rate; some queries that "should" hit silently fall back to base-table compute. |
| Multiple grains per metric | Higher storage cost; better hit rate across query patterns. |
| Single grain per metric | Lower storage cost; queries at other grains fall back to base-table. |
Failure modes¶
- Silent fallback. When the rewriter can't match a query to any materialization, the query falls back to base-table compute — slow but correct. The user perceives this as inconsistent performance without an obvious cause. Mitigation: expose materialization-hit-rate telemetry per metric.
- Approximation when grains don't align. If the rewriter matches a query whose grain doesn't quite line up with the materialization, the result can be approximate. Most rewriters reject this rather than approximate, but the failure mode is worth checking — the source doesn't document the exact contract.
- Stale materialization served as fresh. The rewriter must consider the freshness contract — if the materialization is stale beyond what the consumer accepts, it should fall back to base-table compute. Stale-result-served-as-fresh is the worst failure mode because it's a correctness issue, not a performance issue.
- Plan thrashing. A workload that flips between cache-hit and cache-miss patterns can thrash between materialization-served and base-table-served plans, producing high latency variance.
Where this shows up on the wiki¶
- systems/databricks-metric-views — canonical instance; the
rewriter sits between the consumer's
MEASURE()call and the materialization layer. - concepts/metric-view-materialization — the broader concept this pattern operationalises.
- patterns/auto-materialized-aggregation-via-semantic-layer — the platform-engineer-side counterpart pattern that creates the materializations the rewriter routes to.
Sibling patterns¶
- Optimiser-side rewriting elsewhere on the wiki: Spark AQE re-plans during execution using runtime statistics; same principle (re-route based on what the engine learns at runtime), at a different layer (whole-query plan vs aggregate routing).
- patterns/aggregation-pushdown-under-join — pushing aggregation under joins is a related rewriter optimisation that operates on base-table plans.
- patterns/query-consolidation — consolidating in-flight identical queries; same architectural shape (work happens somewhere the consumer doesn't see).
Open questions / coverage envelope¶
- What does the rewriter match on? The source says "the best available materialization" but doesn't document the matching algorithm — exact grain match? superset coverage with on-the-fly aggregation? approximate-grain detection?
- Freshness contract. Under what staleness threshold does the rewriter prefer base-table compute over a materialization?
- Cost-based vs heuristic. Does the rewriter choose between materializations using cost-based optimisation or fixed heuristics?
These are reserved for future ingests.
Seen in¶
- sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tco — first wiki canonicalisation as a pattern. Names the "intelligent query rewriting" and "transparent routing" properties verbatim and supplies the cross-consumer evidence (the dashboard and Genie examples both routed to the same materialization). Reserved for future ingests: rewriter matching algorithm, cost-based vs heuristic routing, freshness- threshold contract, multi-materialization selection logic.
Related¶
- systems/databricks-metric-views — canonical instance.
- concepts/metric-view-materialization — the broader concept.
- concepts/headless-bi-semantic-layer — the substrate the pattern operates within.
- patterns/auto-materialized-aggregation-via-semantic-layer — the producer-side counterpart pattern.
- patterns/governed-metric-as-headless-bi-substrate — the pre-condition pattern.