PATTERN Cited by 1 source
Auto-materialized aggregation via semantic layer¶
Pattern¶
When BI dashboards on a metric become slow, enable materialization on the metric in the semantic layer rather than hand-building aggregate tables + refresh pipelines + BI-tool query updates. The platform automatically maintains pre-aggregated results behind the same metric definition the BI tools already query, with incremental refresh and transparent query rewriting to the pre-aggregated form. The consumer's query never changes.
Forces¶
- Aggregate-table sprawl when every BI tool grows its own pre-aggregation tables for the same metrics.
- Refresh-pipeline maintenance burden — each aggregate table has its own Airflow/dbt job + monitoring + failure modes.
- BI-tool query updates every time the aggregation logic changes — the BI tool has to be repointed at the new aggregate.
- Governance drift when the aggregate table has different access controls than the source.
- Cost itemisation pain — aggregation pipelines become their own line items on the compute bill.
The source's framing of the antipattern: "A dashboard query takes 30 seconds, so someone builds an aggregate table to speed it up. That table needs a refresh pipeline. The pipeline needs monitoring. Then a second BI tool needs the same data in a slightly different shape, so someone builds another aggregate table using a separate pipeline. Before long, you're managing a sprawl of aggregates, extracts, and tool-specific semantic layers — each with its own staleness window, its own governance gaps, and its own line item on the compute bill."
Solution shape (verbatim from the source)¶
"Metric View materialization offers a simpler alternative. When you enable materialization on a Metric View, the platform automatically maintains pre-aggregated results behind the same metric definitions your BI tools already query — no separate aggregate tables to build, no BI tool queries to refactor."
The pattern collapses three coupled artifacts into one:
| Pre-pattern (3 coupled artifacts) | Post-pattern (1 governed primitive) |
|---|---|
| Aggregate tables (in storage) | Pre-aggregated form lives behind the Metric View |
| Refresh pipelines (Airflow / dbt jobs) | Incremental refresh runs as a substrate property |
| BI-tool queries pointed at aggregate tables | BI-tool queries continue to call MEASURE(); rewrite handles routing |
Concrete steps¶
- Pair this pattern with patterns/governed-metric-as-headless-bi-substrate — the pre-condition is that the metric is already defined as a Metric View (or equivalent governed primitive). Without that, there's nothing to materialize behind.
- Identify the highest-traffic metrics — the source's "Get
started" guidance is "enable materialization for your
highest-traffic metrics", not blanket-enable. Use
system.query.history(Databricks) or equivalent to find the metric+grain combinations that get queried most. - Enable materialization on the Metric View. Concrete syntax varies by platform; the Databricks shape is a property on the Metric View definition.
- Verify transparent routing. Check the query plan for confirmation that the consumer query is hitting the materialization instead of falling back to base-table compute. The source's evidence is a Genie query plan screenshot showing exactly this.
- Monitor the refresh latency under high ingest. If the materialization is more stale than the consumer requires, tune the refresh cadence or unmaterialise the metric.
Trade-offs¶
| Choice | Implication |
|---|---|
| Selective materialization (only highest-traffic metrics) | Avoids storage / refresh-cost sprawl; requires telemetry to choose what to materialize. |
| Blanket materialization | Storage and refresh costs dominate; only justified for narrow workloads where everything is hot. |
| Incremental refresh | Refresh cost amortised against write activity; freshness contract under high churn must be validated. |
| Transparent query rewriting | Consumer doesn't need to know about materialization; rewrite optimiser must cover the queries it expects to. |
Failure modes¶
- Stale materialization at high ingest. If refresh latency exceeds the freshness window the consumer cares about, the materialization either lies (returns stale data) or falls back to base-table compute. Mitigation: validate the freshness contract under realistic write churn.
- Query-rewriter coverage gaps. A materialization that should match a query but doesn't (because the rewriter is conservative or the grain doesn't quite line up) silently falls back to base-table compute — producing inconsistent performance the consumer notices but can't explain.
- Materialization sprawl — the same antipattern as pre-aggregation tables, just at a different layer. Mitigation: rationalise materializations periodically based on actual hit rate.
- Cost-attribution opacity. When the consumer call is the same whether or not a materialization is hit, cost-attribution per metric becomes harder unless the platform exposes it.
Sibling patterns¶
- patterns/governed-metric-as-headless-bi-substrate — the pre-condition pattern. Without a governed metric, there's nothing to materialize behind.
- patterns/query-rewrite-to-pre-aggregated-materialization — the optimiser-side counterpart pattern that makes transparent routing work.
- patterns/cdf-incremental-replacing-full-rescan — the same incremental-refresh principle at a different layer (Bronze→Silver pipeline transitions in Octopus Energy MHHS).
- patterns/precomputed-sketch-column-in-delta-table — a related precompute-then-serve pattern (probabilistic data structures stored as Delta-table columns for fast approximate answers).
Where this shows up on the wiki¶
- systems/databricks-metric-views — canonical instance.
- concepts/metric-view-materialization — the generalised concept.
- concepts/headless-bi-semantic-layer — the substrate this pattern materializes behind.
Seen in¶
- sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tco — first wiki canonicalisation as a pattern. Names the three-artifacts-collapse-into-one structural move and the four sub-properties (auto pre-aggregation / incremental refresh / intelligent query rewriting / transparent routing). Reserved for future ingests: refresh-latency contract under high ingest, query-rewriter coverage envelope, materialization storage-vs-speedup tuning, the relationship between materialization granularity and rewriter coverage.
Related¶
- systems/databricks-metric-views — the canonical instance.
- systems/uc-managed-tables — the storage substrate materializations sit on.
- concepts/metric-view-materialization — the generalised concept.
- concepts/headless-bi-semantic-layer — the substrate the pattern operates within.
- patterns/governed-metric-as-headless-bi-substrate — the pre-condition pattern.
- patterns/query-rewrite-to-pre-aggregated-materialization — the optimiser-side companion.