Skip to content

SYSTEM Cited by 1 source

Databricks Metric Views

Metric Views are a Unity Catalog primitive that provides a headless-BI semantic layer: define your data model and KPIs once in UC, and every consumer — AI/BI Dashboards, Genie, SQL notebooks, and third-party BI tools — resolves the same definition. Consumers query metrics with the MEASURE() function. The semantic-metadata fields (display_name, comment, synonyms) double as AI grounding context: when a user asks Genie "what was our revenue last week?", those annotations are how Genie maps natural language to the right measure and dimensions, "no custom prompts, no separate glossary". (Source: sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tco)

What it is

A Metric View is a UC object — created in SQL (CREATE METRIC VIEW …) or in the Unity Catalog Explorer point-and-click UI — that captures three things in one place:

  1. The measure logic — the SQL aggregation that defines the metric (e.g. SUM(revenue), COUNT(DISTINCT user_id)).
  2. The dimensional join graph — the fact + dimension tables that participate, the join keys, and the time grain.
  3. The semantic metadatadisplay_name (human-readable name), comment (description), synonyms (alternate phrasings for AI agents). These fields are not optional decoration; they are the AI-grounding contract that lets Genie and other agents map natural-language questions to the right measure / dimension combination without per-tool prompt engineering.

Verbatim from the source: "Metric Views go beyond centralized metric definitions — the semantic metadata is what sets them apart. […] The richer your metadata, the more accurately AI serves the right answer."

The consumer-side query model

-- Define ONCE in Unity Catalog
CREATE METRIC VIEW catalog.schema.metv_dbsql_metrics
  -- ... fact + dimension definitions ...
  -- + display_name, comment, synonyms

-- Consume from ANY tool
SELECT MEASURE(daily_query_count)
FROM catalog.schema.metv_dbsql_metrics
WHERE date >= current_date - 7;

The same MEASURE() call works from:

  • A SQL notebook running on a Databricks SQL Warehouse.
  • An AI/BI Dashboard widget — the warehouse-metrics dashboard example in the source post directly queries metv_dbsql_metrics.
  • A Genie natural-language session — "what was our daily query count last week?" gets resolved to the same MEASURE() call via the Metric View's semantic metadata.
  • A third-party BI tool with a live / DirectQuery connection to the lakehouse.

The architectural value: "Define a metric once, and every consumer — human or AI — gets the same answer."

Composition with materialization

Metric Views can be materialized — when materialization is enabled, the platform automatically maintains pre-aggregated results behind the same metric definition the BI tools already query. Four under-the-hood properties (verbatim from the source):

  • Automatic pre-aggregation. Metric results are pre-computed and stored.
  • Incremental refresh. Metrics stay current without full recomputation.
  • Intelligent query rewriting. The engine routes queries to the best available materialization.
  • Transparent routing. Users query metrics the same way; the system serves the fastest path.

The article's evidence: "The dashboard and Genie examples above both queried the same Metric View, and both had their queries transparently routed to a materialization." See concepts/metric-view-materialization + patterns/query-rewrite-to-pre-aggregated-materialization.

Why a headless-BI semantic layer

The pre-Metric-Views antipattern, named in the source: "Most organizations have the same business metrics defined in multiple places — a revenue calculation in one BI tool, a slightly different one in another, a third variant in a SQL notebook someone wrote last quarter. Each definition drifts independently. Nobody's sure which one is right."

The headless shape collapses this into one governed primitive:

Pre-Metric-Views Metric Views
Per-BI-tool semantic layer (LookML, Tableau calculated fields, Power BI DAX) Single UC-resident definition
Drift across tool-specific definitions Single source of truth
Per-tool glossary for AI features UC-resident display_name / comment / synonyms consumed by every agent
Aggregate tables maintained per BI tool One materialization shared across consumers
Governance enforced at each BI tool UC governance enforced at metric-resolution time

Generalised at concepts/headless-bi-semantic-layer + patterns/governed-metric-as-headless-bi-substrate.

AI grounding via semantic metadata

The architecturally-distinctive property of Metric Views is that their semantic metadata is machine-readable AI grounding context, not just human documentation. The chain:

  1. User asks Genie: "what was our revenue last week?"
  2. Genie searches Unity Catalog for Metric Views with matching display_name, synonyms, or comment values.
  3. The match — say a monthly_revenue Metric View with synonyms: ["revenue", "sales", "top line"] — supplies the measure, dimensions, and grain.
  4. Genie generates the MEASURE() query against the matched view.
  5. Query rewriting (if materialization is enabled) routes to the pre-aggregated form.

This is the same architectural shape as Cloudflare's layered grounded context for a data agent — schema metadata + human annotations form a layer the agent reasons over instead of inventing SQL from scratch. Metric View synonyms / display_name play the same role as DataHub's glossary terms.

The source's framing: "any agent with access to Unity Catalog can discover and query governed metrics through the semantic layer instead of hard-coded SQL."

Open standard provenance

The source explicitly names the OSS path: "The core implementation is open-sourced in Apache Spark™ ([SPARK-54119]), with Unity Catalog OSS support coming — so you're building on an open standard with no vendor lock-in. That openness matters more as AI takes on more of the BI workload. Agents querying your data need a consistent, machine-readable definition of what each metric means, and an open standard lets any tool or agent — not just vendor-specific ones — reason over the same governed metrics."

The metric-resolution protocol is therefore on track to be a documentable open spec, not a vendor-private API — analogous to how catalog-managed commits are an open contract that external engines integrate with.

Reflexive use case (in the source)

The source's worked example is itself an instance of the pattern: a Metric View metv_dbsql_metrics defined over the system.billing.usage and system.query.history system tables, queried from both an AI/BI Dashboard ("Warehouse Metrics") and Genie ("daily query count"). The takeaway: Metric Views compose uniformly across operational and analytical workloads — there is no separate primitive for self-observability. This composes with the TCO pointer: "Build Metric Views and an AI/BI Dashboard on system tables to gain visibility into your BI usage."

Composition with the rest of the BI serving stack

Metric Views sit at the semantic layer of the four-layer BI serving stack:

Consumers       AI/BI Dashboards / Genie / notebooks / third-party BI
                                 ▼ MEASURE(metric_name)
Semantic        ┌──── Metric Views ────┐  ◄── this page
                │  (governed in UC)    │
                └──────────┬───────────┘
                           │ query rewriting
Materialization ┌──── Pre-aggregated ──┐
                │  results (incremental │
                │  refresh)             │
                └──────────┬───────────┘
Physical        ┌──── Gold star schema ┐
storage         │  on UC managed tables │
                │  + Liquid Clustering  │
                │  + Predictive Optim.  │
                └──────────────────────┘

The semantic layer compounds the layer below: a Metric View on a well-clustered Gold-layer star schema, with materialization enabled, gets every layer's win for free at the consumer call.

Operational considerations (disclosed)

  • Materialization is opt-in per Metric View. The "Get started" guidance in the source recommends enabling materialization "for your highest-traffic metrics", not by default — pre-aggregation has a refresh cost, so the right rollout is metric-by-metric based on observed query patterns.
  • No staleness-window disclosure. The source names "incremental refresh" but does not specify the refresh latency under high ingest, the freshness contract for materialized vs on-the-fly metrics, or the cutover semantics during refresh.
  • No coverage envelope for query rewriting. The source claims "intelligent query rewriting" routes to the best available materialization but does not document which queries hit materializations and which fall back to base-table scans.

Seen in

  • sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tcofirst wiki disclosure of Metric Views as a distinct named primitive (previously implicit in Genie's "metrics layer for deterministic answers" framing). Names the load-bearing properties: headless-BI shape across multiple consumers, semantic metadata as AI-grounding context, materialization with incremental refresh + transparent query rewriting, open-standard provenance via SPARK-54119. Reflexive worked example — metv_dbsql_metrics over system.billing.usage / system.query.history, queried from both AI/BI Dashboard and Genie. Reserved for future ingests: query-rewrite optimiser coverage envelope, refresh latency / staleness contract, multi-tenant isolation under shared materializations, materialization storage cost vs query speedup trade-off, the SPARK-54119 wire spec.
Last updated · 542 distilled / 1,571 read