
CONCEPT Cited by 1 source

Semantic enterprise context

Semantic enterprise context is the rich, structured information implicit in the relationships between an organisation's data assets — table schemas, dashboard definitions, notebook code, document text, lineage edges, ownership metadata — that a data agent can derive and exploit when answering business questions. Named in the 2026-05-08 Databricks post on Genie as the substrate that specialised knowledge search is built on.

The verbatim framing: "Genie uses the existing data assets such as workspace tables, notebooks, dashboards, documents, and files to derive a rich semantic enterprise context and then uses this context to construct a search index."

What semantic enterprise context contains

Asset type | Direct content | Relationship signals
Tables | Schema (columns, types, comments) | Lineage upstream/downstream; ownership; tier
Dashboards | Visual definition + queries | Which tables they read; who consumes them
Notebooks | Code + prose + intermediate state | Which datasets they touch; saved query patterns
Documents | Text content | Which metrics / tables / dashboards they reference
Files (workspace) | Raw content (CSV, JSON, etc.) | Author, modification history
Catalog metadata | Names, descriptions, tags | Cross-asset relationships materialised explicitly

The semantic part is the relationships between these — the edges of the implicit knowledge graph an organisation's data forms:

  • "Dashboard A is fed by Table X."
  • "Document D explains the business definition of Metric M, which is computed in column C of table T."
  • "Notebook N is the canonical reference implementation for KPI K."
  • "Tables T1 and T2 both have a column named revenue but T1 is pre-tax and T2 is post-tax."

These relationships are not visible in any single asset — they emerge when you see the asset graph as a whole.
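One way to see relationships emerging from the graph as a whole is multi-hop traversal over typed edges. The sketch below uses hypothetical asset IDs and edge kinds to answer "which documents explain the numbers on dashboard A?", even though no single asset links that dashboard to that document. A leading `^` in a path step means "walk this edge kind backwards"; that convention is an assumption of this example, not part of any catalogue API.

```python
Triple = tuple[str, str, str]  # (source, edge_kind, target)


def build_index(triples: list[Triple]):
    """Index edges by (node, kind) in both directions for cheap traversal."""
    fwd: dict[tuple[str, str], set[str]] = {}
    bwd: dict[tuple[str, str], set[str]] = {}
    for src, kind, dst in triples:
        fwd.setdefault((src, kind), set()).add(dst)
        bwd.setdefault((dst, kind), set()).add(src)
    return fwd, bwd


def follow(start: str, path: list[str], fwd, bwd) -> set[str]:
    """Follow a sequence of edge kinds from `start`; '^kind' walks an edge in reverse."""
    frontier = {start}
    for step in path:
        nxt: set[str] = set()
        for node in frontier:
            if step.startswith("^"):
                nxt |= bwd.get((node, step[1:]), set())
            else:
                nxt |= fwd.get((node, step), set())
        frontier = nxt
    return frontier


# Hypothetical edges mirroring the examples in the list above.
triples: list[Triple] = [
    ("dashboard_a", "reads", "table_t"),
    ("table_t.column_c", "column_of", "table_t"),
    ("metric_m", "computed_in", "table_t.column_c"),
    ("document_d", "explains", "metric_m"),
]
fwd, bwd = build_index(triples)

# dashboard -> table it reads -> that table's columns -> metrics computed there -> docs explaining them
explaining_docs = follow("dashboard_a", ["reads", "^column_of", "^computed_in", "^explains"], fwd, bwd)
print(explaining_docs)  # {'document_d'}
```

Each individual edge is local to one or two assets; the dashboard-to-document link exists only as a path through four of them.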

Why text similarity isn't enough

A naive "vector-search across all workspace text" approach treats every asset as an isolated document. It misses:

  • Schema awareness: "this column is a foreign key to that table" doesn't fall out of text similarity.
  • Lineage: "this dashboard's number comes from this table" needs explicit graph traversal.
  • Authority signals: "this is the production-tier table owned by the finance team" is structured metadata, not text.
  • Cross-modal references: a doc mentioning a metric by name maps to a column with that name in the schema.

Specialised knowledge search exploits all of these; text similarity captures only the within-document signal.
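To make the contrast concrete, the sketch below uses token-overlap (Jaccard) similarity as a crude stand-in for vector search; that substitution and all asset names are assumptions. Asked about a dashboard's revenue figure, text similarity ranks the table that actually produces the number last, while a walk up hypothetical lineage edges reaches it directly.

```python
import re
from collections import deque


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; 'post-tax' splits into 'post' and 'tax'."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a crude stand-in for vector similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def upstream(node: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from a node to everything it is derived from."""
    found: list[str] = []
    visited = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in visited:
                visited.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


# Hypothetical assets, each treated as an isolated text document.
texts = {
    "dash_quarterly_revenue": "quarterly revenue dashboard",
    "doc_board_notes": "notes on quarterly revenue for the board",
    "fct_orders": "daily order facts with post-tax amounts",
}
# Hypothetical lineage edges: node -> the assets it reads from.
lineage = {
    "dash_quarterly_revenue": ["fct_orders"],
    "fct_orders": ["raw_orders"],
}

query = "quarterly revenue"
ranking = sorted(texts, key=lambda asset: jaccard(query, texts[asset]), reverse=True)
print(ranking)  # ['dash_quarterly_revenue', 'doc_board_notes', 'fct_orders']
print(upstream("dash_quarterly_revenue", lineage))  # ['fct_orders', 'raw_orders']
```

Real embeddings would narrow the lexical gap somewhat, but "this table feeds that dashboard" is a lineage fact rather than a wording fact, so a score computed over each asset's own text has no direct way to see it.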

Where semantic enterprise context comes from

In a Databricks-style lakehouse, much of it is materialised by:

  • Unity Catalog — schema, ownership, tier, lineage, descriptions.
  • Dashboard tooling — definitions of which queries each dashboard runs.
  • Notebook history — code + execution history.
  • Workspace organisation — folders, tags, descriptions.
  • Documents that reference metrics / tables (often via wiki integration).

In other ecosystems the substrate is similar: catalogue (DataHub, Amundsen, Apache Polaris) + dashboard metadata + workspace structure. The pattern is general; the specific catalogue matters less than the discipline of populating it.
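The sources above can be merged into a single edge set. A minimal sketch, assuming hypothetical input shapes: catalogue lineage as `(downstream, upstream)` pairs, dashboards as lists of SQL strings, and documents as plain text. The regex-based SQL and name matching is deliberately naive (no CTEs, quoted identifiers, or subqueries handled) and is not how any particular catalogue extracts lineage.

```python
import re

Edge = tuple[str, str, str]  # (source, kind, target)

# Naive: only catches identifiers that directly follow FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def edges_from_catalog(lineage_rows: list[tuple[str, str]]) -> set[Edge]:
    """Catalogue lineage rows given as (downstream_table, upstream_table)."""
    return {(down, "derived_from", up) for down, up in lineage_rows}


def edges_from_dashboards(dashboards: dict[str, list[str]]) -> set[Edge]:
    """Dashboard -> table edges parsed from each dashboard's SQL queries."""
    return {
        (dash, "reads", table)
        for dash, queries in dashboards.items()
        for sql in queries
        for table in TABLE_REF.findall(sql)
    }


def edges_from_documents(docs: dict[str, str], catalog_names: list[str]) -> set[Edge]:
    """Document -> asset edges wherever a catalogued name appears verbatim in the text."""
    return {
        (doc, "mentions", name)
        for doc, text in docs.items()
        for name in catalog_names
        if re.search(rf"\b{re.escape(name)}\b", text)
    }


lineage_rows = [("sales.fct_orders", "sales.raw_orders")]
dashboards = {
    "dash_revenue": [
        "SELECT sum(o.amount) FROM sales.fct_orders o "
        "JOIN sales.dim_region r ON o.region_id = r.id"
    ]
}
docs = {"wiki_revenue": "Revenue is summed from sales.fct_orders after tax."}
catalog_names = ["sales.fct_orders", "sales.raw_orders", "sales.dim_region"]

# Union of all extractors: the combined graph the agent can traverse.
context = (
    edges_from_catalog(lineage_rows)
    | edges_from_dashboards(dashboards)
    | edges_from_documents(docs, catalog_names)
)
for edge in sorted(context):
    print(edge)
```

Each extractor is trivial on its own; the value is in the union, where the wiki page and the dashboard become connected through the table they both touch.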

The load-bearing dependency: governance discipline

Semantic enterprise context is only rich if the upstream data layer has been disciplined. If it has not, two failure modes follow:

  • Empty / sparse metadata — the catalogue exists but is empty; no descriptions, no ownership, no tier labels. Semantic context collapses to schema names alone.
  • Fragmented duplicates — 600 measure variants, 50 "revenue" columns across tables that disagree on their semantics; semantic context contains contradictions.

The Trinity Industries case study (Source: sources/2026-04-29-databricks-companies-winning-with-ai-built-the-data-layer-first) demonstrated this empirically: Genie's effectiveness on Trinity's queries depended on the earlier measure-consolidation work, which gave the semantic context something coherent to reflect.
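Both failure modes can be measured before an agent ever sees the context. A sketch under assumed conventions: each table is a dict with hypothetical `description`, `owner`, `tier`, and `columns` keys, and a column is flagged when the same name carries different comments in different tables. These checks are illustrative, not a Unity Catalog feature.

```python
from collections import defaultdict

# Hypothetical set of metadata fields a governed table is expected to carry.
REQUIRED_FIELDS = ("description", "owner", "tier")


def completeness(tables: list[dict]) -> float:
    """Fraction of required metadata fields that are non-empty across all tables."""
    total = len(tables) * len(REQUIRED_FIELDS)
    filled = sum(1 for t in tables for f in REQUIRED_FIELDS if t.get(f))
    return filled / total if total else 0.0


def conflicting_columns(tables: list[dict]) -> dict[str, list[str]]:
    """Column names whose comments disagree between tables."""
    comments = defaultdict(set)
    for t in tables:
        for column, comment in t.get("columns", {}).items():
            comments[column].add(comment or "")
    return {col: sorted(found) for col, found in comments.items() if len(found) > 1}


tables = [
    {"name": "finance.orders", "description": "Booked orders", "owner": "finance",
     "tier": "gold", "columns": {"revenue": "pre-tax amount", "order_id": "primary key"}},
    {"name": "sales.orders_v2", "description": "", "owner": None,
     "tier": None, "columns": {"revenue": "post-tax amount", "order_id": "primary key"}},
]

print(completeness(tables))         # 0.5
print(conflicting_columns(tables))  # {'revenue': ['post-tax amount', 'pre-tax amount']}
```

A low completeness score corresponds to the sparse-metadata failure; a non-empty conflict map corresponds to fragmented duplicates such as the pre-tax / post-tax `revenue` columns.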

Composes with other techniques

Technique | How it uses semantic enterprise context
concepts/specialized-knowledge-search | Builds search indices over the context
concepts/source-of-truth-disambiguation | Uses context's authority signals to rank
concepts/agent-self-correction-loop | Cross-checks intermediate results against context constraints
concepts/multi-llm-sub-agent-routing | Search sub-agent grounded in context; planning sub-agent reasons over it

Semantic enterprise context is the substrate all four operate over. Without it, none of them have anything substantive to ground in.
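As one illustration of how source-of-truth disambiguation might use authority signals, candidates can be re-ranked by blending a relevance score with tier and ownership metadata from the context. The tier labels, weights, and `Candidate` fields below are hypothetical choices for this sketch, not values from the Databricks post.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier labels and weights; a real deployment would tune or learn these.
TIER_WEIGHT = {"gold": 1.0, "silver": 0.5, "bronze": 0.0}


@dataclass
class Candidate:
    name: str
    relevance: float     # e.g. a text or vector match score in [0, 1]
    tier: Optional[str]  # certification tier from the catalogue, if any
    has_owner: bool      # whether ownership metadata is populated


def authority_score(c: Candidate, w_rel: float = 0.6, w_tier: float = 0.3,
                    w_owner: float = 0.1) -> float:
    """Linear blend of match relevance and authority signals."""
    tier_signal = TIER_WEIGHT.get(c.tier, 0.0) if c.tier else 0.0
    owner_signal = 1.0 if c.has_owner else 0.0
    return w_rel * c.relevance + w_tier * tier_signal + w_owner * owner_signal


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Highest blended score first."""
    return sorted(candidates, key=authority_score, reverse=True)


candidates = [
    Candidate("scratch.revenue_tmp", relevance=0.9, tier=None, has_owner=False),
    Candidate("finance.revenue", relevance=0.7, tier="gold", has_owner=True),
]
for c in rank(candidates):
    print(c.name, round(authority_score(c), 2))
```

A linear blend keeps the weights inspectable; the trade-off is that a strong enough text match on an unowned scratch table can still win if the relevance weight dominates.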

  • vs concepts/knowledge-graph — knowledge graph is the general data structure; semantic enterprise context is the specific instantiation over an organisation's data assets in a lakehouse.
  • vs concepts/agent-infrastructure-memory (Grafana Assistant) — agent memory is curated knowledge built for the agent over time; semantic enterprise context is the raw substrate the agent derives knowledge from at query time.
  • vs concepts/master-data-management — MDM is the discipline of curating golden records; semantic enterprise context is the graph of how golden records (and not-so-golden records) relate.
  • vs concepts/context-engineering — context engineering is the general practice of preparing context for LLMs; semantic enterprise context is one specific kind of context, sourced from data assets.
