CONCEPT Cited by 1 source

Semantic enterprise context¶

Semantic enterprise context is the rich, structured information implicit in the relationships between an organisation's data assets (table schemas, dashboard definitions, notebook code, document text, lineage edges, ownership metadata) that a data agent can derive and exploit when answering business questions. The term is named in the 2026-05-08 Databricks post on Genie as the substrate that specialised knowledge search is built on.

The verbatim framing: "Genie uses the existing data assets such as workspace tables, notebooks, dashboards, documents, and files to derive a rich semantic enterprise context and then uses this context to construct a search index."

What semantic enterprise context contains¶

Asset type	Direct content	Relationship signals
Tables	Schema (columns, types, comments)	Lineage upstream/downstream; ownership; tier
Dashboards	Visual definition + queries	Which tables they read; who consumes them
Notebooks	Code + prose + intermediate state	Which datasets they touch; saved query patterns
Documents	Text content	Which metrics / tables / dashboards they reference
Files (workspace)	Raw content (CSV, JSON, etc.)	Author, modification history
Catalog metadata	Names, descriptions, tags	Cross-asset relationships materialised explicitly

The semantic part is the relationships between these — the edges of the implicit knowledge graph an organisation's data forms:

"Dashboard A is fed by Table X."
"Document D explains the business definition of Metric M, which is computed in column C of table T."
"Notebook N is the canonical reference implementation for KPI K."
"Tables T1 and T2 both have a column named revenue but T1 is pre-tax and T2 is post-tax."

These relationships are not visible in any single asset — they emerge when you see the asset graph as a whole.
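As a rough illustration of the point above, the relationships can be modelled as typed edges and followed across assets. This is a minimal sketch, not how Genie stores or traverses its context: the asset names, the relationship labels (`feeds`, `defines`, `computed_in`, `reference_impl_for`) and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: (source asset, relationship, target asset).
# Every name and relationship label here is invented for illustration.
EDGES = [
    ("table:X", "feeds", "dashboard:A"),
    ("table:raw_orders", "feeds", "table:X"),
    ("doc:D", "defines", "metric:M"),
    ("metric:M", "computed_in", "column:T.C"),
    ("notebook:N", "reference_impl_for", "kpi:K"),
]


def build_graph(edges):
    """Index edges by source so each asset's outgoing relationships can be followed."""
    graph = defaultdict(list)
    for src, rel, dst in edges:
        graph[src].append((rel, dst))
    return graph


def reachable(graph, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen, queue, order = {start}, [start], []
    while queue:
        node = queue.pop(0)
        for _rel, dst in graph.get(node, []):
            if dst not in seen:
                seen.add(dst)
                order.append(dst)
                queue.append(dst)
    return order


graph = build_graph(EDGES)
print(reachable(graph, "table:raw_orders"))  # ['table:X', 'dashboard:A']
print(reachable(graph, "doc:D"))             # ['metric:M', 'column:T.C']
```

Neither `table:raw_orders` nor `dashboard:A` mentions the other; the link only appears once the edges are chained together, which is the sense in which the relationships emerge from the graph as a whole.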

Why text similarity isn't enough¶

A naive "vector-search across all workspace text" approach treats every asset as an isolated document. It misses:

Schema awareness — "this column is a foreign key to that table" doesn't fall out of text similarity.
Lineage — "this dashboard's number comes from this table" needs explicit graph traversal.
Authority signals — "this is the production-tier owned by the finance team" is structured metadata, not text.
Cross-modal references — a doc mentioning a metric by name maps to a column with that name in the schema.

Specialised knowledge search exploits all of these; text similarity captures only the within-document signal.
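The gap between the two approaches can be sketched with a toy ranking. The example below assumes a hypothetical `TableAsset` record with a `tier` label and uses token overlap as a crude stand-in for embedding similarity; the `tier_boost` value is arbitrary and the whole thing is illustrative rather than a description of any real ranking function.

```python
from dataclasses import dataclass


@dataclass
class TableAsset:
    name: str
    description: str
    tier: str   # hypothetical label, e.g. "production" or "sandbox"
    owner: str


def token_overlap(query: str, text: str) -> float:
    """Crude stand-in for text similarity: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rank_text_only(query, assets):
    """Rank by description similarity alone, ignoring structured metadata."""
    return sorted(assets, key=lambda a: token_overlap(query, a.description), reverse=True)


def rank_with_authority(query, assets, tier_boost=0.5):
    """Add an (arbitrary) boost for production-tier assets on top of text similarity."""
    def score(asset):
        boost = tier_boost if asset.tier == "production" else 0.0
        return token_overlap(query, asset.description) + boost
    return sorted(assets, key=score, reverse=True)


ASSETS = [
    TableAsset("scratch.revenue_test", "quarterly revenue by region draft copy", "sandbox", "intern"),
    TableAsset("finance.revenue", "revenue by region", "production", "finance"),
]

query = "quarterly revenue by region"
print(rank_text_only(query, ASSETS)[0].name)       # scratch.revenue_test
print(rank_with_authority(query, ASSETS)[0].name)  # finance.revenue
```

The sandbox copy wins on text alone because its description happens to repeat more of the query; only the structured tier signal moves the governed table to the top.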

Where semantic enterprise context comes from¶

In a Databricks-style lakehouse, much of it is materialised by:

Unity Catalog — schema, ownership, tier, lineage, descriptions.
Dashboard tooling — definitions of which queries each dashboard runs.
Notebook history — code + execution history.
Workspace organisation — folders, tags, descriptions.
Documents that reference metrics / tables (often via wiki integration).

In other ecosystems the substrate is similar: catalogue (DataHub, Amundsen, Apache Polaris) + dashboard metadata + workspace structure. The pattern is general; the specific catalogue matters less than the discipline of populating it.
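Where a catalogue does not already record which tables a dashboard reads, one way to approximate those edges is to scan the dashboard's queries. The sketch below is an assumption-laden illustration: the `DASHBOARDS` mapping and its SQL are hypothetical, and the regular expression only catches simple `FROM` / `JOIN` references, so a real implementation would want a proper SQL parser or the catalogue's own lineage.

```python
import re

# Naive pattern: an identifier following FROM or JOIN.
# It ignores subqueries, CTE names, quoting and comments.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_read(sql: str) -> list[str]:
    """Return distinct table names referenced after FROM/JOIN, in order of appearance."""
    seen = []
    for name in TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical dashboard definitions: dashboard name -> list of SQL queries.
DASHBOARDS = {
    "dashboard:A": [
        "SELECT region, SUM(revenue) FROM finance.revenue GROUP BY region",
        "SELECT o.id FROM sales.orders o JOIN finance.revenue r ON o.id = r.order_id",
    ],
}


def lineage_edges(dashboards):
    """Emit (table, 'feeds', dashboard) edges, deduplicated per dashboard."""
    edges = []
    for dash, queries in dashboards.items():
        for sql in queries:
            for table in tables_read(sql):
                edge = (table, "feeds", dash)
                if edge not in edges:
                    edges.append(edge)
    return edges


for edge in lineage_edges(DASHBOARDS):
    print(edge)
# ('finance.revenue', 'feeds', 'dashboard:A')
# ('sales.orders', 'feeds', 'dashboard:A')
```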

The load-bearing dependency: governance discipline¶

Semantic enterprise context is only rich if the upstream data layer has been disciplined. Two failure modes if it hasn't:

Empty / sparse metadata — the catalogue exists but is empty; no descriptions, no ownership, no tier labels. Semantic context collapses to schema names alone.
Fragmented duplicates — 600 measure variants, 50 "revenue" columns across tables that disagree on their semantics; semantic context contains contradictions.
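Both failure modes can be checked for before relying on the context. The following is a minimal sketch over a hypothetical list of catalogue rows; the row layout and sample data are assumptions, and differing descriptions are only a proxy for differing semantics, so flagged names still need human review.

```python
from collections import defaultdict

# Hypothetical catalogue rows: (table, column, description, owner).
COLUMNS = [
    ("sales.t1", "revenue", "Pre-tax revenue in USD", "finance"),
    ("sales.t2", "revenue", "Post-tax revenue in USD", "finance"),
    ("ops.events", "ts", "", None),
    ("ops.events", "payload", "", None),
]


def sparse_ratio(columns):
    """Share of columns that lack a description or an owner."""
    if not columns:
        return 0.0
    missing = sum(1 for _t, _c, desc, owner in columns if not desc or not owner)
    return missing / len(columns)


def conflicting_names(columns):
    """Column names that appear in several tables with differing descriptions."""
    by_name = defaultdict(set)
    for table, column, desc, _owner in columns:
        if desc:  # undocumented columns are counted by sparse_ratio instead
            by_name[column].add((table, desc))
    return {
        name: sorted(entries)
        for name, entries in by_name.items()
        if len({desc for _table, desc in entries}) > 1
    }


print(sparse_ratio(COLUMNS))       # 0.5
print(conflicting_names(COLUMNS))
# {'revenue': [('sales.t1', 'Pre-tax revenue in USD'), ('sales.t2', 'Post-tax revenue in USD')]}
```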

The Trinity Industries case study (Source: sources/2026-04-29-databricks-companies-winning-with-ai-built-the-data-layer-first) illustrates this empirically: Genie's effectiveness on Trinity's queries depended on the prior measure-consolidation work, which gave the semantic context something coherent to reflect.

Composes with other techniques¶

Technique	How it uses semantic enterprise context
concepts/specialized-knowledge-search	Builds search indices over the context
concepts/source-of-truth-disambiguation	Uses context's authority signals to rank
concepts/agent-self-correction-loop	Cross-checks intermediate results against context constraints
concepts/multi-llm-sub-agent-routing	Search sub-agent grounded in context; planning sub-agent reasons over it

Semantic enterprise context is the substrate all four operate over. Without it, none of them has anything substantive to be grounded in.

It also differs from several neighbouring concepts:

vs concepts/knowledge-graph — knowledge graph is the general data structure; semantic enterprise context is the specific instantiation over an organisation's data assets in a lakehouse.
vs concepts/agent-infrastructure-memory (Grafana Assistant) — agent memory is curated knowledge built for the agent over time; semantic enterprise context is the raw substrate the agent derives knowledge from at query time.
vs concepts/master-data-management — MDM is the discipline of curating golden records; semantic enterprise context is the graph of how golden records (and not-so-golden records) relate.
vs concepts/context-engineering — context engineering is the general practice of preparing context for LLMs; semantic enterprise context is one specific kind of context, sourced from data assets.

Seen in¶

sources/2026-05-08-databricks-pushing-the-frontier-for-data-agents-with-genie — the canonical first naming on this wiki of semantic enterprise context as the substrate Genie's specialised knowledge search exploits. Verbatim: "workspace tables, notebooks, dashboards, documents, and files... derive a rich semantic enterprise context." The "rich" qualifier is load-bearing: the context covers not just schema metadata but the relationships between assets.
sources/2026-04-29-databricks-companies-winning-with-ai-built-the-data-layer-first — empirical evidence that rich semantic enterprise context requires upstream governance discipline. Trinity's measure-consolidation work was the precondition that made the context Genie reflects coherent.