CONCEPT Cited by 1 source
Semantic enterprise context¶
Semantic enterprise context is the rich, structured information implicit in the relationships between an organisation's data assets (table schemas, dashboard definitions, notebook code, document text, lineage edges, ownership metadata) that a data agent can derive and exploit when answering business questions. The term is named in the 2026-05-08 Databricks post on Genie as the substrate on which specialised knowledge search is built.
The verbatim framing: "Genie uses the existing data assets such as workspace tables, notebooks, dashboards, documents, and files to derive a rich semantic enterprise context and then uses this context to construct a search index."
What semantic enterprise context contains¶
| Asset type | Direct content | Relationship signals |
|---|---|---|
| Tables | Schema (columns, types, comments) | Lineage upstream/downstream; ownership; tier |
| Dashboards | Visual definition + queries | Which tables they read; who consumes them |
| Notebooks | Code + prose + intermediate state | Which datasets they touch; saved query patterns |
| Documents | Text content | Which metrics / tables / dashboards they reference |
| Files (workspace) | Raw content (CSV, JSON, etc.) | Author, modification history |
| Catalog metadata | Names, descriptions, tags | Cross-asset relationships materialised explicitly |
The semantic part is the relationships between these — the edges of the implicit knowledge graph an organisation's data forms:
- "Dashboard A is fed by Table X."
- "Document D explains the business definition of Metric M, which is computed in column C of table T."
- "Notebook N is the canonical reference implementation for KPI K."
- "Tables T1 and T2 both have a column named `revenue`, but T1 is pre-tax and T2 is post-tax."
These relationships are not visible in any single asset — they emerge when you see the asset graph as a whole.
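The relationship statements above can be sketched as typed edges in a small in-memory graph. This is an illustrative toy, not how Genie stores its context: the `AssetGraph` class, the edge labels (`fed_by`, `defines`, `computed_in`) and the `kind:name` identifiers are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # asset the relationship starts from
    relation: str  # hypothetical edge label, e.g. "fed_by"
    target: str    # asset the relationship points to


class AssetGraph:
    """Toy in-memory graph of data assets and typed relationships."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, source, relation, target):
        self._out[source].append(Edge(source, relation, target))

    def neighbours(self, asset, relation=None):
        """Assets directly related to `asset`, optionally filtered by edge type."""
        return [
            e.target
            for e in self._out.get(asset, [])
            if relation is None or e.relation == relation
        ]

    def reachable(self, asset):
        """All assets reachable from `asset` by following edges of any type."""
        seen, stack = set(), [asset]
        while stack:
            current = stack.pop()
            for nxt in self.neighbours(current):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


graph = AssetGraph()
graph.add("dashboard:A", "fed_by", "table:X")
graph.add("document:D", "defines", "metric:M")
graph.add("metric:M", "computed_in", "column:T.C")
graph.add("notebook:N", "reference_implementation_of", "kpi:K")

# Two hops: the document defines a metric, which is computed in a column.
related_to_doc = graph.reachable("document:D")
```

Neither the document nor the column records that they are connected; the link only appears once both edges sit in the same graph, which is the sense in which the context is "semantic".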
Why text similarity isn't enough¶
A naive "vector-search across all workspace text" approach treats every asset as an isolated document. It misses:
- Schema awareness — "this column is a foreign key to that table" doesn't fall out of text similarity.
- Lineage — "this dashboard's number comes from this table" needs explicit graph traversal.
- Authority signals — "this is the production-tier owned by the finance team" is structured metadata, not text.
- Cross-modal references — a doc mentioning a metric by name maps to a column with that name in the schema.
Specialised knowledge search exploits all of these; text similarity captures only the within-document signal.
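To make the gap concrete, here is a minimal sketch that re-ranks text-similarity hits using two of the structured signals listed above (tier and lineage). The `Candidate` fields, the bonus weights and the table names are hypothetical, and the additive scoring is only one simple way to combine signals; the source does not describe Genie's actual ranking.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    text_score: float        # similarity from any text retriever, assumed in [0, 1]
    tier: str                # hypothetical catalogue label: "production" or "sandbox"
    upstream_of: tuple = ()  # dashboards this table feeds (lineage edges)


def rank(candidates, dashboard=None, tier_bonus=0.2, lineage_bonus=0.3):
    """Order candidates by text score plus structured-metadata bonuses."""

    def score(c):
        total = c.text_score
        if c.tier == "production":
            total += tier_bonus  # authority signal
        if dashboard is not None and dashboard in c.upstream_of:
            total += lineage_bonus  # lineage signal
        return total

    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("scratch.revenue_copy", 0.90, "sandbox"),
    Candidate("finance.revenue", 0.80, "production", ("dashboard:A",)),
]

by_text_only = sorted(candidates, key=lambda c: c.text_score, reverse=True)
by_context = rank(candidates, dashboard="dashboard:A")
```

With text similarity alone the sandbox copy ranks first; once tier and lineage are taken into account, the production table that feeds the dashboard moves ahead.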
Where semantic enterprise context comes from¶
In a Databricks-style lakehouse, much of it is materialised by:
- Unity Catalog — schema, ownership, tier, lineage, descriptions.
- Dashboard tooling — definitions of which queries each dashboard runs.
- Notebook history — code + execution history.
- Workspace organisation — folders, tags, descriptions.
- Documents that reference metrics / tables (often via wiki integration).
In other ecosystems the substrate is similar: catalogue (DataHub, Amundsen, Apache Polaris) + dashboard metadata + workspace structure. The pattern is general; the specific catalogue matters less than the discipline of populating it.
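As one illustration of how a cross-asset edge might be derived, the sketch below links a document to the catalogue columns it mentions by name. It assumes a list of fully qualified column names has already been exported from whichever catalogue is in use; `reference_edges`, the `references` label and the sample names are hypothetical, and exact name matching is a deliberately naive stand-in for real entity linking.

```python
import re


def reference_edges(doc_id, text, known_columns):
    """Return (doc_id, "references", column) for each known column named in text."""
    # Tokenise into identifier-like words so "net_revenue" stays one token.
    words = set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))
    edges = []
    for qualified in known_columns:
        column = qualified.rsplit(".", 1)[-1].lower()
        if column in words:
            edges.append((doc_id, "references", qualified))
    return edges


known_columns = ["finance.orders.net_revenue", "finance.orders.order_id"]
edges = reference_edges(
    "document:D",
    "Net revenue is reported from net_revenue after refunds.",
    known_columns,
)
```

The quality of these edges depends directly on how consistently columns are named and described, which is why the discipline of populating the catalogue matters more than the choice of catalogue.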
The load-bearing dependency: governance discipline¶
Semantic enterprise context is only rich if the upstream data layer has been disciplined. Two failure modes if it hasn't:
- Empty / sparse metadata — the catalogue exists but is empty; no descriptions, no ownership, no tier labels. Semantic context collapses to schema names alone.
- Fragmented duplicates — 600 measure variants, 50 "revenue" columns across tables that disagree on their semantics; semantic context contains contradictions.
The Trinity Industries case study (Source: sources/2026-04-29-databricks-companies-winning-with-ai-built-the-data-layer-first) illustrates this empirically: Genie's effectiveness on Trinity's queries depended on the prior measure-consolidation work, which gave the semantic context something coherent to reflect.
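Both failure modes can be checked for before an agent is pointed at the catalogue. The following sketch assumes a catalogue export shaped as one dictionary per column; the field names (`table`, `column`, `description`, `owner`) and the sample rows are invented for illustration and do not correspond to any particular catalogue's schema.

```python
from collections import defaultdict

# Hypothetical catalogue export: one dict per column.
columns = [
    {"table": "t1", "column": "revenue", "description": "Pre-tax revenue", "owner": "finance"},
    {"table": "t2", "column": "revenue", "description": "Post-tax revenue", "owner": "finance"},
    {"table": "t3", "column": "region", "description": "", "owner": None},
]


def sparse_entries(cols):
    """Columns missing a description or an owner (empty / sparse metadata)."""
    return [
        (c["table"], c["column"])
        for c in cols
        if not c.get("description") or not c.get("owner")
    ]


def conflicting_names(cols):
    """Column names whose non-empty descriptions differ across tables."""
    descriptions = defaultdict(set)
    for c in cols:
        if c.get("description"):
            descriptions[c["column"]].add(c["description"].strip().lower())
    return {name: sorted(d) for name, d in descriptions.items() if len(d) > 1}


sparse = sparse_entries(columns)
conflicts = conflicting_names(columns)
```

A differing description is only a hint that two same-named columns disagree semantically; deciding which one is canonical remains a governance decision rather than something the check can resolve.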
Composes with other techniques¶
| Technique | How it uses semantic enterprise context |
|---|---|
| concepts/specialized-knowledge-search | Builds search indices over the context |
| concepts/source-of-truth-disambiguation | Uses context's authority signals to rank |
| concepts/agent-self-correction-loop | Cross-checks intermediate results against context constraints |
| concepts/multi-llm-sub-agent-routing | Search sub-agent grounded in context; planning sub-agent reasons over it |
Semantic enterprise context is the substrate all four operate over. Without it, none of them have anything substantive to ground in.
Distinguishing from related concepts¶
- vs concepts/knowledge-graph — knowledge graph is the general data structure; semantic enterprise context is the specific instantiation over an organisation's data assets in a lakehouse.
- vs concepts/agent-infrastructure-memory (Grafana Assistant) — agent memory is curated knowledge built for the agent over time; semantic enterprise context is the raw substrate the agent derives knowledge from at query time.
- vs concepts/master-data-management — MDM is the discipline of curating golden records; semantic enterprise context is the graph of how golden records (and not-so-golden records) relate.
- vs concepts/context-engineering — context engineering is the general practice of preparing context for LLMs; semantic enterprise context is one specific kind of context, sourced from data assets.
Seen in¶
- sources/2026-05-08-databricks-pushing-the-frontier-for-data-agents-with-genie: canonical first wiki naming of semantic enterprise context as the substrate Genie's specialised knowledge search exploits. Verbatim: "workspace tables, notebooks, dashboards, documents, and files... derive a rich semantic enterprise context." The "rich" qualifier is load-bearing: the context covers not just schema metadata, but the relationships between assets.
- sources/2026-04-29-databricks-companies-winning-with-ai-built-the-data-layer-first: empirical canonicalisation that rich semantic enterprise context requires upstream governance discipline. Trinity's measure-consolidation work was the precondition that made the context Genie reflects coherent.