Databricks: Governing AI agents at scale with Unity Catalog
Summary
A 2026-05-20 Databricks Blog vision post extending Databricks' coding-agent-governance playbook (the 2026-04-17 Unity AI Gateway launch — see sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway) to the general-agent population an enterprise now runs across every department: dev, analytics, sales, support, marketing, finance. The post's framing inverts the usual product-launch shape — instead of "here is feature X" it leads with two failure modes the reader is invited to recognise. Either you have "thousands of agents" logging differently, authenticating differently, accessing data differently, with no single place to look when someone asks "which agents are accessing customer PII?" — or you locked everything down, deployed nothing, and are now "six months behind competitors who moved faster" while "developers and users are frustrated. Some have left for companies where they can actually use AI tools." Both extremes are framed as risk; the post's load-bearing claim is that traditional governance — built for static applications and predictable code — cannot govern non-deterministic agents at all: "You can't govern an agent by reviewing what it might do. You govern it by controlling what it can access and monitoring what it actually does." The architectural answer is two surfaces, one identity model: Unity Catalog extended from data-only governance to also cover "every asset an AI system touches: LLMs, MCP servers, skills, and agents", plus Unity AI Gateway as the enforcement fabric through which "every model call, every tool invocation, every agent interaction" flows — evaluated against UC policies before execution, logged after. 
The thesis is structured as four pillars (concepts/four-pillars-of-agent-governance): (1) Delegated access — three named layers stacking permissions (OBO token passing) + per-call policies (Service Policies as UC functions evaluated before MCP tool calls, returning allow/deny/consent and fail-closed on deny) + content guardrails (systems/unity-ai-gateway-guardrails inline scanning of inputs for PII/jailbreak and outputs for hallucinations/sensitive content, fail-closed on every request); (2) Data-centric AI governance — the principle that "an agent's behavior is almost entirely determined by the data it has access to" so AI governance and data governance must be one system, with full request/response payloads landing in Inference Tables alongside UC audit logs in the lakehouse where they're queryable with the same SQL, joinable with business outcomes, and feedable to Lakewatch (Databricks' agentic SIEM); (3) Cost intelligence — usage-tracking writes every request with token counts/latency/identity/destination across Databricks-hosted and external providers to a single table tagged by team/project/cost-centre, Budgets add per-user/group thresholds with alerts (hard enforcement "more to share on that soon"); (4) Open and interoperable — concepts/governance-travels-with-resources: "governance becomes a property of your platform rather than something you rebuild for each new framework or model" — same UC + AI Gateway whether the agent is built on LangGraph, CrewAI, OpenAI SDK, Anthropic SDK, AutoGen, or LlamaIndex; same gateway whether the model is Databricks-hosted, Azure OpenAI, AWS Bedrock, or Anthropic. The post's structural rhetorical move: governance can't live only in the agent layer because the agent layer is the most volatile (frameworks ship weekly); it must live in the data + services that agents access, where governance can travel with the resource. 
The post is explicitly vision/positioning, not architecture internals: no latency numbers, no scale numbers, no MCP-traffic-inspection mechanics, no inference-table schema, no cost-attribution algorithm, no Service Policy DSL syntax. It is ingested as a Tier-3 source because the post-coding-agent generalisation of UC + AI Gateway, the four-pillars framing, and the named architectural extensions (Service Policies, Inference Tables, Lakewatch, Guardrails, Budgets) are individually citable abstractions the wiki should canonicalise.
Key takeaways
- The agent-governance framing inverts traditional governance: "control what it can access, monitor what it actually does." "Traditional governance assumed humans make decisions and applications execute them predictably. Agents don't work that way. They're autonomous, they make different choices each time, and they chain together tools in ways you can't predict by reading code. You can't govern an agent by reviewing what it might do. You govern it by controlling what it can access and monitoring what it actually does." This is the load-bearing thesis: non-determinism breaks the static-application governance model. The replacement is bilateral: pre-execution access control (what the agent could do) plus post-execution observability (what it did do). Reviewing the agent's code or prompt (the static-application analog) is rejected as structurally insufficient. Canonical wiki framing of the agent-governance inversion (Source).
- Both extremes, ungoverned sprawl and locked-down stagnation, are risks. "Ungoverned agents create risk you can't measure. Locked-down environments create a different kind of risk: falling behind while talent walks out the door." The post explicitly names talent flight as a governance failure: "Some have left for companies where they can actually use AI tools." This generalises coding-agent sprawl from the 2026-04-17 post (which was about developers running multiple coding tools) to org-wide agent sprawl across every department: "Every developer has a coding agent that writes, reviews, and ships code alongside them. Your analytics team built forecasting agents. Sales operations deployed lead scoring. The Support organization automated ticket routing. Marketing launched personalization. Finance built reconciliation workflows." The architectural implication: governance infrastructure must enable speed, not only restrict it (Source).
- Four-pillar framing: first canonicalisation on the wiki. "Pillar 1: Delegated access … Pillar 2: Data-centric AI governance … Pillar 3: Cost intelligence … Pillar 4: Open and interoperable." The pillars are non-overlapping and stacked: access control is the gate, data governance is the substrate that gives access control meaning, cost is the operational dimension you can't manage without metering, and openness is the property that lets the governance survive framework and model changes. It is the companion to the three-pillar framing from the 2026-04-17 Unity AI Gateway post (security+audit / cost / observability): this post adds Pillar 1 (delegated access) and Pillar 4 (open and interoperable) as new load-bearing pillars, while folding the prior security+audit and observability pillars into the unified Pillar 2. Canonical instance of concepts/four-pillars-of-agent-governance (Source).
- Pillar 1 has three layers: permissions → policies → guardrails. "In practice, these three layers work together: permissions control who can call what. Service Policies control whether a specific tool call should proceed in the context of a given request. Guardrails control what content flows in and out." The structural insight: none of the three layers subsumes the others. Permissions are too coarse to reason about runtime context ("Knowing that an agent is allowed to call GitHub doesn't tell you whether it should delete a file or merge a pull request"); Service Policies come too late if the model has already produced a sensitive payload; Guardrails are too generic if the agent never had permission to be in this conversation. The three-layer composition is what makes runtime control tractable. Canonical wiki instance of patterns/three-layer-agent-control (Source).
- OBO token passing: identity flows from the user to the specific table row. "Databricks takes a different approach: identity flows end to end, from the user who asks the question to the specific table row the agent retrieves. Agents inherit the invoking user's data permissions in real time via on-behalf-of token passing, not a shared service account. If you can't access a table in Unity Catalog, neither can the agent acting on your behalf. Every action is logged against both identities: the real user who triggered the request and the agent that acted on their behalf, capturing which tables were accessed, what operations ran, and when." This is the second wiki disclosure of OBO at Databricks (after the 2026-04-17 post canonicalised it for coding-agent → MCP-server flows). The generalisation: OBO is now the data-access auth model for all agents, not just coding agents, and the "specific table row" phrasing is the load-bearing detail that ties OBO to UC's row-filter / column-mask infrastructure. Dual-identity logging (real user + agent) is named explicitly as an audit-trail requirement. Extension of patterns/on-behalf-of-agent-authorization beyond Redpanda's ADP framing (Source).
- Service Policies are UC functions evaluated before tool execution, returning allow / deny / consent. "Service Policies, which are UC functions, managed in UC and attached to registered MCPs in Unity Catalog that control which tool calls succeed. Every tool call is evaluated before execution: based on the tool name, its arguments, or the identity of the caller, the policy returns allow, deny or asks for user consent. If the policy evaluation results in a 'Deny', the call is blocked." The architectural shape: (a) policy is code (UC functions) with the same lifecycle (versioning, audit, ownership) as data-governance code; (b) policy is attached to the tool surface, not the agent ("attached to registered MCPs in Unity Catalog"), so a single policy applies to every agent that calls the MCP server; (c) policy returns a ternary, not a boolean: consent is a first-class outcome that requires a user in the loop to proceed; (d) fail-closed on deny. The deeper move: by making the policy "a UC function", Databricks reuses UC's existing enforcement substrate (the same engine that evaluates ABAC + row-filters + column-masks) for tool-call admission control. Canonical wiki instance of systems/uc-service-policies + patterns/policy-as-uc-function-attached-to-mcp (Source).
- Guardrails are inline content scanners that fail closed on every request. "At the model layer, guardrails inspect what flows through inference in real time, scanning inputs for PII and jailbreak attempts, checking outputs for hallucinations and sensitive content before they reach the user. They run inline on every request and fail closed." The structural properties: (a) bidirectional: both inputs (PII, jailbreak attempts) and outputs (hallucinations, sensitive content) are scanned; (b) inline: on the inference critical path, not async post-processing; (c) per-request: every request, not a sample; (d) fail-closed: an explicit choice of the security default over the availability default. This is the wiki's first canonical instance of an AI runtime content guardrail, distinct from the existing AI agent guardrails concept (CI/quality gates for AI-generated code), which is about static review of code that AI produces rather than runtime scanning of model I/O. Canonical wiki instance of concepts/inline-llm-content-guardrail + systems/unity-ai-gateway-guardrails (Source).
- Pillar 2's principle: AI governance is a data-governance problem in disguise. "Here's the principle most AI governance tools miss: an agent's behavior is almost entirely determined by the data it has access to. What it can read, how fresh that data is, whether sensitive fields are masked, these aren't AI governance questions. They're data governance questions. Treat them separately, and you end up with two incomplete systems. Treat them together, and governance becomes self-reinforcing." The architectural payoff: the data classification you already have automatically becomes your AI governance. UC's data classification tags drive ABAC row-filter / column-mask policies; agents querying UC inherit those filters through OBO token passing, without any AI-specific configuration. The structural failure mode the post warns against ("two incomplete systems") is the typical state of orgs that bolt AI-specific guardrails onto an existing data platform: the AI layer doesn't know which columns are PII, and the data layer doesn't know which queries came from agents. Canonical wiki instance of concepts/data-centric-ai-governance (Source).
- Inference Tables: the full payload of every model call written to lakehouse-resident tables. "AI Gateway writes the full payload of every model call to inference tables: the exact prompt sent, the exact response returned, token counts and latency. Unity Catalog captures every access operation in audit logs, including which principal called what, from which agent and at what time. Both land in your lakehouse as tables, retainable on your terms. Most logging architectures force a trade-off between completeness and cost, requiring you to sample, filter, and set short retention windows. Because Unity AI Gateway captures observability data in your lakehouse, you don't have to." The load-bearing claim is economic: lakehouse-resident inference tables break the completeness-vs-cost trade-off that APM-style logging architectures impose. The post explicitly names the regulatory forcing function: "Emerging AI regulations require organizations to demonstrate what their AI systems did, what they were given, and what they produced." Inference Tables operationalise that requirement: "the exact prompt sent, the exact response returned" is replayable evidence, not summarised metrics. Canonical wiki instance of systems/inference-tables + patterns/inference-payload-table-for-audit; it extends the pre-existing patterns/telemetry-to-lakehouse beyond OTel-style metrics and traces to full request/response payloads (Source).
- Lakewatch: agentic SIEM on the security lakehouse. "Lakewatch, Databricks' agentic SIEM built on the security lakehouse, takes this further still, turning the same audit trail into active security intelligence: AI-driven threat detection and response built on the lakehouse. Attackers are using agents. Defenders should too." First wiki disclosure of systems/lakewatch. The architectural shape: the same audit-trail substrate (UC tables + inference tables) doubles as the input to an agent-driven detection-and-response platform. That closes the loop: the governance data that proves what agents did also powers an agentic system that detects what attackers' agents are doing. The asymmetry framing ("Attackers are using agents. Defenders should too") is the post's most direct argument for treating agent-audit data as security infrastructure, not just compliance paperwork (Source).
- Data quality monitoring enables root-cause debugging; data classification feeds access control automatically. "Data quality monitoring continuously tracks freshness and completeness across your catalog. Join it against agent traces, and you move from 'the agent gave a wrong answer' to 'the agent queried a table that had been flagged as stale', connecting agent behavior to the quality of the data underneath it. Data classification adds a further layer: an agentic AI system continuously scans and tags sensitive columns, such as PII, HIPAA and GDPR-regulated data, and those tags feed directly into access control. Masked columns remain masked regardless of which agent or framework requests them." The two-step composition: (a) agentic classification of columns produces tags (the same UC Data Classification system disclosed at GA on 2026-05-13); (b) tags feed ABAC access control so masked columns stay masked through OBO token passing. The structural insight is "regardless of which agent or framework requests them": the masking lives at the data layer, not the agent layer. Forensic value: joining quality-monitoring outputs against agent traces enables root-cause attribution from an agent's answer back to data freshness, which the post frames as a debugging primitive (Source).
- Cost intelligence requires a metering layer that sees all AI traffic, plus tagging that attributes it. "The root cause isn't a broken process. It's missing infrastructure: no metering layer that sees all AI traffic in one place, no tagging system that attributes it to teams or use cases, no spend controls sitting alongside the access controls governing the same resources." The architectural answer: usage tracking logs every request to usage tables (token counts, latency, requester identity and model destination, across Databricks-hosted and external providers, in a single table) tagged by team / project / cost centre; Budgets then add the policy layer (per-user / per-group spend thresholds with alerts). Per the post, "hard enforcement is the natural next step, and we'll have more to share on that soon", i.e. alerting has shipped and enforcement is still on the roadmap. The companion forensic example: "An agent that costs $200 and generates $50K in qualified pipeline is a bargain. An agent that costs $200 querying stale data in a loop is a waste. Without joining cost to outcome, you can't tell the difference." Joining cost to outcome is what makes cost tracking actionable rather than a mere ledger (Source).
- Pillar 4's slogan: governance becomes a property of your platform. "Governance can't live only in the agent layer. It also needs to live in the data and services that agents access, whether those services are Databricks-managed or not. An agent built on LangGraph and one built on CrewAI both query the same Unity Catalog, invoke the same governed MCP servers, and flow through the same AI Gateway. The framework is irrelevant. Governance travels with the resources, not the code that calls them." The structural insight: agent frameworks ship weekly; if governance is bound to the framework, every framework update is a governance migration. Binding governance to the resource (the table, the MCP server, the model endpoint) means the resource's policy applies regardless of which framework called it. The post enumerates the in-scope frameworks (LangGraph, CrewAI, LangChain, LlamaIndex, AutoGen, OpenAI SDK, Anthropic SDK) and the in-scope model providers (Databricks-hosted, Azure OpenAI, AWS Bedrock, Anthropic). Canonical wiki instance of concepts/governance-travels-with-resources (Source).
- MLflow tracing auto-instruments the major agent frameworks. "MLflow tracing auto-instruments LangChain, LlamaIndex, AutoGen, the OpenAI SDK, the Anthropic SDK, and more, with traces landed in Unity Catalog as tables without custom instrumentation per framework." This is the third Databricks post (after 2025-12-03 and 2026-04-17) to name MLflow tracing as the observability primitive that breaks framework coupling. The structural property, "without custom instrumentation per framework", is what makes Pillar 4 (open and interoperable) operationally tractable: the cost of adding a new framework is borne once, in MLflow's auto-instrumentation logic, not per team in glue code. Trace data landing in UC-managed Delta tables ties this back to Pillar 2's joinable-with-business-data property (Source).
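A minimal sketch of the OBO takeaway above, assuming a toy in-memory grants table: the agent's read is authorized against the invoking user's grants (never a shared service account), and every attempt, allowed or not, is logged against both identities. `GRANTS`, `AuditLog` and `agent_read` are hypothetical names; the post links out for the real token-passing mechanics and does not show the audit-record schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical grants keyed by *user*: the agent has no grants of its own.
GRANTS = {
    "alice": {"sales.leads", "sales.accounts"},
    "bob": {"support.tickets"},
}

@dataclass
class AuditLog:
    records: list = field(default_factory=list)

    def record(self, user: str, agent: str, table: str, operation: str, allowed: bool) -> None:
        # Dual-identity record: the real user and the agent acting on their behalf.
        self.records.append({
            "user": user,
            "agent": agent,
            "table": table,
            "operation": operation,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })

def agent_read(user: str, agent: str, table: str, log: AuditLog) -> bool:
    """Authorize an agent's read with the invoking user's permissions (OBO-style)."""
    allowed = table in GRANTS.get(user, set())
    log.record(user, agent, table, "SELECT", allowed)
    return allowed
```

If bob cannot read `sales.leads`, neither can the lead-scoring agent acting for him, and the denied attempt still lands in the audit log with both names attached.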
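The Service Policy decision flow described above can be sketched as follows, assuming a policy is just a Python callable over the tool name, arguments and caller. `Decision`, `github_policy` and `call_tool` are hypothetical; the real policies are UC functions whose signature the post does not publish. Treating an exception inside the policy as a deny is this sketch's own fail-closed choice; the post only states that a deny blocks the call.

```python
from enum import Enum
from typing import Callable, Optional

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    CONSENT = "consent"

def github_policy(tool: str, args: dict, caller: str) -> Decision:
    """Hypothetical policy attached to a registered GitHub MCP server."""
    if tool == "delete_file":
        return Decision.DENY
    if tool == "merge_pull_request":
        return Decision.CONSENT  # needs a human in the loop
    return Decision.ALLOW

def call_tool(
    tool: str,
    args: dict,
    caller: str,
    policy: Callable[[str, dict, str], Decision],
    execute: Callable[[str, dict], str],
    ask_consent: Callable[[str, str, dict], bool],
) -> Optional[str]:
    """Evaluate the policy *before* execution; return None when the call is blocked."""
    try:
        decision = policy(tool, args, caller)
    except Exception:
        decision = Decision.DENY  # assumption: a broken policy fails closed
    if decision is Decision.DENY:
        return None
    if decision is Decision.CONSENT and not ask_consent(caller, tool, args):
        return None
    return execute(tool, args)
```

Because the policy is attached to the MCP server rather than to any agent, every agent calling `merge_pull_request` through that server hits the same consent prompt.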
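The data-centric argument, and the claim that "masked columns remain masked regardless of which agent or framework requests them", can be sketched as a tag-driven mask applied at read time. The tags, group names and `read_row` are hypothetical; the post says classification tags feed ABAC masking but shows no policy syntax. Note that the calling agent or framework is deliberately not an input to the function.

```python
# Hypothetical classification tags (as an agentic classifier might produce them).
COLUMN_TAGS = {"customers.email": "pii", "customers.ssn": "pii", "customers.region": None}
# Hypothetical ABAC rule: which groups may see each tag unmasked.
UNMASK_GROUPS = {"pii": {"privacy-officers"}}
USER_GROUPS = {"alice": {"sales"}, "dana": {"privacy-officers"}}

def read_row(user: str, table: str, row: dict) -> dict:
    """Apply tag-driven column masks using the invoking user's groups.

    Masking depends only on the data's tags and the user's identity,
    never on which agent or framework issued the query.
    """
    groups = USER_GROUPS.get(user, set())
    out = {}
    for col, value in row.items():
        tag = COLUMN_TAGS.get(f"{table}.{col}")
        if tag and not (UNMASK_GROUPS.get(tag, set()) & groups):
            out[col] = "***MASKED***"
        else:
            out[col] = value
    return out
```

Tagging a new column as PII changes what every agent sees on its next read, with no AI-specific configuration anywhere.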
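Because inference payloads and quality signals land as tables, the "agent queried a stale table" diagnosis becomes an ordinary SQL join. The sketch below uses Python's built-in `sqlite3` as a stand-in for the lakehouse. Both schemas (`inference_log`, `table_quality`) are hypothetical: the post names the captured fields (prompt, response, token counts, latency) but not the table layout, and a real trace would record every table a request touched rather than a single `table_read` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inference_log (
    request_id TEXT, user_name TEXT, agent TEXT,
    prompt TEXT, response TEXT, total_tokens INTEGER, latency_ms INTEGER,
    table_read TEXT
);
CREATE TABLE table_quality (table_name TEXT, is_stale INTEGER);
""")
conn.executemany("INSERT INTO inference_log VALUES (?,?,?,?,?,?,?,?)", [
    ("r1", "alice", "forecast-agent", "Q3 revenue?", "$1.2M", 850, 420, "finance.revenue_daily"),
    ("r2", "bob", "routing-agent", "Route ticket 7", "tier-2", 300, 180, "support.tickets"),
])
conn.executemany("INSERT INTO table_quality VALUES (?,?)", [
    ("finance.revenue_daily", 1),  # flagged stale by quality monitoring
    ("support.tickets", 0),
])

# Join agent traces against quality flags to find answers built on stale data.
STALE_READS = """
SELECT i.request_id, i.agent, i.table_read
FROM inference_log AS i
JOIN table_quality AS q ON q.table_name = i.table_read
WHERE q.is_stale = 1
"""
stale = conn.execute(STALE_READS).fetchall()
```

Here `stale` identifies request `r1` from the forecast agent as the one that read a stale table, which is exactly the move from "the agent gave a wrong answer" to a data-level root cause.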
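The cost-intelligence takeaway describes two pieces: a usage table tagged for attribution, and Budgets that alert on thresholds without blocking. A minimal sketch, assuming usage rows already carry a cost-centre tag and spend in integer cents; the row fields, `spend_by` and `budget_alerts` are hypothetical, and the post does not describe its cost-attribution algorithm.

```python
from collections import defaultdict

# Hypothetical usage rows: one per request, across hosted and external providers.
usage = [
    {"user": "alice", "destination": "databricks-hosted", "tokens": 120_000, "cents": 240, "cost_centre": "sales"},
    {"user": "alice", "destination": "azure-openai", "tokens": 80_000, "cents": 160, "cost_centre": "sales"},
    {"user": "bob", "destination": "aws-bedrock", "tokens": 40_000, "cents": 80, "cost_centre": "support"},
]

def spend_by(rows: list, key: str) -> dict:
    """Sum spend (in cents) grouped by any tag or identity column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["cents"]
    return dict(totals)

def budget_alerts(rows: list, limits_cents: dict, key: str = "user") -> list:
    """Return alert messages for principals over their limit.

    Nothing is blocked: per the post, Budgets alert today and hard
    enforcement is still on the roadmap.
    """
    return [
        f"{who} spent ${total / 100:.2f} (limit ${limits_cents[who] / 100:.2f})"
        for who, total in spend_by(rows, key).items()
        if who in limits_cents and total > limits_cents[who]
    ]
```

Keeping spend in integer cents avoids floating-point drift when summing many small per-request charges.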
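The "governance travels with the resources" argument can be sketched as a policy registry keyed by resource, with framework adapters that carry no policy logic of their own. Everything here is hypothetical: the resource URIs, the registry, and the two adapter functions (named after LangGraph and CrewAI only to echo the post's example; they use neither library). Denying unregistered resources is this sketch's own fail-closed assumption.

```python
from typing import Callable, Dict

# Policies live on the resource, not in the agent code.
POLICIES: Dict[str, Callable[[str, str], bool]] = {
    "mcp://github": lambda user, tool: tool != "delete_file",
    "model://chat-endpoint": lambda user, tool: user != "contractor",
}

def gateway(resource: str, user: str, tool: str) -> bool:
    """Single enforcement point: look up the resource's policy and apply it."""
    policy = POLICIES.get(resource)
    return bool(policy is not None and policy(user, tool))  # unregistered: deny

# Two framework-specific adapters. Neither contains policy logic, so swapping
# frameworks changes nothing about what is allowed.
def langgraph_style_tool(user: str, tool: str) -> bool:
    return gateway("mcp://github", user, tool)

def crewai_style_tool(user: str, tool: str) -> bool:
    return gateway("mcp://github", user, tool)
```

Changing the GitHub policy means editing one registry entry; neither adapter, and no agent built on either framework, has to change.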
Architectural elements + numbers (from source)
- Frameworks named (governance-agnostic-to): LangGraph, CrewAI, LangChain, LlamaIndex, AutoGen, OpenAI SDK, Anthropic SDK.
- Model providers named (gateway-unified): Databricks-hosted (via Foundation Model API), Azure OpenAI, AWS Bedrock, Anthropic.
- Tool surfaces named (governance-extends-to): LLMs, MCP servers, skills, agents.
- External services named for OBO/MCP-registration examples: GitHub, Jira, Slack.
- Regulation framings named: GDPR, HIPAA — both for Data Classification's built-in classifier coverage and as the regulatory forcing function for full-payload audit trails.
- Pillar 1 control-layer matrix:
| Layer | Decision granularity | What it controls | Limitation if relied on alone |
|---|---|---|---|
| Permissions (OBO) | Per-user-per-resource | Who can call what | Coarse — ignores runtime context |
| Service Policies | Per-tool-call | Whether this specific tool call should proceed in the context of this request | Late — tool already chosen |
| Guardrails | Per-request payload | What content flows in and out | Generic — knows nothing about identity |
- Pillar enumeration:
| Pillar | Substrate | Companion concept |
|---|---|---|
| 1. Delegated access | UC permissions + Service Policies + Guardrails | patterns/three-layer-agent-control |
| 2. Data-centric AI governance | UC audit logs + Inference Tables + Data Classification + Data Quality Monitoring + Lakewatch | concepts/data-centric-ai-governance |
| 3. Cost intelligence | Usage-Tracking tables + Budgets | per-user / per-group thresholds with alerts |
| 4. Open and interoperable | MCP + Unity AI Gateway + MLflow tracing | concepts/governance-travels-with-resources |
- Data classifications named (built-in classifier coverage referenced): PII, HIPAA, GDPR-regulated data (consistent with the 2026-05-13 UC Data Classification GA disclosure of GDPR / HIPAA / GLBA / DPDPA / PCI + UK/Germany/Australia/Brazil regional packs).
- Cost-vs-outcome example numbers: $200 cost / $50K pipeline (good); $200 cost on stale-data loop (waste). Illustrative, not measured.
- Roadmap items disclosed:
- Hard budget enforcement (currently alerting-only): "Hard enforcement is the natural next step, and we'll have more to share on that soon."
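The Pillar 1 matrix above implies an evaluation order: identity first, then the specific call in context, then the content. A minimal sketch of that composition, with each layer reduced to a one-line stub; `permitted`, `policy_allows`, `content_ok` and `handle` are hypothetical, and the protected-branch and private-key rules are illustrative examples rather than rules from the post. The returned label shows which layer rejected the request.

```python
from typing import Tuple

def permitted(user: str, resource: str, grants: dict) -> bool:
    # Layer 1, permissions: who can call what.
    return resource in grants.get(user, set())

def policy_allows(tool: str, args: dict) -> bool:
    # Layer 2, Service Policy: should *this* call proceed in this context?
    return not (tool == "merge_pull_request" and args.get("branch") == "main")

def content_ok(text: str) -> bool:
    # Layer 3, guardrail: what content flows in and out.
    return "BEGIN PRIVATE KEY" not in text

def handle(user: str, resource: str, tool: str, args: dict,
           payload: str, grants: dict) -> Tuple[bool, str]:
    """Apply the three layers in order; report which layer (if any) rejected."""
    if not permitted(user, resource, grants):
        return False, "permissions"
    if not policy_allows(tool, args):
        return False, "service-policy"
    if not content_ok(payload):
        return False, "guardrail"
    return True, "ok"
```

Each rejection is one the other two layers could not have produced on their own: a permitted user merging to `main`, or a permitted, policy-approved call carrying a private key.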
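The illustrative $200 / $50K figures above reduce to a join of spend and outcome per agent. A minimal sketch; the agent names, the dict shapes and `value_per_dollar` are hypothetical, and, as noted, the figures are illustrative rather than measured.

```python
def value_per_dollar(costs: dict, outcomes: dict) -> dict:
    """Join spend to attributed outcome per agent; agents with no outcome score 0."""
    return {
        agent: outcomes.get(agent, 0) / usd
        for agent, usd in costs.items()
        if usd > 0
    }

# The post's illustrative figures: same $200 spend, very different value.
costs = {"lead-scoring-agent": 200, "stale-loop-agent": 200}
outcomes = {"lead-scoring-agent": 50_000}  # qualified pipeline in USD

ratios = value_per_dollar(costs, outcomes)
```

Without the join, both agents look identical on the cost ledger ($200 each); with it, one returns $250 of pipeline per dollar and the other returns nothing.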
Caveats
- Vision/positioning post, not architecture-internals. No latency numbers, no scale numbers, no MCP-traffic-inspection mechanics, no inference-table schema, no cost-attribution algorithm, no Service Policy DSL syntax, no guardrail-classifier model details. The post is structurally a strategic framing of an existing product surface with named architectural extensions — it cites linked deeper-dive posts (Service Policies, Budgets, Unity AI Gateway, DASF) but doesn't restate their architecture in this article.
- Pillar 4's framework-agnosticism is asserted, not demonstrated. The post claims "governance travels with the resources" and lists supported frameworks, but doesn't show a worked example of (e.g.) the same Service Policy applying to a LangGraph agent and a CrewAI agent calling the same MCP server. Reader has to trust the claim.
- OBO mechanics are partially disclosed. The post asserts identity flows "from the user who asks the question to the specific table row the agent retrieves", but the token-passing mechanics are linked out to the docs (agent-authentication-model-serving#on-behalf-of-user-authentication) rather than restated. The dual-identity logging (real user + agent) is named, but the audit-record schema isn't shown.
- Service Policies described, not specified. "UC functions, managed in UC and attached to registered MCPs" names the substrate but not the function signature. Whether policies see the request body, the response body, or both; whether they can mutate calls (rather than only allow/deny/consent); whether they compose; whether they have access to UC table data: none of this is disclosed.
- Guardrails classifier accuracy/false-positive rate not disclosed. Inline scanning of every input for PII + jailbreak and every output for hallucinations + sensitive content is asserted but no precision/recall data, no false-positive rate, no latency-overhead numbers.
- Inference Tables retention and schema not specified. "Retainable on your terms" implies customer-controlled retention; "the exact prompt sent, the exact response returned, token counts and latency" names the columns at high level but not the table schema, partition strategy, or how multi-modal payloads are stored.
- Lakewatch is named for the first time on the wiki but barely described. "Databricks' agentic SIEM built on the security lakehouse" — agentic-SIEM as a category and lakehouse as substrate are clear; the agentic detection-loop architecture, the rule-vs-LLM detection mix, the response automation, etc. are not disclosed in this post.
- Budgets are alerting-only at write time. The post explicitly defers hard enforcement: "Hard enforcement is the natural next step, and we'll have more to share on that soon." Today's product is per-user/group threshold alerts, not blocking quotas — which limits the cost-control claim to "signal you need before spend becomes a problem", not back-pressure.
- No production scale or adoption numbers disclosed. No customer count, no inference-volume number, no per-day request count, no per-month cost, no number of MCP servers under management, no number of Service Policies deployed.
- Data-centric framing partially overlaps prior wiki canonicalisation. "AI governance is data governance" echoes the 2026-05-13 ABAC + Data Classification GA framing where UC was "not just where governance is recorded but where it is expressed, evaluated, and enforced" — this post extends that thesis to AI specifically, but doesn't add new mechanism beyond linking to the prior pieces.
- Tier-3 source. Databricks' company blog posts skew vision/marketing; this one is more vision-heavy than mechanism-heavy. Ingested because the four-pillar generalisation, the named architectural extensions (Service Policies, Inference Tables, Lakewatch, Guardrails, Budgets) — each with their own product page linked — and the post-coding-agent generalisation of UC + AI Gateway are individually citable abstractions worth canonicalising.
- Companion learning materials cited (not ingested):
Source
- Original: https://www.databricks.com/blog/governing-ai-agents-scale-unity-catalog
- Raw markdown: raw/databricks/2026-05-20-governing-ai-agents-at-scale-with-unity-catalog-50b851ac.md
Related
- Predecessor coding-agent-only post: sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway — same product surface (UC + Unity AI Gateway), narrower scope (coding agents + MCP traffic from coding-tool clients).
- Sibling tool-registry framing: sources/2026-05-11-mongodb-fighting-tool-sprawl-the-case-for-ai-tool-registries. MongoDB's argument for the registry as the upstream coordination point is the structural sibling of governance-travels-with-resources; both reject decentralised per-team governance as the failure mode.
- Sibling agentic-data-plane framing: sources/2025-10-28-redpanda-introducing-the-agentic-data-plane — Gallego's governed autonomy / OBO / replayable-audit-envelope position, the canonical wiki instance of concepts/governed-agent-data-access. Databricks' four-pillar thesis here lands in the same conceptual neighbourhood from the catalog/lakehouse direction.
- Sibling proxy-choke-point instance: sources/2026-04-20-cloudflare-internal-ai-engineering-stack — same single-proxy + central-audit + BYOK shape, specialised for Cloudflare's internal-app workloads instead of cross-department agent population.
- Companion systems: systems/unity-catalog · systems/unity-ai-gateway · systems/uc-service-policies · systems/unity-ai-gateway-guardrails · systems/unity-ai-gateway-budgets · systems/inference-tables · systems/lakewatch · systems/model-context-protocol · systems/mlflow.
- Companion concepts: concepts/four-pillars-of-agent-governance · concepts/data-centric-ai-governance · concepts/inline-llm-content-guardrail · concepts/governance-travels-with-resources · concepts/governed-agent-data-access · concepts/centralized-ai-governance · concepts/coding-agent-sprawl · concepts/audit-trail · concepts/fail-open-vs-fail-closed · concepts/runtime-policy-enforcement.
- Companion patterns: patterns/three-layer-agent-control · patterns/inference-payload-table-for-audit · patterns/policy-as-uc-function-attached-to-mcp · patterns/on-behalf-of-agent-authorization · patterns/central-proxy-choke-point · patterns/telemetry-to-lakehouse · patterns/ai-gateway-provider-abstraction · patterns/unified-billing-across-providers · patterns/budget-enforced-quota-throttle.