SYSTEM Cited by 3 sources

Unity AI Gateway (Databricks)

Unity AI Gateway is Databricks' productised instance of the AI-gateway provider abstraction pattern, specialised to coding agents + MCP integrations rather than just application LLM calls. Its job is to be the single governance + cost + telemetry plane for all coding-tool traffic in a Databricks customer's fleet.

Generalisation to org-wide agent populations (2026-05-20 disclosure)

The 2026-05-20 Governing AI agents at scale with Unity Catalog post (sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog) generalises Unity AI Gateway from coding-agent scope to org-wide agent scope and adds three named architectural extensions that this page didn't previously canonicalise.

Scope generalisation

The 2026-04-17 launch positioned the Gateway around coding-agent sprawl (Cursor / Codex / Claude Code / Gemini CLI). The 2026-05-20 post generalises to every department's agents: dev (coding agents), analytics (forecasting agents), sales-ops (lead-scoring agents), support (ticket-routing agents), marketing (personalization), finance (reconciliation). The architectural surface didn't change — the "every model call, every tool invocation, every agent interaction flows through the gateway" principle now covers all of them.

Three new feature surfaces

The Gateway as disclosed on 2026-04-17 had centralised audit + cost + observability. The 2026-05-20 post discloses three additional named layers attached to the same proxy:

Layer | What it does | Wiki entity
Service Policies | Pre-execution per-tool-call evaluation; UC functions attached to registered MCPs; returns allow/deny/consent; fail-closed on deny | systems/uc-service-policies
Guardrails | Inline content scanning of every model call: inputs (PII, jailbreak) and outputs (hallucinations, sensitive content); fail-closed | systems/unity-ai-gateway-guardrails
Inference Tables | Full payload of every model call (exact prompt + exact response + tokens + latency) written to UC-managed Delta tables; customer-controlled retention | systems/inference-tables
Budgets | Per-user / per-group monthly spend thresholds with alerts; hard enforcement on roadmap | systems/unity-ai-gateway-budgets
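
The allow/deny/consent contract of Service Policies can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Decision` enum, the `Policy` signature, and the two example policies are illustrations, not the Databricks API. Treating a policy that raises or returns garbage as a deny is an assumption about what "fail-closed" implies; the post only states fail-closed on deny.

```python
from enum import Enum
from typing import Callable, Mapping


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    CONSENT = "consent"


# Hypothetical policy shape: inspects one tool call, returns a Decision.
Policy = Callable[[Mapping[str, object]], Decision]


def evaluate_tool_call(policies: list[Policy], call: Mapping[str, object]) -> Decision:
    """Evaluate every policy before the tool runs; any deny wins."""
    needs_consent = False
    for policy in policies:
        try:
            decision = policy(call)
        except Exception:
            # Assumption: a policy that errors is treated as a deny.
            return Decision.DENY
        if not isinstance(decision, Decision) or decision is Decision.DENY:
            return Decision.DENY
        if decision is Decision.CONSENT:
            needs_consent = True
    return Decision.CONSENT if needs_consent else Decision.ALLOW


# Two illustrative policies (hypothetical tool names and fields).
def block_prod_deletes(call: Mapping[str, object]) -> Decision:
    if call.get("tool") == "delete_table" and call.get("env") == "prod":
        return Decision.DENY
    return Decision.ALLOW


def confirm_external_email(call: Mapping[str, object]) -> Decision:
    return Decision.CONSENT if call.get("tool") == "send_email" else Decision.ALLOW
```

In this sketch a single deny short-circuits the evaluation, while a consent result is only returned once no policy has denied the call.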

Four-pillar repositioning

The 2026-05-20 post repositions the Gateway in the four-pillar framing (concepts/four-pillars-of-agent-governance):

  • Pillar 1 (Delegated access) — three-layer composition: OBO permissions + Service Policies + Guardrails. The Gateway is the enforcement fabric where all three layers run.
  • Pillar 2 (Data-centric AI governance) — Gateway writes Inference Tables + UC audit logs to the lakehouse, joinable with business data; substrate for Lakewatch (agentic SIEM).
  • Pillar 3 (Cost intelligence) — usage-tracking + Budgets.
  • Pillar 4 (Open and interoperable) — single governed endpoint across Databricks-hosted models + Azure OpenAI + AWS Bedrock + Anthropic; framework-agnostic across LangGraph / CrewAI / OpenAI SDK / Anthropic SDK / AutoGen / LlamaIndex.
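
The "single governed endpoint" idea in Pillar 4 can be sketched as a thin dispatch layer that records every call regardless of which provider serves it. The post does not disclose the Gateway's routing or per-provider adapter shape, so the `Gateway` class, the `provider/model` naming scheme, and the audit-record fields below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical adapter shape: takes a prompt, returns a completion string.
Adapter = Callable[[str], str]


@dataclass
class Gateway:
    adapters: dict[str, Adapter] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def register(self, provider: str, adapter: Adapter) -> None:
        self.adapters[provider] = adapter

    def complete(self, user: str, model: str, prompt: str) -> str:
        # Assumed naming scheme: "<provider>/<model-name>".
        provider, _, _name = model.partition("/")
        if provider not in self.adapters:
            raise KeyError(f"no adapter registered for provider {provider!r}")
        response = self.adapters[provider](prompt)
        # Every call is recorded in one place, whichever provider handled it.
        self.audit_log.append(
            {"user": user, "model": model, "prompt": prompt, "response": response}
        )
        return response
```

Callers only ever talk to `Gateway.complete`, which is what lets audit, cost tracking, and policy checks attach at one point instead of once per provider SDK.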

Identity propagation (now explicit)

The 2026-05-20 post is the first to explicitly disclose OBO as the data-access mechanism: "identity flows end to end, from the user who asks the question to the specific table row the agent retrieves." The Gateway is the identity-translation point — agents inherit the invoking user's UC permissions in real time via on-behalf-of token passing, not via shared service accounts.
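
The difference between on-behalf-of access and a shared service account can be shown with a small Python sketch. The `GRANTS` mapping, table names, and function names are hypothetical stand-ins for Unity Catalog permissions; the real token-passing flow is not described in the post.

```python
# Hypothetical per-user grants standing in for Unity Catalog permissions.
GRANTS = {
    "alice": {"sales.orders"},
    "bob": {"sales.orders", "hr.salaries"},
}


class PermissionDenied(Exception):
    pass


def read_table(table: str, on_behalf_of: str) -> str:
    """Check the invoking user's grants, not the agent's own identity."""
    if table not in GRANTS.get(on_behalf_of, set()):
        raise PermissionDenied(f"{on_behalf_of} cannot read {table}")
    return f"rows from {table} as {on_behalf_of}"


def agent_answer(table: str, invoking_user: str) -> str:
    # The agent forwards the caller's identity rather than using a
    # shared service account with its own broad permissions.
    return read_table(table, on_behalf_of=invoking_user)
```

The same agent therefore returns different results for different callers: a request for `hr.salaries` succeeds for `bob` and is rejected for `alice`.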

Three-pillar architecture (from the 2026-04-17 launch post)

  1. Centralised security and audit.
       • Every agent data-access flow logged in Unity Catalog (same governance substrate as Lakehouse data + ML assets).
       • All tracing in MLflow (specifically MLflow 3 GenAI tracing, named for the Claude Code integration).
       • MCP servers "managed in Databricks": the gateway is the policy point for MCP traffic, not just LLM traffic.
       • Single-identity plane: developers authenticate once with Databricks credentials for all tools (GitHub, Atlassian, etc.), "no separate logins per service".
  2. Single bill and cost limits.
       • Foundation Model API provides first-party inference for OpenAI, Anthropic, Gemini, and open models like Qwen.
       • Admins can also "bring external capacity in", extending governance "to all your tokens, regardless of where they flow" (patterns/unified-billing-across-providers).
       • Gateway-enforced budgets are per-developer, not per-tool: admins give each developer one budget and the developer burns it on whichever tool of choice (Cursor / Codex / Gemini CLI / Claude Code / …).
  3. Full observability in the Lakehouse.
       • Coding-tool metrics + traces land in Unity-Catalog-managed Delta tables via OpenTelemetry ingestion.
       • Joinable with other Lakehouse datasets (Workday for adoption-by-org / region / seniority; PR-cycle data for velocity quantification) (patterns/telemetry-to-lakehouse).
       • Surfaces rate-limit hits as a proactive capacity-planning signal.
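
The per-developer (rather than per-tool) budget model can be sketched in Python as an aggregation over usage records from every tool. The record fields, developer names, and dollar amounts are invented for illustration, and the check only reports who is over threshold; it does not model how the Gateway enforces limits.

```python
from collections import defaultdict


def spend_by_developer(usage: list[dict]) -> dict[str, float]:
    """Sum cost per developer across all tools."""
    totals: dict[str, float] = defaultdict(float)
    for record in usage:
        totals[record["developer"]] += record["cost_usd"]
    return dict(totals)


def over_budget(usage: list[dict], budgets: dict[str, float]) -> list[str]:
    """Return developers whose combined spend exceeds their single budget."""
    totals = spend_by_developer(usage)
    return sorted(
        dev for dev, spent in totals.items()
        if spent > budgets.get(dev, float("inf"))
    )


# Hypothetical usage records: one developer spreads spend over two tools.
usage = [
    {"developer": "dana", "tool": "Cursor", "cost_usd": 30.0},
    {"developer": "dana", "tool": "Claude Code", "cost_usd": 25.0},
    {"developer": "eli", "tool": "Codex", "cost_usd": 10.0},
]
budgets = {"dana": 50.0, "eli": 50.0}
```

Here `dana` stays under 50 on each individual tool but exceeds the single budget once spend is combined, which is the case a per-tool limit would miss.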

Supported clients (at launch)

Relation to existing wiki entities

What the post does not disclose

  • Gateway internals: routing, fallback, rate-limiter algorithm, streaming handling, per-provider adapter shape.
  • MCP-governance mechanics: how the gateway inspects MCP traffic, auth flow from coding-tool → gateway → MCP → data source.
  • Telemetry schema landing in Delta tables.
  • Latency / throughput / cost-per-token / adoption numbers.

Tier-3 Databricks post — ingested because the problem framing (coding-agent sprawl) and three-pillar architecture are substantive, not because the internals are disclosed.

Seen in
