SYSTEM Cited by 14 sources
# Unity Catalog (Databricks)
Unity Catalog (UC) is Databricks' unified governance solution for data and AI assets. Several distinct faces show up across ingested sources:
- As Databricks' internal governance/catalog service — the stateless-service-turned-Dicer-backed sharded in-memory cache from the Dicer case study.
- As the hub in a customer-facing data mesh — the global catalog Mercedes-Benz federates AWS Iceberg tables into and shares via Delta Sharing to Azure consumers.
- As the audit / telemetry substrate for Unity AI Gateway — coding-agent + MCP audit logs plus OpenTelemetry-sourced metrics/traces from all governed AI traffic land in UC-managed Delta tables, making AI telemetry a first-class Lakehouse dataset joinable with business data. See sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway + patterns/telemetry-to-lakehouse + systems/unity-ai-gateway.
- As the policy-evaluation engine for ABAC + governed tags + agentic data classification — the organize → detect → protect pipeline that reached GA on 2026-05-13. UC is not just where governance is recorded but where it is expressed, evaluated, and enforced through ABAC policies, governed tags, and data classification. See sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags + patterns/tag-driven-attribute-based-access-control.
- As the substrate the in-workspace app reads from via the SQL Statement API — the data-plane half of the single-platform application architecture. The Databricks App authenticates as a workspace service principal and queries UC tables directly; the app "inherits Unity Catalog access controls without any additional configuration." See sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse + patterns/in-workspace-app-as-decision-support + patterns/shap-attribution-as-governed-delta-table (the companion ML-audit pattern that lives in UC governed Delta tables).
- As a fully Iceberg-native catalog (2026-05-28): the "organize → detect → protect" pipeline that previously read Delta-natively now extends to Iceberg as a first-class write format and to Iceberg-compatible engines as first-class consumers. The 2026-05-28 announcement bundles five concurrent surface-area expansions:
  - (a) Iceberg v3 reaches GA across managed / foreign / UniForm-enabled tables: deletion vectors, row tracking, and the VARIANT type are all spec-level on the Iceberg side now, cross-format compatible with Delta.
  - (b) Managed Iceberg (GA): UC creates / reads / writes / governs Iceberg tables directly, with Predictive Optimization and Liquid Clustering applying.
  - (c) Foreign Iceberg (GA) + Credential Vending for Foreign Iceberg (GA): UC governs Iceberg tables managed in eight named external catalogs (AWS Glue, Snowflake Horizon, Hive Metastore, Apache Polaris, Salesforce Data Cloud, Google Cloud Lakehouse, Palantir, Workday) while leaving the data and source catalog in place, and mints short-lived scoped credentials for federated access.
  - (d) External Sharing to Iceberg clients (GA): Delta Sharing now emits Iceberg REST endpoints; recipients on Snowflake / Trino / Flink / Spark consume shared data via Iceberg-compatible clients without ingestion or copies. A Public Preview also adds External Sharing of Foreign Iceberg tables.
  - (e) Cross-engine ABAC (Beta) via the Iceberg REST Catalog Scan Planning API (Iceberg 1.11): UC evaluates ABAC policies during server-side scan planning, and the engine receives a filtered scan plan; "any engine, such as Apache Spark or DuckDB, which implements the Iceberg REST catalog scan planning client … can access data with ABAC enforced." This is the canonical wiki instance of scan-planning as policy enforcement point, extending the Databricks-compute-only UC ABAC surface beyond the engine boundary.
  - Plus Iceberg-compatible materialized views (Gated Public Preview): managed MVs exposed as native Iceberg tables, with the syntax `CREATE MATERIALIZED VIEW my_mv USING ICEBERG`. And forward-looking Iceberg v4 + Delta 5.0 adaptive metadata tree alignment (concepts/format-co-evolution-iceberg-delta).
  - See sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance.
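A minimal sketch of why scan planning works as a policy enforcement point, assuming a toy in-memory model: the catalog applies the ABAC row filter and column mask while it builds the plan, so whatever engine executes the plan never receives the removed rows or unmasked values. `AbacPolicy`, `ScanPlan`, `plan_scan`, `engine_execute`, and the `region` attribute are all hypothetical illustrations, not the Iceberg REST Scan Planning API or its wire format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]
Principal = dict[str, str]


@dataclass(frozen=True)
class AbacPolicy:
    """Hypothetical ABAC policy held by the catalog, not by the engine."""
    row_filter: Callable[[Row, Principal], bool]  # which rows this principal may see
    masked_columns: frozenset[str]                 # columns returned masked


@dataclass(frozen=True)
class ScanPlan:
    """What the engine receives. Simplification: real Iceberg scan plans carry
    file scan tasks, not rows; rows stand in for the plan's contents here."""
    rows: tuple[Row, ...]


def plan_scan(table: list[Row], policy: AbacPolicy, principal: Principal) -> ScanPlan:
    """Catalog-side planning: the policy is applied before the plan leaves the server."""
    visible = (r for r in table if policy.row_filter(r, principal))
    masked = tuple(
        {col: ("***" if col in policy.masked_columns else val) for col, val in r.items()}
        for r in visible
    )
    return ScanPlan(rows=masked)


def engine_execute(plan: ScanPlan) -> list[Row]:
    """Any client engine just executes the plan it was handed; it never sees
    the rows or values the policy removed."""
    return list(plan.rows)
```

The design point the sketch isolates: because enforcement happens in `plan_scan`, swapping Spark for DuckDB changes nothing about what data is exposed.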
## Internal architecture (Dicer case study)
From the Dicer post, UC is the headline case study for Dicer's auto-sharder. UC was originally a stateless service, and its extremely high read volume drove prohibitive latency because every request hit the backend database.
### Why remote caching was rejected
- Cache must be incrementally updated and snapshot-consistent with storage.
- Customer catalogs can be gigabytes — partial or replicated snapshots in a remote cache would introduce substantial overhead.
### Dicer integration outcome
- Sharded in-memory stateful cache across pods, assignment maintained by systems/dicer.
- Remote network calls replaced by local method calls within the owning pod.
- Cache hit rate: 90–95 % (Source: sources/2026-01-13-databricks-open-sourcing-dicer-auto-sharder).
- Drastic reduction in DB round-trips; DB load drops substantially.
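The read path above can be sketched as follows, assuming a fixed hash-based key-to-pod assignment as a stand-in for Dicer. This is a simplification: Dicer itself assigns and moves key ranges dynamically based on load, which this sketch does not model. `FakeDatabase`, `Pod`, `assign`, and `read` are hypothetical names. The point is that a request is routed to the owning pod, served from that pod's local memory, and only reaches the database on a miss.

```python
import hashlib


class FakeDatabase:
    """Stands in for the catalog's backing store; counts round-trips."""

    def __init__(self, data: dict[str, str]) -> None:
        self.data = data
        self.round_trips = 0

    def read(self, key: str) -> str:
        self.round_trips += 1
        return self.data[key]


class Pod:
    """One replica holding the in-memory cache for the keys assigned to it."""

    def __init__(self, name: str, db: FakeDatabase) -> None:
        self.name = name
        self.db = db
        self.cache: dict[str, str] = {}

    def get(self, key: str) -> str:
        # Local method call: no remote-cache hop; the DB is hit only on a miss.
        if key not in self.cache:
            self.cache[key] = self.db.read(key)
        return self.cache[key]

    def invalidate(self, key: str) -> None:
        # Incremental update hook: drop a stale entry so the next read reloads it.
        self.cache.pop(key, None)


def assign(key: str, pods: list[Pod]) -> Pod:
    # Hypothetical fixed assignment (hash into a slot). Dicer would instead
    # rebalance key ranges between pods as load shifts.
    digest = hashlib.sha256(key.encode()).digest()
    return pods[int.from_bytes(digest[:8], "big") % len(pods)]


def read(key: str, pods: list[Pod]) -> str:
    return assign(key, pods).get(key)
```

Because each key always lands on the same pod, repeated reads of a key cost one database round-trip in total until that key is invalidated.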
Unity Catalog is thus a concrete example of concepts/dynamic-sharding beating both the stateless + remote-cache option and static-sharding as an architecture for a catalog-service read path.
## Customer-facing role (Mercedes-Benz data mesh)
From the Mercedes-Benz case study, UC plays the global catalog role in a cross-cloud concepts/data-mesh:
- Centralises metadata and access-control across metastores, regions, and hyperscalers — one governance plane for AWS + Azure. This is the architectural realisation of concepts/hub-and-spoke-governance.
- Federates Iceberg tables from AWS Glue — registering them in UC so they can participate in systems/delta-sharing without the producer rewriting into Delta first. Format translation happens at the federation boundary.
- Speaks systems/delta-sharing — the open exchange protocol between UC metastores (cross-region, cross-cloud) and with external partners (suppliers).
The self-service orchestration layer over UC on this mesh is systems/ddx-orchestrator, which automates permission management and Sync-Job lifecycle so domain teams don't operate shares by hand.
(Source: sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh)
## Governance / ABAC / classification face (2026-05-13 GA disclosure)
The 2026-05-13 GA announcement disclosed Unity Catalog's organize → detect → protect governance pipeline as three co-designed primitives all hosted inside the catalog:
| Primitive | Capability |
|---|---|
| Governed Tags | Account-level tag vocabulary; attached to catalogs / schemas / tables / columns; inherits parent → child; separate CREATE / MANAGE permissions distinct from data ownership; full SQL DDL + REST + UI + Terraform lifecycle. |
| ABAC policies | Evaluate tag-based conditions to apply row filters + column masks to all matching objects across catalogs/schemas. 10K+ per metastore, 100+ per catalog/schema at GA (10× growth). Session identity evaluation for views/functions. Single VARIANT UDF can mask many column types at once. |
| Data Classification | Continuous scan; built-in classifiers cover GDPR / HIPAA / GLBA / DPDPA / PCI + UK / Germany / Australia / Brazil regional packs (India + Canada coming this month); custom classifiers in Beta learn detection patterns from already-tagged columns; human-in-the-loop FP exclusion improves precision; consolidated dashboard surfaces detected classifications + access + protection coverage. |
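The detect step in the table can be sketched as a pattern-only scanner, assuming regex classifiers and a match-fraction threshold. UC's actual engine also uses LLM and catalog-metadata signals, so this is a swapped-in simplification. `CLASSIFIERS`, `ColumnScanner`, the `pii.*` tag names, and the 0.8 threshold are hypothetical. It shows two behaviours the table names: continuous column scanning and human-in-the-loop false-positive exclusion.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical pattern classifiers (regex only; no LLM or metadata signals here).
CLASSIFIERS: dict[str, re.Pattern[str]] = {
    "pii.email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "pii.us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


@dataclass
class ColumnScanner:
    """Toy scan step: tag a column when enough sampled values match a pattern."""
    threshold: float = 0.8  # assumed fraction of sampled values that must match
    excluded: set[tuple[str, str]] = field(default_factory=set)  # steward-rejected (column, tag) pairs

    def classify(self, column: str, sample: list[str]) -> set[str]:
        if not sample:
            return set()
        tags: set[str] = set()
        for tag, pattern in CLASSIFIERS.items():
            if (column, tag) in self.excluded:
                continue  # a human already marked this detection as a false positive
            hits = sum(1 for value in sample if pattern.fullmatch(value))
            if hits / len(sample) >= self.threshold:
                tags.add(tag)
        return tags

    def exclude(self, column: str, tag: str) -> None:
        """Human-in-the-loop false-positive exclusion."""
        self.excluded.add((column, tag))
```

Excluding a `(column, tag)` pair suppresses only that detection, so the same classifier keeps firing on other columns.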
The load-bearing engineering claim is "there is no handoff between systems, and no manual step between discovery and protection" — all three primitives operate within Unity Catalog's permission + metadata model. This unlocks separation-of-duties governance across three role groups: governance team (taxonomy + policies), stewards (tag application + FP exclusion), data producers (tables in governed scope).
The architectural shift is from per-object configuration of row filters / column masks ("repetitive and prone to inconsistency") to declarative policy evaluation against governed-tag attributes — one policy covers many tables, and new data picks up protection automatically as tags propagate (whether human-applied or classifier-applied).
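A minimal sketch of that shift, assuming a toy catalog tree: tags inherit parent → child, and one tag-driven mask policy protects every column whose effective tags match, including columns that pick up a tag later. `Securable`, `ColumnMaskPolicy`, `mask_any`, the `pii` tag, and the `hr_admins` group are hypothetical. `mask_any` is only a Python analogy for the single VARIANT UDF idea (one type-agnostic mask for many column types), not UC's UDF mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Securable:
    """A catalog, schema, table, or column node in a toy catalog tree."""
    name: str
    parent: Optional[Securable] = None
    tags: set[str] = field(default_factory=set)

    def effective_tags(self) -> set[str]:
        # Governed tags inherit parent -> child.
        inherited = self.parent.effective_tags() if self.parent else set()
        return inherited | self.tags


def mask_any(value: Any) -> Any:
    # One type-agnostic mask (analogy for a single VARIANT UDF).
    if isinstance(value, str):
        return "***"
    if isinstance(value, (int, float)):
        return 0
    return None


@dataclass(frozen=True)
class ColumnMaskPolicy:
    """Hypothetical ABAC policy: mask any column carrying `tag` unless the
    caller belongs to `exempt_group`. Declared once, matched by attribute."""
    tag: str
    exempt_group: str
    mask: Callable[[Any], Any] = mask_any

    def apply(self, column: Securable, value: Any, user_groups: set[str]) -> Any:
        if self.tag in column.effective_tags() and self.exempt_group not in user_groups:
            return self.mask(value)
        return value
```

Usage: tag a schema `pii` and every column beneath it is masked for non-exempt users; when a classifier later tags another column `pii`, the same policy covers it with no per-table configuration.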
(Source: sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags)
## In-workspace app substrate face (2026-05-13 clinical-ops disclosure)
The 2026-05-13 "Clinical operations intelligence belongs on the Lakehouse" post surfaces UC as the data-plane half of the single-platform application architecture: the app reads from UC directly via the SQL Statement API, as opposed to the conventional shape (sync pipeline → operational DB → web tier).
Three load-bearing properties:
- Service-principal-mediated access. "The app authenticates as a first-class workspace service principal, queries Unity Catalog via the SQL Statement API… all on internal connections." UC inherits the app's identity end-to-end; no separate JDBC credential surface.
- Inheritance of access controls — for free. "The app inherits Unity Catalog access controls without any additional configuration." UC's ABAC policies / governed tags / classification compose onto the app without an app-side RBAC translation layer. Per-user PHI / regulated-data handling rides on the catalog's "HIPAA Safe Harbor / Expert Determination posture configured at the catalog or schema level."
- ML-audit substrate via governed SHAP-attribution Delta tables. UC is the substrate for the SHAP-attribution-as-governed-Delta-table pattern: predictions plus per-prediction SHAP feature contributions land in UC-governed Delta tables, lineaged through UC, versioned in systems/mlflow. Population-level fairness audit ("are community sites systematically under-weighted?") becomes a SQL query against UC; per-recommendation explanation for end users is read by the same in-workspace app via the same SQL Statement API.
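A sketch of how such an app might issue the fairness-audit query through the Databricks SQL Statement Execution API, built with the standard library but not sent. The host, warehouse ID, table `main.clinops.site_shap_attributions`, and its columns are hypothetical, and the token is assumed to be an OAuth access token already issued to the app's service principal. The request fields (`warehouse_id`, `statement`, `parameters`, `wait_timeout`) follow the public API as we understand it; check the API reference before relying on them.

```python
import json
import urllib.request

# Hypothetical governed table holding per-prediction SHAP contributions.
FAIRNESS_AUDIT_SQL = """
SELECT site_type, AVG(shap_value) AS mean_contribution
FROM main.clinops.site_shap_attributions
WHERE feature = :feature
GROUP BY site_type
"""


def build_statement_request(host: str, token: str, warehouse_id: str,
                            statement: str, params: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a Statement Execution API call.

    UC evaluates grants, row filters, and masks for whichever principal the
    token belongs to, so the app needs no access-control logic of its own.
    """
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        # Named :markers are bound server-side instead of being string-formatted in.
        "parameters": [{"name": k, "value": v, "type": "STRING"} for k, v in params.items()],
        "wait_timeout": "30s",
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a single `urllib.request.urlopen(req)` call; the same builder serves both the population-level audit and per-recommendation explanations.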
The architectural shift this face surfaces: UC is no longer just the data and AI governance plane; it's also the application-tier data substrate for workspace-resident apps composed via Databricks Apps. Reference implementation: systems/site-feasibility-workbench. Forward roadmap: "Patient Cohort and Recruitment, Enrollment Velocity Optimizer, Risk-Based Monitoring and Compliance — All four deploy as Databricks Apps. All four query Unity Catalog directly. None make external API calls."
(Source: sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse)
## Open API + external-engine write face (2026-05-14 disclosure)
The 2026-05-14 "Expanded interoperability with Unity Catalog Open APIs" post discloses UC's open-API + external-engine-write face: UC as the commit-coordination + credential-vending hub that makes managed Delta tables writeable by external engines (Apache Spark, Apache Flink, DuckDB) while retaining all the optimisation and governance benefits of vendor-managed tables.
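Two coordinated milestones land together (External Access to Managed Tables in Beta, Credential Vending for tables in GA), alongside a Public Preview extension to Volumes and a roadmap item: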
| Surface | Status | Substrate |
|---|---|---|
| External Access to Managed Tables | Beta | systems/uc-managed-tables + systems/delta-kernel + concepts/catalog-managed-commits |
| Credential Vending for tables | Generally Available | systems/uc-credential-vending |
| Credential Vending for Volumes | Public Preview | Same primitive extended to unstructured assets |
| ABAC for external reads | Forward roadmap | systems/unity-catalog-abac applied at the external-read path |
Three architectural primitives this face surfaces:
- UC as commit coordinator. "Because every operation flows through UC managed tables built on catalog commits, you get serialized commits that prevent log corruption and complete auditability of every read and write." External engines (Spark / Flink / DuckDB) hand commits to UC instead of writing directly to the Delta log; UC serializes commits, validates the protocol, audits, and triggers Predictive Optimization. Catalog commits also "lay the groundwork for features like multi-statement, multi-table transactions that require a centralized commit coordinator." Canonical instance of concepts/catalog-managed-commits + patterns/catalog-managed-commits-for-external-write-safety.
- UC as credential-vending authority. "Unity Catalog handles this through credential vending… UC issues short-lived, scoped credentials to external engines on demand, with access policies enforced centrally." Two GA-readiness additions: M2M OAuth replaces personal access tokens ("per-user, long-lived, and hard to rotate"), and engines auto-refresh credentials via the vending API so "pipelines that run for hours complete reliably without tokens expiring mid-job." Canonical instance of concepts/credential-vending + patterns/credential-vending-for-external-engine-access.
- UC as the catalog the Delta Kernel ecosystem integrates against. Spark, Flink, and DuckDB all leverage Delta Kernel — the open-source Java + Rust library — for the Delta protocol + UC catalog handshake, so connectors "focus on UC integration, not Delta implementation." Canonical instance of patterns/connector-library-as-protocol-abstraction.
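The commit-coordinator role can be sketched as toy optimistic concurrency control, assuming each writer states the table version it read and the coordinator admits commits one at a time. `CommitCoordinator`, `CommitConflict`, `ALLOWED_ACTIONS`, and `maintenance_queue` are hypothetical; the sketch does not reproduce the Delta catalog-managed-commit protocol, only the serialize / validate / audit / trigger-maintenance sequence described above.

```python
import threading
from dataclasses import dataclass, field

ALLOWED_ACTIONS = {"add", "remove"}  # assumed minimal "protocol" for this sketch


class CommitConflict(Exception):
    """Raised when a writer's read version is stale and it must rebase."""


@dataclass
class CommitCoordinator:
    """Toy catalog-side commit coordinator for a single table."""
    version: int = -1
    log: list[dict] = field(default_factory=list)
    audit: list[tuple[str, str]] = field(default_factory=list)
    maintenance_queue: list[int] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def commit(self, engine: str, read_version: int, actions: dict) -> int:
        with self._lock:  # exactly one commit is admitted at a time
            if read_version != self.version:
                self.audit.append((engine, "rejected: stale read"))
                raise CommitConflict(f"table is at v{self.version}, writer read v{read_version}")
            unknown = set(actions) - ALLOWED_ACTIONS
            if unknown:
                self.audit.append((engine, "rejected: protocol"))
                raise ValueError(f"unsupported actions: {sorted(unknown)}")
            self.version += 1
            self.log.append(actions)
            self.audit.append((engine, f"committed v{self.version}"))
            # Stand-in for kicking off Predictive Optimization after a write.
            self.maintenance_queue.append(self.version)
            return self.version
```

Because engines never append to the log themselves, two engines that read the same version cannot both commit on top of it; the loser is told to rebase, and both attempts appear in the audit trail.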
The architectural shift this face surfaces: UC is no longer just the catalog where Databricks-internal compute reads/writes; it is also the commit-coordination + auth-vending substrate for the broader Delta open-table ecosystem. Predictive Optimization continues to apply to tables external engines write — "Predictive Optimization continues to run seamlessly, even on tables accessed by external engines" — so the "managed table benefits" stay live across the engine boundary.
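On the auth-vending side, the engine-side auto-refresh behaviour can be sketched as a small cache that re-requests a scoped credential shortly before it expires, so a long job never presents an expired token. `Credential`, `VendedCredentialCache`, the 60-second refresh margin, and the 900-second lifetime in the usage are hypothetical; the `vend` callable stands in for a call to the catalog's vending endpoint authenticated with M2M OAuth.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Credential:
    token: str
    scope: str         # e.g. one table path; credentials are scoped, not account-wide
    expires_at: float  # epoch seconds; short-lived by design


class VendedCredentialCache:
    """Engine-side helper that re-vends a credential before it expires (hypothetical)."""

    def __init__(self, vend: Callable[[str], Credential], refresh_margin_s: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._vend = vend
        self._margin = refresh_margin_s
        self._clock = clock
        self._cached: dict[str, Credential] = {}

    def get(self, scope: str) -> Credential:
        cred = self._cached.get(scope)
        # Refresh when missing or within the margin of expiry.
        if cred is None or cred.expires_at - self._clock() < self._margin:
            cred = self._vend(scope)
            self._cached[scope] = cred
        return cred
```

The design choice is to refresh early rather than on failure: a task that fetches the credential just before a multi-minute read still holds a valid token for the whole read.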
Customer testimonial (PepsiCo, Sudipta Das): "empowered our teams to use their preferred tools while maintaining governance and data consistency. We can leverage the benefits of managed tables within a truly interoperable data and AI platform that works across multiple compute engines."
Activation contract (Beta): three rollout primitives, namely preview-portal account-level enrollment, a metastore-level external-access toggle, and a schema-level `EXTERNAL_USE_SCHEMA` grant. Plus a named migration path "for converting external tables to managed."
Version pinning: Delta-Spark 4.2 + Unity Catalog 0.4.1.
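The three activation gates compose as a conjunction: external access works only when all of them are in place. A toy model, assuming hypothetical field names (`account_enrolled`, `metastore_external_access`, `external_use_schema`) rather than any real Databricks configuration API, that reports which gates still block a principal:

```python
from dataclasses import dataclass, field


@dataclass
class ExternalAccessGates:
    """Toy model of the three Beta activation gates; field names are hypothetical."""
    account_enrolled: bool = False           # preview-portal, account level
    metastore_external_access: bool = False  # metastore-level toggle
    # "catalog.schema" -> principals holding EXTERNAL_USE_SCHEMA on that schema
    external_use_schema: dict[str, set[str]] = field(default_factory=dict)

    def missing(self, principal: str, table: str) -> list[str]:
        """Return the gates still blocking `principal` from `table` (empty = allowed)."""
        schema = table.rsplit(".", 1)[0]  # "catalog.schema.table" -> "catalog.schema"
        gaps = []
        if not self.account_enrolled:
            gaps.append("account preview enrollment")
        if not self.metastore_external_access:
            gaps.append("metastore external-access toggle")
        if principal not in self.external_use_schema.get(schema, set()):
            gaps.append(f"EXTERNAL_USE_SCHEMA on {schema}")
        return gaps
```

Listing every missing gate, instead of returning a bare boolean, mirrors how an operator would debug a rollout layer by layer.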
(Source: sources/2026-05-14-databricks-expanded-interoperability-with-unity-catalog-open-apis)
## AI-asset-governance face (2026-05-20 disclosure)
The 2026-05-20 "Governing AI agents at scale with Unity Catalog" post (sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog) extends Unity Catalog from data + AI assets to every asset an AI system touches ("LLMs, MCP servers, skills, and agents") and positions UC as the policy substrate for Unity AI Gateway's enforcement fabric. Explicit framing:
"Unity Catalog has governed enterprise data since 2021 through a single permissions model, unified lineage, and a consistent audit trail across every asset. We're now extending that same governance infrastructure to cover every asset an AI system touches: LLMs, MCP servers, skills, and agents." — Source.
### What's new in this face
| Surface | What UC stores | Mechanism |
|---|---|---|
| MCP servers as securables | Permissions, credentials, audit logs for external MCP servers (GitHub, Jira, Slack) registered in UC | Same securable model as tables / functions / volumes |
| Service Policies as UC functions | Pre-execution per-tool-call admission policies attached to MCP servers; ternary allow/deny/consent verdicts | UC functions, evaluated by UC's existing function-evaluation engine (patterns/policy-as-uc-function-attached-to-mcp) |
| AI gateway audit logs in UC | Every model call's principal + caller identity + agent identity + access operation + timestamp, joinable with Inference Tables | Same audit-log substrate as data-access logs |
| OBO-driven row-level access for agents | "Identity flows end to end, from the user who asks the question to the specific table row the agent retrieves" — agents inherit invoking user's UC permissions in real time | UC's existing ABAC + row-filter / column-mask + classification machinery applies to agent-issued queries unchanged |
| Data classification feeds AI access control automatically | "Masked columns remain masked regardless of which agent or framework requests them" — the classification → tag → ABAC pipeline doesn't change for agent callers | systems/unity-catalog-data-classification tags drive ABAC; OBO carries identity; agent inherits the same masking the user would |
The structural unification: UC is now one substrate for data-asset governance + AI-asset governance, with one permission model spanning both. The 2026-05-13 GA disclosure framed UC as where governance is "expressed, evaluated, and enforced" for data; the 2026-05-20 post extends that property to AI assets.
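The Service Policies row can be sketched as plain functions attached to a registered MCP server that return a ternary verdict before each tool call runs. `Verdict`, `ToolCall`, `github_write_policy`, and `admit` are hypothetical. Two further assumptions are ours, not the source's: when several policies apply, the most restrictive verdict wins, and a server with no attached policy defaults to ALLOW (leaving plain permissions to decide).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    CONSENT = "consent"  # pause and ask the invoking user
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    mcp_server: str
    tool: str
    user: str
    args: dict


Policy = Callable[[ToolCall], Verdict]


def github_write_policy(call: ToolCall) -> Verdict:
    # Hypothetical policy: destructive tools denied, writes need consent, reads allowed.
    if call.tool.startswith("delete_"):
        return Verdict.DENY
    if call.tool.startswith(("create_", "update_")):
        return Verdict.CONSENT
    return Verdict.ALLOW


def admit(call: ToolCall, policies: dict[str, list[Policy]]) -> Verdict:
    """Pre-execution admission: the most restrictive attached verdict wins (assumption)."""
    rank = {Verdict.ALLOW: 0, Verdict.CONSENT: 1, Verdict.DENY: 2}
    verdicts = [policy(call) for policy in policies.get(call.mcp_server, [])]
    return max(verdicts, key=rank.__getitem__, default=Verdict.ALLOW)
```

Because policies are keyed by MCP server rather than by agent, every agent framework that routes a call to that server gets the same admission decision.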
### Architectural relationship to Unity AI Gateway
agent traffic
│
▼
[Unity AI Gateway](<./unity-ai-gateway.md>) ← enforcement fabric
│
│ pre-execution policy lookup, audit-log writes
▼
[Unity Catalog](<./unity-catalog.md>) ← policy + audit substrate
│
├── data permissions (existing)
├── ABAC + row-filters + column-masks (2026-05-13 GA)
├── data classification tags (2026-05-13 GA)
├── MCP server registrations + Service Policies (2026-05-20)
└── Inference Tables + AI gateway audit logs (2026-05-20)
UC is what makes the governance-travels-with-resources property possible — the policy lives in UC, attached to the resource (table, MCP, model endpoint), so any agent calling through the gateway gets the same policy regardless of framework. Canonical wiki instance of the catalog-as-AI-policy-substrate thesis.
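A minimal sketch of the gateway → catalog path, assuming a toy model in which row filters are attached to tables in the catalog and the gateway passes the invoking user's identity on-behalf-of (OBO) instead of a shared agent credential. `Catalog`, `Gateway`, `agent_query`, and the audit-record fields are hypothetical. Two agents built on different frameworks, asking for the same user, get identical rows, and each call leaves an audit entry naming both user and agent.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]


@dataclass
class Catalog:
    """Policy + audit substrate: filters are attached to tables, not to agents."""
    tables: dict[str, list[Row]] = field(default_factory=dict)
    row_filters: dict[str, Callable[[Row, str], bool]] = field(default_factory=dict)
    audit_log: list[dict[str, object]] = field(default_factory=list)

    def read(self, table: str, user: str, agent: str) -> list[Row]:
        allowed = self.row_filters.get(table, lambda row, u: True)
        rows = [r for r in self.tables[table] if allowed(r, user)]
        self.audit_log.append({"user": user, "agent": agent, "table": table,
                               "rows": len(rows), "ts": time.time()})
        return rows


@dataclass
class Gateway:
    catalog: Catalog

    def agent_query(self, agent: str, invoking_user: str, table: str) -> list[Row]:
        # On-behalf-of: the catalog evaluates the invoking user's identity.
        return self.catalog.read(table, user=invoking_user, agent=agent)
```

Nothing in `Gateway` knows the policy; moving the filter into `Catalog` is what lets it apply regardless of which agent framework made the call.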
(Source: sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog)
## BI-serving / semantic-layer face (2026-05-27 disclosure)
The 2026-05-27 BI Serving Pointers source canonicalises a distinct face of UC: the catalog hosts the headless-BI semantic layer via Metric Views. Three load-bearing properties:
- Metric definitions live in UC. A Metric View is a UC object with measure logic, dimensional join graph, and semantic metadata (`display_name` / `comment` / `synonyms`). Defined once; every consumer resolves the same definition.
- AI grounding is metadata, not prompt scaffolding. "Fields like `display_name`, `comment`, and `synonyms` give AI systems the context they need to interpret business questions correctly. […] No custom prompts, no separate glossary." This is the schema-level prompt-engineering shape: UC metadata is the substrate Genie reasons over to map natural language to the right measure/dimension.
- Predictive Optimization on managed tables is the BI-serving physical-layer foundation. "Unity Catalog managed tables are the foundation for everything else in this stack […] Predictive Optimization (covered below) is enabled by default. Automatic liquid clustering selects clustering keys that adapt as query patterns change. Metadata caching is always on."
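A toy illustration of define-once metrics with semantic metadata, assuming a simplified model. `Measure`, `MetricView`, `resolve_term`, `to_sql`, and the `main.sales.orders` example are hypothetical. `resolve_term` is a naive exact-match stand-in for Genie's LLM reasoning over `display_name` / `comment` / `synonyms`, and `to_sql` is a naive stand-in for how a `MEASURE()` reference expands from its single definition; neither is the Metric View query rewriter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measure:
    name: str
    expr: str          # SQL aggregate, e.g. "SUM(amount)"
    display_name: str
    comment: str
    synonyms: tuple[str, ...] = ()


@dataclass
class MetricView:
    """Toy define-once metric view; every consumer resolves measures through it."""
    source: str
    dimensions: set[str]
    measures: dict[str, Measure] = field(default_factory=dict)

    def add(self, m: Measure) -> None:
        self.measures[m.name] = m

    def resolve_term(self, phrase: str) -> Measure | None:
        """Naive exact-match stand-in for mapping a business phrase to a measure."""
        wanted = phrase.strip().lower()
        for m in self.measures.values():
            if wanted in {n.lower() for n in (m.name, m.display_name, *m.synonyms)}:
                return m
        return None

    def to_sql(self, measure: str, by: str) -> str:
        """Expand a measure into SQL from its single definition (toy rewrite)."""
        if by not in self.dimensions:
            raise KeyError(f"unknown dimension: {by}")
        m = self.measures[measure]
        return f"SELECT {by}, {m.expr} AS {m.name} FROM {self.source} GROUP BY {by}"
```

A dashboard, a notebook, and a natural-language question that says "turnover" all end up at the same `SUM(amount)` expression, because there is only one place it is defined.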
The BI-serving face composes with the rest of UC's faces: same catalog hosts metric definitions + governance policies + MCP server registrations + inference tables — one governance plane across analytical, AI-agent, and operational workloads.
The open-standard provenance: SPARK-54119 (Apache Spark implementation of Metric Views) + UC OSS support coming.
(Source: sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tco)
## Seen in
- sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tco — Twelfth canonical UC face — BI-serving / semantic-layer substrate. UC hosts Metric Views (define-once metric definitions with `display_name` / `comment` / `synonyms` semantic metadata) — the "headless BI layer where you define your data model and KPIs once, independent of any specific BI tool". Four named consumers (AI/BI Dashboards, Genie, SQL notebooks, third-party BI tools) all resolve `MEASURE()` calls against the same UC-resident metric definition. Names UC managed tables as the BI-serving physical foundation — "Unity Catalog managed tables are the foundation for everything else in this stack" — with Predictive Optimization default-on, automatic liquid clustering (`CLUSTER BY AUTO`), and always-on metadata caching as load-bearing managed-table-only properties. Open-standard provenance: SPARK-54119 (Apache Spark Metric Views OSS implementation) + UC OSS support coming. Reserved for future ingests: metric-resolution wire protocol, query-rewriter coverage envelope, materialization freshness contract under high ingest, multi-tenant isolation under shared materializations.
- sources/2026-05-22-databricks-how-world-bank-group-uses-databricks-to-eradicate-poverty-through-shared-knowledge — UC as the unification primitive for structured + unstructured in a multi-domain knowledge platform. World Bank Group's Knowledge 360 / Data 360 build positions Unity Catalog as "a single unified interface where we could govern our data" (Suresh Kaudi). Substrate role: governs both the structured-data tables that feed each per-domain Genie instance and the unstructured documents indexed into the RAG corpus via UC Volumes + Vector Search. The "governance parity across structured and unstructured" property is what makes the agentic-router fan-out (patterns/intent-domain-decomposer-agentic-router) operationally tractable — the user's identity propagates uniformly across per-domain Genie sub-agents and the RAG agent. Caveat: name-drop altitude only — no per-domain UC schema disclosed, no row-level-security inheritance mechanism, no per-Genie UC permission boundary detail.
- sources/2026-05-20-databricks-governing-ai-agents-at-scale-with-unity-catalog — Eleventh canonical UC face — AI-asset-governance substrate. Extends UC from data-asset governance to also cover "LLMs, MCP servers, skills, and agents". First disclosure of Service Policies as UC functions attached to registered MCPs; first explicit end-to-end OBO disclosure ("identity flows… from the user who asks the question to the specific table row the agent retrieves"); first canonicalisation of UC as policy substrate for Unity AI Gateway's enforcement fabric. Repositioned in the four-pillar framing as the substrate for Pillars 1-3 (delegated access + data-centric governance + cost intelligence) and the durability layer for Pillar 4 (open + interoperable).
- sources/2026-05-15-databricks-backstage-with-lakebase-part-2 — Tenth canonical UC face — operational-DB governance substrate via Lakehouse Federation. UC reaches over the operational Lakebase Postgres database as a foreign catalog (`lakebase_bs`) via Lakehouse Federation. Standard UC GRANTs replace Postgres native grants; every Lakebase control-plane action lands in `system.access.audit`; UC system billing tables capture branch-level cost attribution by `(project_id, branch_id, endpoint_id)`; UC attribute-level masking policies propagate to every Lakebase branch at creation time. The "compliance side-channel" (CloudTrail + pgaudit + CloudWatch cross-reference) collapses into one SQL query against UC system tables. Canonical instance of concepts/operational-analytical-governance-unification + patterns/foreign-catalog-federation-for-operational-db-governance. "We didn't just change where the data lived; we changed where the access policy lived."
- sources/2026-05-14-databricks-expanded-interoperability-with-unity-catalog-open-apis — Open API + external-engine-write face (ninth canonical UC face on the wiki, alongside Dicer-internal-catalog / Mercedes-Benz data-mesh-hub / Unity-AI-Gateway-audit / multimodal-healthcare-governance / Zalando-Partner-Tech-external-sharing / 2026-05-13 ABAC + governed tags + classification GA / in-workspace-app substrate / Claroty CPS Library ER catalog governance). UC named as the commit-coordination + credential-vending hub that makes managed Delta tables writeable by external engines (Apache Spark, Apache Flink, DuckDB via Delta Kernel). Three load-bearing architectural primitives disclosed: catalog-managed commits (serialized commits prevent log corruption, complete auditability, multi-table transaction substrate), credential vending (M2M OAuth + auto-refresh + scoped/short-lived credentials, GA for tables + Public Preview for Volumes), and the Delta Kernel ecosystem composition (one library, multiple engines). Predictive Optimization is engine-boundary-transparent — "continues to run seamlessly, even on tables accessed by external engines." PepsiCo testimonial frames the shape payoff. Activation contract disclosed: preview-portal enrollment + metastore-level toggle + schema-level `EXTERNAL_USE_SCHEMA` grant + Delta-Spark 4.2 / UC 0.4.1 version pinning. Forward-roadmap composition with ABAC for external reads.
- sources/2026-05-13-databricks-the-rosetta-stone-of-cps-clarotys-ai-powered-library — Entity-Resolution catalog governance face (eighth UC face on the wiki). Tier-3 Databricks co-marketing post on Claroty's AI-Powered CPS Library — UC named as "the governed data foundation needed to unify these diverse datasets" across proprietary OT protocols, API calls, vendor PDFs, and unstructured documentation, while Spark-powered pipelines normalise at scale. The load-bearing role for UC in this composition is the audit-chain anchor: "every asset record is traceable back to its original raw artifact and the specific mapping version that classified it, ensuring full auditability in even the most sensitive industrial environments." UC governs the medallion tables driven by Delta CDF, the mapping-registry versions, and the Lakeflow Jobs pipelines + Model Serving endpoints + MLflow eval traces — the catalog is the one substrate that connects raw evidence → canonical CPS-ID → vulnerability attribution. Composes with systems/lakebase (transactional asset-mapping store with strict constraints) and systems/databricks-apps (SME human-in-the-loop UI) into the orchestrated multi-agent ER shape. Catalog scale at first wiki disclosure: 17 million+ assets.
- sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse — In-workspace app substrate face (seventh UC face on the wiki, alongside Dicer-internal-catalog / Mercedes-Benz data-mesh-hub / Unity-AI-Gateway-audit / multimodal-healthcare-governance / Zalando-Partner-Tech-external-sharing / 2026-05-13 ABAC + governed tags + classification GA). UC named as the data-plane half of the single-platform application architecture: the Databricks App authenticates as a workspace service principal and queries UC tables directly via the SQL Statement API, with the app "inherit[ing] Unity Catalog access controls without any additional configuration." Three load-bearing properties — service-principal-mediated access, free RBAC inheritance, and ML-audit substrate via governed SHAP-attribution Delta tables (canonical instance of concepts/governed-shap-attribution-table). Reference implementation: systems/site-feasibility-workbench (FastAPI + React, ~30 min deployment time). Composes with systems/lakebase (operational app state) + AI/BI Genie (embedded NL query via workspace REST API) + systems/mlflow (model + attribution versioning) into one workspace-resident composition.
- sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags — Governance / ABAC / classification face (sixth UC face on the wiki, alongside Dicer-internal-catalog / Mercedes-Benz data-mesh-hub / Unity-AI-Gateway-audit / multimodal-healthcare-governance / Zalando-Partner-Tech-external-sharing). The 2026-05-13 GA announcement disclosed the organize → detect → protect pipeline as three co-designed primitives — Governed Tags (account-level tag vocabulary with inheritance + permission separation), ABAC policies (tag-driven row filter + column mask, 10K+ policies per metastore at GA, session identity evaluation for views/functions, single VARIANT UDF for multi-type masking), Data Classification (agentic LLM + pattern-recognition + UC-metadata classifier, GDPR/HIPAA/GLBA/DPDPA/PCI + UK/DE/AU/BR regional packs + custom classifiers in Beta learning from already-tagged columns) — all hosted inside UC's permission + metadata substrate so "there is no handoff between systems, and no manual step between discovery and protection." Canonical wiki instance of patterns/tag-driven-attribute-based-access-control and separation-of-duties data governance — three role groups (governance team, stewards, data producers) operating on three permission axes (`MANAGE`/`CREATE` taxonomy, `APPLY` tag, `OWNER` data) without cross-team blocking. Customer testimonials emphasise the shape payoff: Atlassian (Gerald Nakhle) on operational-overhead reduction, Udemy (Rajit Saha) on "fewer policies, lower costs, surgical precision".
- — Governance plane for Zalando Partner Tech's external B2B partner-data-sharing platform (fifth UC face on the wiki; first B2B-external-exchange face, distinct from the Mercedes-Benz cross-cloud-mesh face). Unity Catalog's role in the deployment: (a) cataloguing the 200+ Delta tables (up to 200TB each) Partner Tech prepares for sharing; (b) hosting Shares (logical containers grouping related tables for a partner-dataset-group) — see patterns/recipient-per-partner-share-per-dataset-group; (c) hosting Recipients (digital identity per partner) plus the activation-URL primitive for credential delivery — see concepts/activation-link-credential-bootstrap; (d) providing the grant (Share → Recipient) as the access-control primitive; (e) producing the audit trail for every Delta Sharing read. Quote: "The managed solution provided robust governance through Unity Catalog integration, built-in security features, comprehensive audit logging". UC is the load-bearing reason Zalando chose Databricks' managed Delta Sharing service over self-hosting the open-source reference server — self-hosting would have required building catalog + recipient + audit infrastructure from scratch. Canonical instance of UC's Share/Recipient/Grant primitive set for external sharing, distinct from the internal data-mesh-federation framing. Introduces systems/zalando-partner-data-sharing-platform as the tenant on top of Unity Catalog.
- sources/2026-04-22-databricks-multimodal-data-integration-production-architectures-for-healthcare-ai — Governance substrate for the lakehouse-as-multimodal-data-integration pattern (fourth UC face on the wiki, alongside internal catalog / Mercedes-Benz data mesh / Unity AI Gateway audit). Explicit governance contract for patterns/governed-delta-tables-per-modality: "data classification with governed tags: PHI / PII / 28 CFR Part 202 / StudyID / …; fine-grained access controls (catalog / schema / table / volume permissions, plus row / column-level controls where needed for PHI); auditability (who accessed what, when); lineage (trace features and model inputs back to source datasets); controlled sharing (consistent policy boundaries across teams and tools)." One governance surface across genomics (via Glow), imaging embeddings (via Mosaic AI Vector Search), clinical-notes NLP entities, and wearables streams (via Lakeflow SDP) — the named remedy to the specialty-store-per-modality anti-pattern. Reproducibility co-requirement called out explicitly: time-travel for datasets, CI/CD for jobs, MLflow for experiment + model version tracking.
- sources/2026-01-06-redpanda-build-a-real-time-lakehouse-architecture-with-redpanda-and-databricks — Streaming-broker catalog consumer: Redpanda Iceberg Topics integrate directly with Unity Catalog via the Iceberg REST API. Redpanda owns full table lifecycle against Unity (create, register snapshots, manage schema updates, delete tables). Four verbatim UC responsibilities disclosed in the joint-vendor framing: "Fine-grained access control. Consistent security across workloads. Metadata management and lineage. Easy discovery for downstream users." Unity Catalog is framed as the "centralized, governed location for managing the table view" of streaming data; the broker-native integration is the mechanism behind the "stream is the table" slogan from the Databricks side of the partnership. No mechanism depth beyond the REST-API naming.
- sources/2026-01-13-databricks-open-sourcing-dicer-auto-sharder — UC's Dicer-backed in-memory-cache rollout; 90–95 % hit rate.
- sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh — UC as global catalog + federation layer in Mercedes-Benz's AWS↔Azure data mesh; Iceberg-on-Glue federation, Delta-Sharing exchange.
- sources/2026-05-11-databricks-unlocking-the-archives — UC Volumes face: governs the raw scanned PDFs / TIFFs / JPGs and per-page rendered images for the MapAid groundwater archive pipeline. "Each document's pages are rendered as images and stored in Unity Catalog Volumes, creating a clean, versioned foundational dataset." Versioned non-tabular data in the same governance surface as the Delta tables holding classification + extracted-record output. See systems/unity-catalog-volumes.
## Related
- systems/dicer
- systems/delta-sharing
- systems/delta-lake
- systems/apache-iceberg
- systems/zalando-partner-data-sharing-platform — UC tenant for external B2B partner data sharing.
- concepts/dynamic-sharding
- concepts/stateless-compute
- concepts/data-mesh
- concepts/hub-and-spoke-governance
- concepts/zero-copy-data-sharing-protocol — load-bearing property of UC + Delta Sharing for external sharing.
- concepts/activation-link-credential-bootstrap — Recipient onboarding primitive.
- patterns/recipient-per-partner-share-per-dataset-group — UC's Share/Recipient/Grant primitive exposed as a deployment pattern.
- systems/unity-catalog-abac — UC's ABAC policy primitive (GA 2026-05-13).
- systems/unity-catalog-governed-tags — UC's account-level governed tag taxonomy (GA 2026-05-13).
- systems/unity-catalog-data-classification — UC's agentic data-classification engine (GA 2026-05-13).
- concepts/attribute-based-access-control — table-storage-governance instance of ABAC.
- concepts/governed-tag — the attribute substrate.
- concepts/agentic-data-classification — automated LLM-based PII/PHI detection.
- concepts/separation-of-duties-data-governance — three-role governance shape UC's permission split enables.
- concepts/session-identity-evaluation — view-as-bypass fix in ABAC.
- patterns/tag-driven-attribute-based-access-control — end-to-end organize → detect → protect pattern.
- patterns/single-variant-udf-for-multi-type-masking — type-erasure via VARIANT UDF for column masking.
- systems/uc-managed-tables — managed Delta tables under UC (Beta external-write 2026-05-14).
- systems/uc-credential-vending — credential-vending API (GA for tables, Public Preview for Volumes, 2026-05-14).
- systems/delta-kernel — open-source Java + Rust library external engines integrate via.
- concepts/catalog-managed-commits — central commit coordinator substrate.
- concepts/credential-vending — short-lived scoped credentials on demand.
- concepts/external-engine-write-to-managed-table — architectural shape UC instantiates.
- concepts/m2m-oauth-vs-pat — auth-substrate choice for external engines.
- patterns/credential-vending-for-external-engine-access — auth-side deployment pattern.
- patterns/catalog-managed-commits-for-external-write-safety — commit-side deployment pattern.
- patterns/connector-library-as-protocol-abstraction — Delta Kernel pattern.