Databricks — Expanded interoperability with Unity Catalog Open APIs
Databricks Blog post (2026-05-14) announcing two coordinated milestones for Unity Catalog's Open APIs:

- External Access to Managed Tables in Beta. External engines such as Apache Spark, Apache Flink, and DuckDB can now create and write to UC managed Delta tables through the open API surface.
- Credential Vending reaching General Availability for tables, with Volume Credential Vending in Public Preview for unstructured data.

Tier-3 vendor-blog source. It is architecturally substantive despite the launch-post framing. The post discloses:

- (a) a catalog-mediated authentication primitive: M2M OAuth in place of PATs, plus on-demand, short-lived, scoped storage credentials;
- (b) catalog-managed commits as the coordination substrate that makes external writes safe. Serialized commits prevent log corruption, and the same primitive lays the groundwork for multi-statement, multi-table transactions;
- (c) Delta Kernel as the protocol-abstraction library that lets any engine integrate with UC without re-implementing the Delta protocol.
One-paragraph summary
Two Open API milestones land together and compose into a single story: external engines as first-class writers. External Access to Managed Tables (Beta) lets Apache Spark, Apache Flink, and DuckDB create, batch read/write, and stream to/from UC managed Delta tables. The tables keep their managed properties: Predictive Optimization for auto-tuned data layout, compaction, and statistics, and Liquid Clustering. The post credits these with "up to 20× faster queries and 50% lower storage costs." The substrate that makes this safe is catalog commits. Every operation routes through UC, producing serialized commits that prevent log corruption and provide complete auditability, and Predictive Optimization keeps running on tables that external engines write to. Catalog commits also lay the groundwork for multi-statement, multi-table transactions, which require a centralized commit coordinator. Credential Vending (GA for tables, Public Preview for Volumes) is the auth half. UC issues short-lived, scoped credentials to external engines on demand, and M2M OAuth replaces personal access tokens (PATs, "per-user, long-lived, and hard to rotate"). Engines refresh credentials automatically through the credential vending APIs, so "pipelines that run for hours complete reliably without tokens expiring mid-job." Volume Credential Vending extends the same model from tables to unstructured assets (images, PDFs, and videos in UC Volumes). The post closes by flagging Attribute-Based Access Control (ABAC) for external reads. This brings the row- and column-level policy primitives canonicalised in the 2026-05-13 GA post to external engines, so governance becomes fine-grained on the external read path too. Delta Kernel is the load-bearing ecosystem mechanism. The open-source Java/Rust library for reading, writing, and committing to Delta tables "abstracts the low-level protocol details so connector developers can focus on UC integration, not Delta implementation," and Spark, Flink, and DuckDB all build on it.
The post's single customer testimonial comes from PepsiCo (Sudipta Das, Director of Enterprise Data Operations). It names the payoff: "empowered our teams to use their preferred tools while maintaining governance and data consistency. We can leverage the benefits of managed tables within a truly interoperable data and AI platform that works across multiple compute engines."
Key takeaways
- External engines can now create and write to UC managed Delta tables. The previous boundary, under which managed tables could be written only from Databricks-internal compute, is gone. "Now in Beta, external engines, such as Apache Spark, Flink, and DuckDB, can create and write to UC managed Delta tables with centralized governance and automatic optimizations." The post names three capability classes: (a) create managed tables, standing up a new UC managed table directly from an external engine; (b) batch read and write, with full transactional safety; (c) stream to and from managed tables, which act as both streaming source and sink for "end-to-end real-time pipelines on Apache Spark." The architectural consequence is that external engines no longer need a separate external table or copy. They write into the same managed-table substrate that Databricks's first-party engines read. Canonical instance of concepts/external-engine-write-to-managed-table.
- Catalog commits are the safety substrate. "Because every operation flows through UC managed tables built on catalog commits, you get serialized commits that prevent log corruption and complete auditability of every read and write." The architectural primitive is that Unity Catalog mediates every commit (insert, update, delete, schema change, snapshot promotion); the writer client does not write it directly. This is what makes concurrent writes from heterogeneous engines (Spark, Flink, and DuckDB on the same table) safe. The catalog serializes commits in a way the underlying object store and Delta log alone cannot. Two consequences are named. (a) Predictive Optimization continues to run on tables that external engines write to, so auto-compaction and auto-statistics stay live across the engine boundary. (b) Catalog commits "lay the groundwork" for multi-statement, multi-table transactions, and the post links the transaction modes Databricks already documents (docs.databricks.com/aws/en/transactions/transaction-modes). Canonical instance of concepts/catalog-managed-commits.
- Credential vending replaces broad static credentials with on-demand, short-lived, scoped credentials. "For an external engine to access data in UC, it needs a secure way to authenticate and get scoped access to cloud storage without requiring broad, static permissions or credentials tied to a specific account. Unity Catalog handles this through credential vending… UC issues short-lived, scoped credentials to external engines on demand, with access policies enforced centrally." It is GA for tables and in Public Preview for Volumes. The architectural shape is the short-lived-credential pattern applied at the catalog-to-cloud-storage boundary. The catalog mints short-lived credentials per request, scoped to the requested resource and governed by UC privileges. Canonical instance of concepts/credential-vending and patterns/credential-vending-for-external-engine-access.
- M2M OAuth replaces PATs as the engine-to-UC auth primitive. "External engines can now authenticate to UC using machine-to-machine (M2M) OAuth, meeting enterprise security requirements without relying on personalized access tokens (PATs), which are per-user, long-lived, and hard to rotate." The post names three structural failure modes of PATs, glossed here:
  - per-user: a service is bound to a human identity and breaks when that person leaves;
  - long-lived: the compromise window is effectively unbounded;
  - hard to rotate: rotation requires reconfiguring each engine.

  M2M OAuth addresses all three. It uses service-to-service identity (no human binding), short token lifetimes (a bounded compromise window), and automatic re-minting (rotation becomes invisible). Canonical instance of concepts/m2m-oauth-vs-pat.
- Engine-side auto-refresh closes the long-running-job gap. "Credentials are refreshed automatically by engines via the UC credential vending APIs, so pipelines that run for hours complete reliably without tokens expiring mid-job." This property is what makes short credential lifetimes workable for hour-scale Spark and Flink pipelines. The engine detects approaching expiry and re-invokes the vending API, transparently to the pipeline. It is the engine-side counterpart to catalog-side minting.
- Volume credential vending extends the model from tables to unstructured data. "Volume credential vending is now in Public Preview, so external clients can request temporary, scoped credentials to access images, PDFs, and videos stored in volumes with Unity Catalog governance. The same access control model, audit trail, and scoped credentials apply whether you're querying a table or processing a raw video file externally." The architectural unification is one credential-vending primitive covering two asset classes: tabular data in managed Delta tables and unstructured artifacts in UC Volumes. The MapAid groundwater archive pipeline introduced UC Volumes to the wiki as scanned-document storage. They now become externally accessible under the same short-lived, scoped credential model that tables use.
- Delta Kernel is the protocol-abstraction library that makes the ecosystem composable. "Delta Kernel — the open source Java and Rust library for reading, writing, and committing to Delta tables — abstracts the low-level protocol details so connector developers can focus on UC integration, not Delta implementation." Three adopters are named: Apache Spark, Delta Flink, and DuckDB. The architectural leverage is that connector developers do not need to re-implement the Delta protocol (snapshot resolution, schema evolution, commit-log parsing, transactional writes). The library handles it and exposes only the engine-integration surface. Canonical instance of patterns/connector-library-as-protocol-abstraction. The alternative shape is one in which every engine implements its own Delta reader/writer and drifts into subtle protocol incompatibilities.
- External access is coarse-grained today; ABAC for external reads is on the roadmap. "Credential vending today governs coarse-grained access controls for external engines. We've also developed functionality to enforce attribute-based access controls (ABAC) for external reads, which makes governance fine-grained. This makes it possible to enforce row and column level ABAC policies when UC managed tables are read from external engines." The 2026-05-13 GA post canonicalised ABAC policies for first-party read paths (workspace SQL, and Databricks Apps via the SQL Statement API). This post flags that the same row-filter and column-mask policy primitives are being extended to external reads. ABAC then stays enforced when the reader is Spark, Flink, or DuckDB rather than Databricks SQL. This resolves an architectural tension: external engines should not be a way to bypass the governance that first-party engines enforce. Reserved for a future ingest once this ships.
- The PepsiCo testimonial frames the customer payoff in vendor-neutral terms: "With Unity Catalog's Open APIs, we've empowered our teams to use their preferred tools while maintaining governance and data consistency. We can leverage the benefits of managed tables within a truly interoperable data and AI platform that works across multiple compute engines." (Sudipta Das, Director of Enterprise Data Operations at PepsiCo.) The shape combines managed-table benefits (Predictive Optimization, Liquid Clustering, governance) with compute-engine choice. The ecosystem opener lists Spark, Flink, DuckDB, Trino, and Confluent Tableflow. Customers get this without the data-duplication tax that would otherwise be the price of choosing engines.
- Activation shape: a Beta enrolled through the preview portal, enabled on the metastore, and granted per schema. "Enroll in 'External Access to Unity Catalog Managed Delta Table' in the Databricks preview portal… Enable external data access on your metastore and grant EXTERNAL_USE_SCHEMA on the schema containing the tables you want to access. Create a new UC managed table. To move existing data, see the migration guide for converting external tables to managed. Use Delta-Spark 4.2 with Unity Catalog 0.4.1 to create, read, and write to managed tables from external compute." The rollout has three load-bearing primitives: (a) preview-portal enrollment (account-level opt-in), (b) a metastore-level external-access toggle, and (c) a schema-level EXTERNAL_USE_SCHEMA grant. A named migration path converts external tables to managed ones. The version pinning (Delta-Spark 4.2 with Unity Catalog 0.4.1) is the version-coordination contract for the Beta.
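The client-side half of the activation steps can be sketched as Spark configuration plus the one-time schema grant. This is a minimal sketch under several assumptions:

- The configuration keys are those of the open-source Unity Catalog Spark connector: spark.sql.catalog.<name> set to io.unitycatalog.spark.UCSingleCatalog, plus .uri and .token.
- The helper names, catalog and schema names, principal, and URI form are hypothetical.
- The grant uses the space-separated spelling that, as we understand it, Databricks SQL uses for the EXTERNAL_USE_SCHEMA privilege.

Check all of these against current Databricks and Unity Catalog documentation before relying on them.

```python
"""Hypothetical client-side setup for the Beta: Spark conf plus the schema grant.

Config keys follow the open-source Unity Catalog Spark connector; the helper
names and example values are illustrative assumptions, not an official API.
"""


def uc_spark_conf(catalog: str, uc_uri: str, token: str) -> dict[str, str]:
    """Spark settings that register `catalog` as a Unity Catalog-backed catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Delta Lake's SQL extensions (Delta-Spark 4.2 per the Beta version contract).
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        # Unity Catalog connector plugin (Unity Catalog 0.4.1 per the Beta).
        prefix: "io.unitycatalog.spark.UCSingleCatalog",
        f"{prefix}.uri": uc_uri,
        # Bearer token for UC; with M2M OAuth this would be a service principal's
        # short-lived access token rather than a PAT.
        f"{prefix}.token": token,
        "spark.sql.defaultCatalog": catalog,
    }


def grant_external_use(schema: str, principal: str) -> str:
    """Schema-level grant an admin runs once (assumed SQL spelling of EXTERNAL_USE_SCHEMA)."""
    return f"GRANT EXTERNAL USE SCHEMA ON SCHEMA {schema} TO `{principal}`"


if __name__ == "__main__":
    # Placeholder host, token, and principal; the URI path is an assumption.
    conf = uc_spark_conf("main", "https://<workspace-host>/api/2.1/unity-catalog", "<token>")
    for key, value in conf.items():
        print(f"{key}={value}")
    print(grant_external_use("main.sales", "<service-principal-application-id>"))
```

With PySpark, each entry can be applied through SparkSession.builder.config(key, value) before getOrCreate(). The Delta and Unity Catalog connector JARs still have to be on the classpath.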
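The M2M OAuth takeaway corresponds to the standard OAuth 2.0 client-credentials grant (RFC 6749, section 4.4). A service principal exchanges its client ID and secret for a short-lived access token, with no human identity involved.

The sketch below only builds the request and reads the standard token response; it sends nothing. It assumes the Databricks workspace token endpoint path /oidc/v1/token and the all-apis scope, both of which should be checked against Databricks' OAuth M2M documentation. The function names are hypothetical.

```python
"""OAuth 2.0 client-credentials request for a service principal (sketch)."""
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(host: str, client_id: str, client_secret: str,
                            scope: str = "all-apis") -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request.

    The /oidc/v1/token path and the default scope are assumptions about the
    Databricks endpoint; the grant itself is plain RFC 6749 section 4.4.
    """
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": scope}).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",  # service identity, not a user PAT
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def token_expiry(response_json: dict, issued_at: float) -> tuple[str, float]:
    """Extract the bearer token and its absolute expiry from a standard token response."""
    return response_json["access_token"], issued_at + float(response_json["expires_in"])
```

Because the response carries expires_in, the engine knows when to re-mint. That expiry is what makes rotation automatic, where a PAT would require manual reconfiguration.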
Architectural primitives disclosed
| Primitive | Capability summary | First wiki canonicalisation |
|---|---|---|
| External Access to Managed Tables (Beta) | External engines (Spark / Flink / DuckDB) can create, read, write, and stream to/from UC managed Delta tables; Predictive Optimization + Liquid Clustering continue to apply; transactional safety preserved. | systems/uc-managed-tables, concepts/external-engine-write-to-managed-table |
| Catalog commits | Every operation flows through UC; serialized commits prevent log corruption; complete read/write auditability; substrate for multi-statement, multi-table transactions. | concepts/catalog-managed-commits, patterns/catalog-managed-commits-for-external-write-safety |
| Credential Vending (GA for tables, Public Preview for Volumes) | UC issues short-lived, scoped credentials on demand to external engines; access policies enforced centrally; engines auto-refresh via the vending API. | systems/uc-credential-vending, concepts/credential-vending, patterns/credential-vending-for-external-engine-access |
| M2M OAuth for external engines | Replaces PATs (per-user, long-lived, hard-to-rotate) with service-to-service OAuth; meets enterprise security requirements. | concepts/m2m-oauth-vs-pat |
| Delta Kernel | Open-source Java + Rust library for reading, writing, committing to Delta tables; abstracts protocol details so connector developers focus on UC integration; adopted by Spark, Delta Flink, DuckDB. | systems/delta-kernel, patterns/connector-library-as-protocol-abstraction |
| Volume credential vending (Public Preview) | Same scoped + short-lived credential model extended from tabular Delta tables to unstructured assets (images, PDFs, videos) in UC Volumes; same audit trail. | systems/uc-credential-vending, systems/unity-catalog-volumes |
| ABAC for external reads (forward roadmap) | Row + column-level ABAC policies enforced when external engines read UC managed tables — extends the 2026-05-13 GA primitives to the external-read path. | systems/unity-catalog-abac (forward face) |
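The catalog-side half of credential vending, including its extension to Volumes, can be modeled in a few lines. This toy sketch is not Unity Catalog's implementation. It checks privileges centrally and only then mints a random, short-lived credential that is valid for a single securable's storage prefix.

The READ / READ_WRITE operation names loosely mirror the operation parameter of UC's temporary-credential APIs as we understand them. The class names, the 15-minute TTL, and the storage paths are hypothetical.

```python
"""Toy catalog-side credential vendor covering a table and a volume (illustrative only)."""
import secrets
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VendedCredential:
    token: str
    scope_prefix: str  # storage location the credential is valid for
    operation: str     # "READ" or "READ_WRITE"
    expires_at: float  # epoch seconds


class ToyCredentialVendor:
    """Hypothetical vendor: central grant check, then a short-lived scoped credential."""

    def __init__(self, ttl_seconds: float = 900.0, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._locations: dict[str, str] = {}                # securable -> storage prefix
        self._grants: dict[tuple[str, str], set[str]] = {}  # (principal, securable) -> ops

    def register(self, securable: str, location: str) -> None:
        self._locations[securable] = location.rstrip("/")

    def grant(self, principal: str, securable: str, operation: str) -> None:
        self._grants.setdefault((principal, securable), set()).add(operation)

    def vend(self, principal: str, securable: str, operation: str) -> VendedCredential:
        # The policy check happens centrally, before anything is minted.
        if operation not in self._grants.get((principal, securable), set()):
            raise PermissionError(f"{principal} lacks {operation} on {securable}")
        return VendedCredential(
            token=secrets.token_urlsafe(16),
            scope_prefix=self._locations[securable],
            operation=operation,
            expires_at=self._clock() + self._ttl,
        )


def storage_allows(cred: VendedCredential, path: str, operation: str, now: float) -> bool:
    """What the storage layer would check: unexpired, in scope, operation covered."""
    if now >= cred.expires_at:
        return False
    if path != cred.scope_prefix and not path.startswith(cred.scope_prefix + "/"):
        return False
    return operation == "READ" or cred.operation == "READ_WRITE"
```

The same vend call serves a table location and a volume location. That is the one-primitive, two-asset-classes unification the post describes for Volume credential vending.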
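Here is a toy version of one task a library like Delta Kernel takes off connector developers: replaying a Delta log's add/remove actions in version order to find a snapshot's live data files. It makes concrete what the "low-level protocol details" in the Delta Kernel row mean.

This is a sketch, not Delta Kernel's API. It deliberately ignores checkpoints, deletion vectors, protocol and metadata validation, and catalog-managed commit resolution, all of which a real implementation must handle.

```python
"""Toy Delta log replay (illustrative; not Delta Kernel's API)."""
import json
from typing import Optional


def live_files(commits: dict[int, str], as_of: Optional[int] = None) -> set[str]:
    """Return the data files live at version `as_of` (latest if None).

    `commits` maps a commit version to the newline-delimited JSON actions of
    that commit, the format of the _delta_log/<version>.json files.
    """
    live: set[str] = set()
    for version in sorted(commits):
        if as_of is not None and version > as_of:
            break
        for line in commits[version].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # metaData, protocol, commitInfo, and txn actions are ignored here.
    return live
```

In the connector-library pattern, this logic, and the much larger real version of it, lives once in the shared library. Each engine supplies roughly its file I/O and data-processing hooks instead of its own log reader.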
Operational numbers
- Up to 20× faster queries and 50% lower storage costs: the Predictive Optimization + Liquid Clustering envelope on UC managed tables (per the article opener).
- Pipelines that run for hours: the explicit duration class the auto-refresh design targets. Credential lifetimes are short (the post gives no TTL figure) while pipelines run for hours; engine-side automatic re-minting bridges the gap.
- Three named external-engine adopters: Apache Spark, Apache Flink, and DuckDB (via Delta Kernel). The broader UC ecosystem opener also names Trino and Confluent Tableflow among "dozens of integrations" governing Delta + Iceberg tables. Neither is called out as a managed-table writer in the Beta.
- Version contract: Delta-Spark 4.2 + Unity Catalog 0.4.1 (Beta version pinning).
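The gap between short credential lifetimes and hour-long pipelines is easy to see in a simulation. The provider below re-invokes vending whenever the cached credential is within a margin of expiry. This is a minimal engine-side sketch under stated assumptions: the vend callable is a hypothetical stand-in for the UC credential vending API, and the 15-minute TTL and 60-second refresh margin are illustrative values that do not come from the post.

```python
"""Engine-side credential auto-refresh (hypothetical names and timings)."""
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credential:
    token: str
    expires_at: float  # seconds on the same clock as `clock`


class RefreshingCredentialProvider:
    """Caches a vended credential and re-vends shortly before it expires."""

    def __init__(self, vend: Callable[[], Credential], clock: Callable[[], float],
                 refresh_margin: float = 60.0):
        self._vend = vend
        self._clock = clock
        self._margin = refresh_margin
        self._current: Optional[Credential] = None
        self.vend_calls = 0

    def get(self) -> Credential:
        """Return a usable credential, re-vending if expiry is within the margin."""
        now = self._clock()
        if self._current is None or now >= self._current.expires_at - self._margin:
            self._current = self._vend()
            self.vend_calls += 1
        return self._current


def simulate(pipeline_seconds: int, ttl: float, access_every: int) -> int:
    """Run a fake pipeline that touches storage periodically; return vending calls."""
    now = [0.0]
    provider = RefreshingCredentialProvider(
        vend=lambda: Credential(token=f"tok@{now[0]:.0f}", expires_at=now[0] + ttl),
        clock=lambda: now[0],
    )
    for t in range(0, pipeline_seconds, access_every):
        now[0] = float(t)
        assert provider.get().expires_at > now[0]  # never hand out an expired token
    return provider.vend_calls


if __name__ == "__main__":
    print(simulate(pipeline_seconds=3 * 3600, ttl=900.0, access_every=30))  # 13
```

With these assumed numbers, a three-hour run refreshes every 840 seconds and needs 13 vending calls. The cost of short TTLs is a handful of extra calls, not failed jobs. The post does not describe where the refresh hook lives inside real engine connectors.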
Caveats / open questions
- The post does not disclose the catalog-commit wire protocol. It states that "every operation flows through UC managed tables built on catalog commits" and points to a separate blog post for depth ("Convergence of Open Table Formats and Open Catalogs: Catalog Commits Generally Available"). It does not describe the on-the-wire commit-coordination protocol: preconditions, conflict detection, retry semantics, and isolation level. Reserved for a future catalog-commits-internals ingest.
- Multi-statement, multi-table transactions are flagged as "groundwork" rather than disclosed in detail. "Catalog commits also lay the groundwork for features like multi-statement, multi-table transactions that require a centralized commit coordinator." The post links to existing transaction-mode documentation. It does not detail the commit-coordinator protocol or the isolation guarantees of multi-table transactions across heterogeneous engines.
- External access is coarse-grained only, for now. "Credential vending today governs coarse-grained access controls for external engines." At the time of this post, the fine-grained ABAC governance disclosed in the 2026-05-13 GA post applies only to first-party read paths. ABAC for external reads has been "developed" but is not yet GA. Future ingest once that ships.
- No quantitative figures on credential-vending throughput, latency, or scaling envelope. The post asserts the architectural shape (short-lived, scoped, auto-refreshed credentials). It does not disclose how many credentials per second UC can mint, the p99 vending latency, or the per-metastore vending capacity ceiling.
- No comparison to alternative architectures. For example, the post does not contrast catalog-managed commits with the snapshot-pointer model used by Iceberg. There, a commit writes a new immutable metadata file and then atomically swaps the table's current-metadata pointer; the catalog or file system supplies only that per-table compare-and-swap. Architecturally, an externalised coordinator is what the post ties to multi-statement, multi-table transactions. The per-table pointer-swap approach has different blast-radius and single-point-of-failure trade-offs. Reserved for the future ingest comparing commit-coordination shapes.
- The PepsiCo testimonial does not disclose deployment numbers: no table count, data volume, engine mix, or query volume. A vendor-blog testimonial is acceptable as a canonical instance under the AGENTS.md borderline-include guidance, but deeper ER-style customer disclosure would strengthen the canonicalisation.
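The post leaves the commit protocol undisclosed. The following is therefore only a generic model of why a centralized coordinator matters, not Unity Catalog's protocol. It assumes optimistic concurrency:

- each writer states the table version it read;
- the coordinator serializes commits under one lock;
- a stale writer gets a conflict and must rebase and retry.

Because one coordinator sees every table, it can also commit several tables all-or-nothing, which is the property multi-table transactions need. All names are hypothetical.

```python
"""Toy centralized commit coordinator (hypothetical; not UC's wire protocol)."""
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when a writer's read version is no longer the latest."""


@dataclass
class ToyCommitCoordinator:
    versions: dict[str, int] = field(default_factory=dict)         # table -> latest version
    log: list[tuple[str, int, str]] = field(default_factory=list)  # (table, version, writer)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def commit(self, read_versions: dict[str, int], writer: str) -> dict[str, int]:
        """Commit to every table in `read_versions` atomically, or to none.

        `read_versions` maps each table to the version the writer based its
        change on (-1 for a table that does not exist yet).
        """
        with self._lock:  # one commit at a time: the serialization point
            for table, read_version in read_versions.items():
                latest = self.versions.get(table, -1)
                if latest != read_version:
                    raise CommitConflict(f"{table}: based on v{read_version}, latest is v{latest}")
            committed: dict[str, int] = {}
            for table, read_version in read_versions.items():
                committed[table] = read_version + 1
                self.versions[table] = committed[table]
                self.log.append((table, committed[table], writer))
            return committed


if __name__ == "__main__":
    coord = ToyCommitCoordinator()
    coord.commit({"orders": -1}, "spark")      # creates orders v0
    coord.commit({"orders": 0}, "flink")       # orders v1
    try:
        coord.commit({"orders": 0}, "duckdb")  # stale: DuckDB also read v0
    except CommitConflict as err:
        print("rejected:", err)
    # After rebasing, a two-table commit lands atomically.
    print(coord.commit({"orders": 1, "inventory": -1}, "duckdb"))
```

A failed multi-table commit leaves every table untouched, so no engine ever observes a half-applied transaction. A per-table pointer swap cannot give that guarantee on its own.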
Cross-source continuity
This post sits in the Unity Catalog disclosure cluster with two tightly related sibling 2026-05 sources:
- 2026-05-13 ABAC + Governed Tags GA (sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags). This post canonicalised UC's policy-evaluation-engine face: organize → detect → protect over governed-tag attributes, with row filter, column mask, and agentic classification. The 2026-05-14 post canonicalises UC's external-access-and-credential-vending face, and flags ABAC for external reads as the bridge between the two. The faces compose. Governance policies (2026-05-13) operate on governed-tag attributes and become enforceable across both first-party and external read paths (2026-05-14 forward roadmap).
- 2026-05-13 Clinical operations intelligence on the Lakehouse (sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse). This post canonicalised UC's in-workspace-app substrate face: the data-plane half of the single-platform application architecture, reading from UC via the SQL Statement API. The 2026-05-14 post is the complement for external-engine consumers. The same UC tables that the in-workspace app reads first-party are now readable and writeable by external engines via the open API surface, with the same governance inheritance.
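ABAC for external reads must guarantee "row and column level ABAC policies when UC managed tables are read from external engines". That guarantee can be shown with a toy evaluator: a row filter keyed on a principal attribute and a column mask keyed on a governed tag, both applied before rows reach the reader.

This is a hypothetical illustration of the semantics only. The post does not say where enforcement happens on the external read path. The policy shapes, tag names, and group names here are assumptions, not UC's policy syntax.

```python
"""Toy ABAC evaluation: row filters plus tag-driven column masks (illustrative)."""
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Principal:
    name: str
    groups: frozenset[str]


@dataclass(frozen=True)
class RowFilter:
    keep: Callable[[Principal, Row], bool]  # True if the principal may see the row


@dataclass(frozen=True)
class ColumnMask:
    tag: str            # governed tag the policy targets, e.g. "pii"
    exempt_group: str   # members see the raw value
    mask: Callable[[object], object]


def read_with_policies(principal: Principal, rows: list[Row],
                       column_tags: dict[str, set[str]],
                       filters: list[RowFilter], masks: list[ColumnMask]) -> list[Row]:
    """Return only permitted rows, with tagged columns masked for non-exempt readers."""
    result: list[Row] = []
    for row in rows:
        if not all(f.keep(principal, row) for f in filters):
            continue  # row filter: drop rows the principal may not see
        visible = dict(row)
        for column, tags in column_tags.items():
            for m in masks:
                if column in visible and m.tag in tags and m.exempt_group not in principal.groups:
                    visible[column] = m.mask(visible[column])
        result.append(visible)
    return result
```

The policies attach to tags and principal attributes rather than to a specific engine. That is why the same evaluation can, in principle, serve a Databricks SQL reader and a Spark, Flink, or DuckDB reader alike.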
Cousin to the 2026-04-20 Mercedes-Benz cross-cloud data mesh post, which canonicalised UC + Delta Sharing for B2B and cross-cloud data sharing. The 2026-05-14 post deepens the same open-API thesis at the engine-integration altitude, with Spark, Flink, and DuckDB writing into managed tables. The earlier post worked at the cross-cloud-sharing altitude, sharing data across UC metastores.
Cousin to the 2025-09-30 Expedia post "prefer MERGE INTO over INSERT OVERWRITE", which named MERGE INTO as the right primitive for transactional change against open table formats. The 2026-05-14 post discloses the substrate that would make multi-table MERGE-class operations possible across heterogeneous engines: catalog-managed commits as a centralised coordination point.
Sibling to the 2026-04-29 sketch-functions-for-analytics post (sources/2026-04-29-databricks-approximate-answers-exact-decisions-new-sketch-functions-for-analytics). It comes from the same vendor and publication wave but covers a different surface: SQL analytics primitives rather than governance and external-access primitives.
Skip-cluster boundary marker: this post sits at the architecturally substantive end of the 2026-05 Databricks publication wave. The other posts in that wave are the 2026-05-08 Genie Frontier post, the 2026-05-08 Superhuman serving post, the 2026-05-13 ABAC GA post, the 2026-05-13 Clinical Ops post, and the 2026-05-13 Claroty CPS Library post. It meets the borderline test (architecture density >20%). The catalog-commits, credential-vending, and Delta Kernel disclosures each describe distinct architectural primitives with named substrate properties: serialised commits, log-corruption prevention, auditability, M2M OAuth, scoped short-lived credentials, auto-refresh, and protocol abstraction. A hypothetical pure product-launch post, by contrast, would announce "now in Beta" without naming the substrate mechanisms.
Source
- Original: https://www.databricks.com/blog/expanded-interoperability-unity-catalog-open-apis
- Raw markdown: raw/databricks/2026-05-14-expanded-interoperability-with-unity-catalog-open-apis-eb701254.md
- Companion (catalog commits depth): Convergence of Open Table Formats and Open Catalogs: Catalog Commits Generally Available
- Companion (UC managed tables): Introducing Unity Catalog Managed Tables
- Companion (credential vending docs): Credential vending — Databricks AWS Docs
- Companion (transaction modes): Databricks transaction modes
- Delta Kernel repository: delta-io/delta — kernel
- Delta Flink: delta-io/delta — flink
Related
- systems/unity-catalog — the policy-evaluation, audit, and catalog-commits substrate; this post adds the Open API + external-engine-write face as the eighth canonical UC face on the wiki.
- systems/uc-managed-tables — the open-API-accessible managed Delta tables; first wiki canonicalisation.
- systems/uc-credential-vending — the credential-vending API; first wiki canonicalisation.
- systems/delta-kernel — the protocol-abstraction library; first wiki canonicalisation.
- systems/delta-lake — adds an external-engine-managed-write face (Delta tables under UC governance writeable by external engines via Delta Kernel + catalog commits).
- systems/duckdb — first wiki canonicalisation.
- systems/apache-spark — adds an external-engine-to-UC face.
- systems/apache-flink — adds an external-engine-to-UC face.
- systems/unity-catalog-volumes — Volume credential vending brings unstructured assets under the same scoped-credential model.
- systems/unity-catalog-abac — ABAC for external reads is flagged as the forward roadmap composing the 2026-05-13 GA primitives with the 2026-05-14 external-access primitives.
- concepts/credential-vending — first wiki canonicalisation of the architectural concept.
- concepts/catalog-managed-commits — first wiki canonicalisation.
- concepts/external-engine-write-to-managed-table — first wiki canonicalisation of the architectural shape.
- concepts/m2m-oauth-vs-pat — first wiki canonicalisation.
- concepts/short-lived-credential-auth — credential vending is the catalog-mediated instance of this concept.
- concepts/oauth-jwt-short-lived-credential — UC's M2M OAuth tokens fit the broader OAuth-JWT short-lived family canonicalised on the 2026-04-30 Backstage-with-Lakebase ingest.
- concepts/attribute-based-access-control — extended to the external-read path (forward roadmap).
- patterns/credential-vending-for-external-engine-access — first wiki canonicalisation of the deployment pattern.
- patterns/catalog-managed-commits-for-external-write-safety — first wiki canonicalisation.
- patterns/connector-library-as-protocol-abstraction — first wiki canonicalisation; Delta Kernel as the canonical instance.
- companies/databricks