
SYSTEM Cited by 2 sources

UC Managed Tables

Unity Catalog (UC) Managed Tables are advanced Delta tables in which Unity Catalog owns the table's storage layout, optimisation, and commit coordination — as distinct from external tables, where the customer owns the storage path and Delta-log discipline. The defining distinction: managed tables get automatic data layout tuning, compaction, vacuuming, and statistics maintenance via Predictive Optimization plus Liquid Clustering, while still being accessible through open APIs by non-Databricks compute engines.

Capability disclosure (2026-05-14)

The 2026-05-14 Expanded interoperability with Unity Catalog Open APIs post puts the load-bearing managed-table properties on the wiki:

  • Performance envelope: "up to 20× faster queries and 50% lower storage costs" relative to unoptimised Delta tables on the same data — the gain attributed to Predictive Optimization (auto-tuned data layout + auto-compaction + auto-vacuum + fresh statistics) + Liquid Clustering.

  • External engine writeability (Beta, 2026-05-14): Apache Spark, Apache Flink, and DuckDB can now create, batch read/write, and stream to/from UC managed tables — preserving full transactional safety. "Now in Beta, external engines, such as Apache Spark, Flink, and DuckDB, can create and write to UC managed Delta tables with centralized governance and automatic optimizations."

  • Streaming source + sink shape: managed tables work as both streaming source and sink, "enabling end-to-end real-time pipelines on Apache Spark."

  • Catalog commits as the substrate: every operation routes through UC, producing serialized commits that "prevent log corruption" and give "complete auditability of every read and write."

  • Predictive Optimization is engine-boundary-transparent: "Predictive Optimization continues to run seamlessly, even on tables accessed by external engines." I.e., the optimisation layer is not coupled to the writer-engine identity.

  • Multi-statement, multi-table transaction substrate: catalog commits "lay the groundwork for features like multi-statement, multi-table transactions that require a centralized commit coordinator."

(Source: sources/2026-05-14-databricks-expanded-interoperability-with-unity-catalog-open-apis)
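The serialized-commit property above can be illustrated with a toy coordinator. This is a minimal sketch under stated assumptions, not the Unity Catalog commit protocol: the `ToyCommitCoordinator` class, its `commit` method, and the version-check rule are all hypothetical names chosen for illustration. It shows only the general shape of a centralized coordinator: one writer lands at a time, a writer holding a stale version is rejected, and every accepted commit leaves an audit entry.

```python
from dataclasses import dataclass, field
import threading


class CommitConflict(Exception):
    """Raised when a writer commits against a stale table version."""


@dataclass
class ToyCommitCoordinator:
    """Hypothetical stand-in for a catalog that serializes table commits."""
    version: int = 0
    audit_log: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def commit(self, engine: str, read_version: int, action: str) -> int:
        # A single lock means commits are applied strictly one at a time.
        with self._lock:
            if read_version != self.version:
                # Another writer landed first; the caller must re-read and retry.
                raise CommitConflict(
                    f"{engine} read v{read_version}, table is at v{self.version}"
                )
            self.version += 1
            self.audit_log.append((self.version, engine, action))
            return self.version


coordinator = ToyCommitCoordinator()
v1 = coordinator.commit("spark", read_version=0, action="append")
try:
    coordinator.commit("flink", read_version=0, action="append")  # stale read
    flink_conflicted = False
except CommitConflict:
    flink_conflicted = True
v2 = coordinator.commit("flink", read_version=v1, action="append")  # retry
```

Because every engine goes through the same `commit` call, the audit log is a total order of writes regardless of which engine produced them, which is the property a multi-table transaction coordinator would build on.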

Why managed-table-with-open-API matters

The architectural problem it solves: customers wanted both the optimisation benefits of a managed table substrate (Predictive Optimization, Liquid Clustering, governance, auto-compaction) and compute-engine choice (Spark / Flink / DuckDB / Trino / Confluent Tableflow). Historically these were a trade-off: choosing managed tables meant funnelling all writes through the vendor's first-party compute. The Open APIs disclosure dissolves the trade-off: external engines write through the same catalog-commit substrate, so managed-table benefits apply uniformly.

The PepsiCo testimonial (Sudipta Das, Director of Enterprise Data Operations) names the shape: "empowered our teams to use their preferred tools while maintaining governance and data consistency. We can leverage the benefits of managed tables within a truly interoperable data and AI platform that works across multiple compute engines."

Canonical instance of concepts/external-engine-write-to-managed-table.

Composition with surrounding UC primitives

Primitive | Role
systems/unity-catalog | Hosts managed-table metadata + serializes commits + governs access.
systems/delta-lake | Underlying open table format; managed tables are Delta tables with UC-owned storage discipline.
systems/delta-kernel | The Java + Rust library external engines link against to read / write / commit; abstracts the Delta protocol so connectors integrate with UC, not with Delta internals.
systems/uc-credential-vending | Mints short-lived, scoped credentials so external engines can fetch the underlying object-store data.
systems/unity-catalog-abac (forward) | Will enforce row + column-level policies on external reads (roadmap as of 2026-05-14).
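The "short-lived, scoped credentials" role in the table can be made concrete with a small model. This is a sketch under assumptions rather than the credential-vending API: `ScopedCredential`, `vend`, the 15-minute lifetime, and the example bucket path are all hypothetical. It shows only the two properties the table names: the credential expires, and it covers a single table's storage prefix.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ScopedCredential:
    """Hypothetical temporary credential limited to one storage prefix."""
    path_prefix: str
    writable: bool
    expires_at: datetime

    def allows(self, path: str, write: bool, now: datetime) -> bool:
        if now >= self.expires_at:
            return False  # short-lived: an expired credential grants nothing
        if write and not self.writable:
            return False  # a read-only credential cannot write
        return path.startswith(self.path_prefix)  # scoped to one table's files


def vend(table_path: str, writable: bool, now: datetime,
         ttl: timedelta = timedelta(minutes=15)) -> ScopedCredential:
    # End the prefix with "/" so "tables/orders" never matches "tables/orders_v2".
    prefix = table_path.rstrip("/") + "/"
    return ScopedCredential(prefix, writable, now + ttl)


issued = datetime(2026, 5, 14, 12, 0, tzinfo=timezone.utc)
cred = vend("s3://bucket/tables/orders", writable=False, now=issued)
```

The trailing-slash normalisation in `vend` is a design choice of this sketch: a plain string-prefix check would otherwise let a credential for one table reach a sibling table whose name merely starts with the same characters.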

Activation contract (Beta, 2026-05-14)

Three rollout primitives the post discloses:

  1. Account-level enrollment — opt in to "External Access to Unity Catalog Managed Delta Table" in the Databricks preview portal.
  2. Metastore-level toggle — enable external data access on the metastore.
  3. Schema-level grant — grant EXTERNAL_USE_SCHEMA on the schema containing the tables to be exposed.

Plus a named migration path: "To move existing data, see the migration guide for converting external tables to managed."

Version pinning: Delta-Spark 4.2 + Unity Catalog 0.4.1 — the version-coordination contract for the Beta.
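The three rollout primitives above behave like an ordered checklist: an external engine is blocked until all three are in place. The following is a minimal sketch of that checklist, assuming a hypothetical `ActivationState` record and `missing_steps` helper; neither is a Databricks API, and the schema and principal names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class ActivationState:
    """Hypothetical snapshot of the three Beta prerequisites."""
    account_preview_enrolled: bool
    metastore_external_access: bool
    schema_grants: set  # (schema, principal) pairs holding EXTERNAL_USE_SCHEMA


def missing_steps(state: ActivationState, schema: str, principal: str) -> list:
    """Return the rollout steps that still block external access, in order."""
    missing = []
    if not state.account_preview_enrolled:
        missing.append("account-level enrollment")
    if not state.metastore_external_access:
        missing.append("metastore-level toggle")
    if (schema, principal) not in state.schema_grants:
        missing.append("schema-level grant")
    return missing


# Enrolled and granted, but the metastore toggle is still off.
state = ActivationState(True, False, {("main.sales", "flink-writer")})
blocked = missing_steps(state, "main.sales", "flink-writer")
```

Reporting the missing steps in rollout order (account, then metastore, then schema) mirrors the scope narrowing in the list above, so the first entry is always the broadest setting that still needs attention.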

BI-serving foundation (2026-05-27 disclosure)

The 2026-05-27 BI Serving Pointers source frames UC managed tables as the foundation of the entire BI serving stack, not just a substrate option. Three load-bearing managed-table-only properties are named verbatim:

"Unity Catalog managed tables are the foundation for everything else in this stack. Unity Catalog manages all read, write, storage, and optimization responsibilities for managed tables. This unlocks automatic features you don't get with external tables: Predictive Optimization (covered below) is enabled by default. Automatic liquid clustering selects clustering keys that adapt as query patterns change. Metadata caching is always on, reducing cloud storage requests and speeding up query planning."

The recommendation is default-managed across all medallion layers, not just the BI-serving Gold tier:

"Use managed tables throughout the platform — not just for BI-serving, but across Bronze, Silver, and Gold layers. They're the default table type in Unity Catalog, and the performance and governance benefits compound with every other optimization in this stack."

Generalised at patterns/managed-table-as-default-storage-layer. The architect's intent (which tables exist, what they hold, what clustering keys matter) stays user-side; the substrate owns optimisation execution.
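The default-managed recommendation can be sketched as a small DDL generator that emits the same table shape for each medallion layer. This is an illustrative sketch, assuming a hypothetical `managed_table_ddl` helper, a placeholder `main` catalog, and schemas named after the layers; the only syntax taken from the sources is the `CLUSTER BY AUTO` clause. The helper deliberately emits no `LOCATION` clause, on the understanding that a table created without an explicit storage path is the managed kind.

```python
def managed_table_ddl(catalog: str, layer: str, table: str, columns: dict) -> str:
    """Build CREATE TABLE text for a managed table with automatic clustering.

    The caller supplies intent (names and column types); no storage path or
    clustering key is specified, leaving both to the substrate.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {catalog}.{layer}.{table} ({cols}) CLUSTER BY AUTO"


columns = {"order_id": "BIGINT", "ts": "TIMESTAMP"}
statements = [
    managed_table_ddl("main", layer, "orders", columns)
    for layer in ("bronze", "silver", "gold")
]
```

The generated text would still need to be run through whatever SQL interface the workspace exposes; the sketch only shows that the user-side contribution is the table definition, with layout decisions left out.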

Seen in

  • sources/2026-05-14-databricks-expanded-interoperability-with-unity-catalog-open-apis: First wiki disclosure of UC Managed Tables as a distinct named primitive (previously the Delta-tables-under-UC framing was implicit). Names the four load-bearing properties: open API writeability + Predictive Optimization continuity across the engine boundary + serialized catalog commits + multi-table transaction substrate. PepsiCo testimonial. Beta enrollment contract. Reserved for future ingests: catalog-commit wire protocol, multi-statement transaction isolation guarantees, Liquid Clustering / Predictive Optimization mechanism depth, performance envelope under heterogeneous-engine concurrent writes, ABAC-for-external-reads enforcement model.
  • sources/2026-05-27-databricks-bi-serving-pointers-maximizing-for-performance-and-tco: UC managed tables as the BI-serving physical foundation. Names the three managed-table-only properties (default-on Predictive Optimization, automatic liquid clustering with CLUSTER BY AUTO, always-on metadata caching) and the cross-medallion-layer recommendation ("use managed tables throughout the platform", not just at the Gold tier). Compounds with the rest of the BI serving stack: every layer above (semantic / materialization / consumer) inherits the managed-table optimisation wins for free.