
CONCEPT Cited by 4 sources

Iceberg catalog REST sync

Iceberg catalog REST sync is the mechanism by which a producer (streaming broker, batch writer, CDC connector) registers and maintains a table in an external Apache Iceberg REST catalog — the standard integration surface between Iceberg tables and downstream analytics engines.

The REST catalog protocol is Iceberg's answer to "where does the table live, and how do readers find it?": a managed HTTP service that owns the mapping from a logical table name to the table's current snapshot pointer, schema, partition spec and ACL metadata. Canonical implementations named in the source include Snowflake Open Catalog, Databricks Unity Catalog and AWS Glue.

Source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available.

Why REST catalogs exist

Early Iceberg deployments kept catalog state either in the metadata bucket alongside the data (the "Hadoop-style" catalog) or in a hand-rolled JDBC catalog backed by MySQL or Postgres. Both shapes coupled catalog access to storage credentials and made cross-engine access brittle: every reader had to agree on the storage-layout convention, and ACLs were scoped to objects rather than to tables.

The REST catalog protocol decouples the catalog from the storage. A downstream engine (ClickHouse, Snowflake, Databricks, Trino, Spark, Flink) authenticates against the REST catalog, reads the current snapshot pointer for a named table, and fetches the referenced data files from object storage using the storage credential delegated by the catalog. Three separations fall out:

  1. Identity — the authenticated principal is the REST-catalog user, not an S3/GCS bucket identity.
  2. Table-level ACL — permissions attach to the logical table, not to the underlying object keys.
  3. Cross-engine consistency — every engine sees the same snapshot pointer, because the REST catalog is the single source of truth.
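A minimal Python sketch of the reader-side request in the flow above, assuming the path layout and headers of the Apache Iceberg REST OpenAPI spec: `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`, bearer-token auth as the catalog principal, and the `X-Iceberg-Access-Delegation: vended-credentials` header that asks the catalog to delegate storage access. The host, prefix, namespace, table and token values are hypothetical, and the function only builds the request; it sends nothing.

```python
from urllib.parse import quote

# Iceberg REST spec: multipart namespaces are joined with the 0x1F unit separator.
NAMESPACE_SEPARATOR = "\x1f"


def load_table_request(base_url, prefix, namespace, table, token,
                       vended_credentials=True):
    """Build (method, url, headers) for a REST-catalog LoadTable call.

    Nothing is sent; any HTTP client could issue the resulting request.
    """
    encoded_ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    segments = [base_url.rstrip("/"), "v1"]
    if prefix:  # optional catalog prefix, e.g. advertised by GET /v1/config
        segments.append(prefix.strip("/"))
    segments += ["namespaces", encoded_ns, "tables", quote(table, safe="")]

    # Identity is the catalog principal's bearer token, not a bucket credential.
    headers = {"Authorization": f"Bearer {token}"}
    if vended_credentials:
        # Ask the catalog to hand back short-lived storage credentials.
        headers["X-Iceberg-Access-Delegation"] = "vended-credentials"
    return "GET", "/".join(segments), headers


# Hypothetical catalog host, prefix, two-level namespace, table and token.
method, url, headers = load_table_request(
    "https://catalog.example.com", "prod", ["sales", "eu"], "orders", "t0k")
print(method, url)
```

The catalog's response carries the table's current metadata (including the snapshot pointer) and, when delegation is granted, the short-lived storage credentials the engine then uses against object storage.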

Security surface

A REST-catalog-integrated Iceberg producer typically authenticates via OpenID Connect (OIDC) over TLS. OIDC provides token-based identity (so the producer doesn't persist long-lived shared credentials against the catalog); TLS provides transport-level confidentiality for the catalog metadata (table schema, snapshot list, ACL data).
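As an illustration of token-based identity, here is a sketch of the OAuth2 client-credentials token request (RFC 6749, section 4.4) that a producer could send to its identity provider before calling the catalog. The endpoint URL, client ID, secret and scope are hypothetical; identity providers and catalogs differ in whether they expect HTTP Basic client authentication (used here) or credentials in the form body, and in which scopes they accept.

```python
import base64
from urllib.parse import quote_plus, urlencode


def client_credentials_request(token_url, client_id, client_secret, scope=None):
    """Build an OAuth2 client-credentials token request (RFC 6749 section 4.4).

    Returns (method, url, headers, body); sending it is left to the caller.
    """
    if not token_url.startswith("https://"):
        # Client secrets and tokens should only ever travel over TLS.
        raise ValueError("token endpoint must use https")
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    # HTTP Basic client authentication (RFC 6749 section 2.3.1): id and secret
    # are form-urlencoded, joined with ":" and base64-encoded.
    userpass = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    basic = base64.b64encode(userpass.encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return "POST", token_url, headers, urlencode(form)


# Hypothetical IdP endpoint and producer credentials.
method, url, headers, body = client_credentials_request(
    "https://idp.example.com/oauth2/token", "redpanda-producer", "s3cret",
    scope="catalog")
```

The access token in the IdP's response is then presented to the catalog as a bearer token, so no long-lived secret is sent to the catalog itself.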

Source framing on Redpanda Iceberg Topics:

"Secure REST catalog sync, supporting Snowflake Open Catalog, Databricks Unity, and others; using OIDC and TLS for secure, seamless interoperability with enterprise analytics platforms." (Source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available)

Transactional-write implication

REST catalogs enforce a serialisation order on concurrent catalog updates via Iceberg's optimistic-concurrency commit protocol. The commit is the transaction boundary — two writers racing to update the same table's snapshot pointer cannot both succeed; one commit is accepted and the other must retry with the updated base snapshot. This is what enables multiple writers (a Redpanda Iceberg topic plus a batch Spark job plus a Flink job, all writing to the same table) to coexist without a distributed lock outside the catalog.
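The commit protocol can be sketched with a toy in-memory catalog, assuming a single snapshot pointer per table and integer snapshot IDs. Over REST, the same compare-and-swap is expressed as a commit request whose `requirements` include an `assert-ref-snapshot-id` check on the base snapshot, and a stale base comes back as HTTP 409; the class and function names below are hypothetical. A real Iceberg writer also re-validates its changes against whatever the winning writer committed before retrying, which this sketch skips.

```python
import threading


class CommitFailedError(Exception):
    """Stands in for the REST catalog's HTTP 409 CommitFailedException."""


class InMemoryCatalog:
    """Toy catalog: table name -> current snapshot id, updated by CAS."""

    def __init__(self, pointers=None):
        self._pointers = dict(pointers or {})
        self._lock = threading.Lock()

    def current_snapshot(self, table):
        with self._lock:
            return self._pointers.get(table)

    def commit(self, table, expected_base, new_snapshot):
        # Compare-and-swap, analogous to an assert-ref-snapshot-id requirement.
        with self._lock:
            if self._pointers.get(table) != expected_base:
                raise CommitFailedError(f"{table}: base {expected_base} is stale")
            self._pointers[table] = new_snapshot


def commit_with_retry(catalog, table, build_snapshot, max_attempts=5):
    """Read the base, build a snapshot on it, commit; on conflict, rebase and retry."""
    for _ in range(max_attempts):
        base = catalog.current_snapshot(table)
        candidate = build_snapshot(base)
        try:
            catalog.commit(table, base, candidate)
            return candidate
        except CommitFailedError:
            continue  # another writer won; re-read the new base and try again
    raise CommitFailedError(f"{table}: gave up after {max_attempts} attempts")


catalog = InMemoryCatalog({"orders": 1})
seen_bases = []


def writer_a(base):
    seen_bases.append(base)
    if len(seen_bases) == 1:
        catalog.commit("orders", 1, 2)  # writer B commits first
    return base + 100  # hypothetical snapshot-id scheme


result = commit_with_retry(catalog, "orders", writer_a)
```

Writer B lands between writer A's read and commit, so A's first commit fails on the stale base 1 and its retry succeeds on base 2; neither writer needed a lock outside the catalog.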

See concepts/transactional-write for the generic framing.

Automatic table discovery

A producer configured for REST-catalog sync creates the table (if absent) on first write and registers subsequent snapshots against it. Downstream engines configured to read from the same catalog see new tables appear automatically, without client-side configuration — the operator doesn't run CREATE TABLE on every engine. This is the mechanism the patterns/broker-native-iceberg-catalog-registration pattern instantiates.
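A sketch of the create-if-absent step, assuming a hypothetical in-memory `FakeCatalog` in place of a real REST client; the two exceptions stand in for the REST spec's 404 on LoadTable and 409 on CreateTable. The schema dictionary and the list append used to "register" a snapshot are simplifications, not the Iceberg metadata format.

```python
class NoSuchTableError(Exception):
    """Stands in for a 404 NoSuchTableException from LoadTable."""


class AlreadyExistsError(Exception):
    """Stands in for a 409 AlreadyExistsException from CreateTable."""


class FakeCatalog:
    """Hypothetical in-memory stand-in for a REST catalog client."""

    def __init__(self):
        self.tables = {}

    def load_table(self, namespace, name):
        try:
            return self.tables[(namespace, name)]
        except KeyError:
            raise NoSuchTableError(f"{namespace}.{name}") from None

    def create_table(self, namespace, name, schema):
        key = (namespace, name)
        if key in self.tables:
            raise AlreadyExistsError(f"{namespace}.{name}")
        self.tables[key] = {"schema": schema, "snapshots": []}
        return self.tables[key]


def ensure_table(catalog, namespace, name, schema):
    """Load the table, creating it on first write if it does not exist yet."""
    try:
        return catalog.load_table(namespace, name)
    except NoSuchTableError:
        pass
    try:
        return catalog.create_table(namespace, name, schema)
    except AlreadyExistsError:
        # Another producer created it between our load and create; use theirs.
        return catalog.load_table(namespace, name)


def write_batch(catalog, namespace, name, schema, snapshot_id):
    table = ensure_table(catalog, namespace, name, schema)
    table["snapshots"].append(snapshot_id)  # simplified snapshot registration
    return table


catalog = FakeCatalog()
schema = {"id": "long", "payload": "string"}  # hypothetical schema shape
write_batch(catalog, "analytics", "orders", schema, 1)
write_batch(catalog, "analytics", "orders", schema, 2)
```

Handling `AlreadyExistsError` is what lets two producers race to create the same table on first write without either failing; every engine reading the catalog then sees one table.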

Trade-offs

  • Catalog availability is in the write path. If the REST catalog is unreachable, the producer cannot register new snapshots, which may back-pressure or fail the write depending on the producer's policy. A built-in object-store-catalog fallback (as in Redpanda 25.1 Iceberg Topics) lets producers continue writing when the REST catalog is unavailable, at the cost of losing cross-engine metadata coherence until the catalog reconnects.
  • Catalog implementation drift. The Iceberg REST protocol is standard but implementations differ in extensions (ACL model, namespace semantics, credential-delegation shape); production deployments pin to a specific catalog.
  • Token refresh. OIDC tokens are short-lived; the producer's token-refresh loop is operational surface that must be reliable — token expiry during a long-running commit fails the commit.
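A sketch of a refresh-ahead token cache for the token-refresh trade-off, assuming a hypothetical `fetch` callable that returns `(access_token, expires_in_seconds)`, for example a wrapper around an OAuth2 client-credentials request. The injected clock exists only to make the behaviour testable. Passing the expected commit duration as `min_remaining_s` forces a refresh before a long-running commit instead of letting the token expire midway.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it before it gets too close to expiry."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self, min_remaining_s: float = 60.0) -> str:
        """Return a token valid for at least `min_remaining_s` more seconds."""
        now = self._clock()
        if self._token is None or self._expires_at - now < min_remaining_s:
            token, expires_in = self._fetch()
            # Expiry is measured from before the fetch, which errs on the early side.
            self._token, self._expires_at = token, now + expires_in
        return self._token


fake_now = [0.0]
issued = []


def fetch():
    issued.append(f"tok-{len(issued) + 1}")
    return issued[-1], 300.0  # hypothetical 5-minute token lifetime


cache = TokenCache(fetch, clock=lambda: fake_now[0])
first = cache.get()                        # fetches tok-1, valid until t=300
fake_now[0] = 200.0
still_first = cache.get()                  # 100 s left >= 60 s: reuses tok-1
fake_now[0] = 250.0
second = cache.get()                       # 50 s left < 60 s: refreshes to tok-2
third = cache.get(min_remaining_s=400.0)   # long commit ahead: 300 s < 400 s, refreshes
```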

Seen in

  • sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect: REST catalog as sink-connector integration surface (new altitude). Redpanda Connect's 2026-03-05 Iceberg output launch canonicalises the REST catalog as the integration surface for a streaming sink connector, as opposed to the broker-native Iceberg Topics producer covered by the other sources on this page. Named catalog matrix verbatim: "Apache Polaris, AWS Glue Data Catalog, Databricks Unity Catalog, Snowflake Open Catalog, GCP BigLake — if your catalog speaks REST, you can point the connector at it." The post includes a worked OAuth2 client-credentials auth example against a Polaris endpoint in the pipeline YAML. Enterprise framing verbatim: "Redpanda Connect fits into your existing OAuth2 token exchange and per-tenant REST catalog (like Polaris) workflows out of the box." The per-tenant REST catalog pattern is load-bearing for multi-tenant isolation at a density of 0.1 vCPU per pipeline.

  • sources/2026-01-06-redpanda-build-a-real-time-lakehouse-architecture-with-redpanda-and-databricks: governance-endpoint framing at joint-Databricks altitude. The REST catalog is re-framed from the transport-mechanism altitude of the 2025-04-07 GA post to the governance-endpoint altitude: "the Iceberg REST Catalog centralizes governance at the catalog layer" and becomes the "single control plane" with three named responsibilities: "Managing permissions and access control. Coordinating concurrent reads and writes. Dynamically granting engines access to data at runtime." Interoperability is emphasised: "Different platforms, written in different languages and running in different environments, can exchange metadata and enforce governance by speaking the same protocol." Historical context: file-based catalogs ("collections of files stored directly in object storage") created "metadata sprawl and governance gaps" that REST catalogs resolved. Unity Catalog, Snowflake Polaris and AWS Glue are named as implementations; BigLake is not named, possibly because the post is Databricks-focused. The post adds no mechanism detail beyond the 2025-04-07 source.

  • sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more: fourth managed REST catalog added to Redpanda Iceberg Topics' integration surface. Redpanda 25.3 integrates with the Google BigLake metastore (with Dataplex as the governance layer), completing the set alongside Unity Catalog, Snowflake Open Catalog (Polaris) and AWS Glue. Verbatim: "Redpanda's native Iceberg integration can automatically register streaming tables to the Google BigLake metastore, so those tables are discoverable, secure, and governed alongside the rest of your GCP analytics estate." This makes GCP a first-class REST-catalog consumer alongside AWS, Azure, Databricks and Snowflake.

  • sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available — canonical wiki source. Positions secure REST catalog sync (OIDC + TLS, to Snowflake Open Catalog / Databricks Unity / AWS Glue) as the default integration surface for Iceberg Topics at GA; the built-in object-store catalog is framed as the fallback "suitable for ad hoc access by data engineers when no REST catalog is available."

Last updated · 470 distilled / 1,213 read