PATTERN Cited by 3 sources
Broker-native Iceberg catalog registration¶
Broker-native Iceberg catalog registration is the pattern where
a streaming broker (writing data as
Apache Iceberg snapshots) owns the full lifecycle of its
Iceberg tables against an external REST catalog: creating the
table, registering each new snapshot, managing schema evolution,
and expiring old snapshots. Downstream analytics engines see the
tables appear and update automatically — no CREATE TABLE
DDL, no catalog-registration script, no CLI tool to keep in sync.
The canonical instance is Redpanda 25.1 Iceberg Topics (source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available):
"When a topic is configured for Iceberg, Redpanda can automatically register the corresponding table in the Iceberg catalog of your choice. That means the data produced to that topic is instantly queryable by all data lake and lakehouse users. No custom ETL, no knowledge of the underlying data's location, no credentials, and no manual registration needed."
What the broker owns¶
When the Iceberg-topic producer (the broker) writes records, it must maintain three things in the external catalog:
- Table existence — on first write to a new topic configured for Iceberg, the broker issues CREATE TABLE against the catalog with the appropriate schema (derived from a registered producer schema or the Kafka record envelope).
- Table lifecycle — each snapshot commit flushes a new set of Parquet files to object storage and then calls COMMIT against the catalog atomically, updating the table's current-snapshot pointer.
- Table housekeeping — periodic snapshot expiry to bound metadata growth, plus schema evolution actions (add column, rename column, drop column) when the upstream producer's schema changes.
In the pre-25.1 world, ownership of all three was diffuse: the
broker wrote the data files, a custom connector (Kafka Connect,
Redpanda Connect, a Python-on-Airflow job) called CREATE TABLE
+ registered snapshots, and a separate scheduled Spark / Flink
job ran snapshot expiry. The 25.1 GA release collapses them
into a single broker-owned feature.
Alternatives displaced¶
Manual CREATE TABLE + one-off registration¶
Before this pattern, a downstream team wanting a new topic available as an Iceberg table did the following:
- Operator runs CREATE TABLE against the catalog with the chosen schema and partition spec.
- Operator configures the streaming connector (Kafka Connect Iceberg sink, custom Python) to write against that table.
- Operator adds the table to the catalog ACL for downstream analytics engines.
- If the topic schema evolves, the operator runs ALTER TABLE to match.
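The drift failure mode in the last step can be made concrete: if the producer's schema evolves and the matching ALTER TABLE is missed, downstream engines silently read a narrower table than the topic actually carries. A minimal sketch, with hypothetical field names:

```python
# The producer added a "currency" field, but step 4 (ALTER TABLE) was missed.
topic_schema = {"id": "long", "amount": "double", "currency": "string"}
iceberg_schema = {"id": "long", "amount": "double"}  # table never altered

# Columns downstream query engines never see, and nothing alerts on it.
missing = set(topic_schema) - set(iceberg_schema)
assert missing == {"currency"}
```

Broker-native registration removes this class of drift by making the broker the single writer of both the data and the table schema.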
Operational cost: human-in-the-loop per new table; drift between topic schema and Iceberg schema if any step is missed.
Managed-connector-registers-the-table¶
Kafka Connect (with the Iceberg sink) and similar connectors auto-register the table on first write — a less broker-coupled variant of the same registration responsibility. Trade-offs:
- Pro: the connector is a pluggable component; different connectors can apply different catalog-registration policies.
- Con: the operational burden of running the connector remains, and the connector cluster is a separate failure domain from the broker.
The broker-native pattern eliminates the connector cluster entirely — the broker itself does the work.
Why this matters for "zero-ETL"¶
The pattern is the mechanism behind the oft-used "zero-ETL" marketing phrase for streaming-broker ↔ lakehouse integrations. "Zero-ETL" is typically imprecise; the precise claim is:
- No external ETL cluster to operate (no Airflow, no Kafka Connect, no custom Python on a schedule).
- No manual CREATE TABLE on downstream query engines — the table appears automatically when the topic is Iceberg-configured.
- No separate schema registration for downstream engines — they read the schema via the catalog.
The data transformation work (row-to-Parquet projection, catalog updates, snapshot expiry) still exists — it's just internalised inside the broker rather than externalised to a separate integration pipeline.
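Under this pattern the only user-visible step is a topic configuration property. A hedged sketch using Redpanda's rpk CLI — the property name `redpanda.iceberg.mode` and its values follow Redpanda's documented topic configuration, but verify against your version's docs:

```shell
# Create a topic whose records are materialised as an Iceberg table;
# no CREATE TABLE, no connector, no registration script follows.
rpk topic create orders -c redpanda.iceberg.mode=value_schema_id_prefix

# Or enable it on an existing topic.
rpk topic alter-config orders --set redpanda.iceberg.mode=key_value
```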
Dependencies¶
- Iceberg catalog REST sync — the transport mechanism the broker uses to talk to the catalog (typically OIDC + TLS).
- Transactional writes — the commit protocol the broker's writes use, which lets other writers safely coexist with the broker on the same table.
- Built-in object-store catalog fallback — some implementations (Redpanda 25.1) include a broker-owned object-store-catalog mode when no REST catalog is available, so the broker can still register tables even in minimal deployments.
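The "transactional writes" dependency is essentially optimistic concurrency on the table's current-snapshot pointer: a committer states which snapshot it built on, and the catalog rejects the commit if another writer advanced the pointer first. A minimal compare-and-swap sketch (this is the shape of the protocol, not the Iceberg wire format; all names are illustrative):

```python
class CommitConflict(Exception):
    pass

class TablePointer:
    """In-memory stand-in for a catalog's current-snapshot pointer."""
    def __init__(self):
        self.current = None

    def commit(self, expected_parent, new_snapshot):
        # Compare-and-swap: only advance if the committer built on the
        # snapshot that is still current; otherwise it must rebase and retry.
        if self.current != expected_parent:
            raise CommitConflict(f"current is {self.current}, not {expected_parent}")
        self.current = new_snapshot

table = TablePointer()
table.commit(None, "broker-snap-1")                  # broker's first commit
table.commit("broker-snap-1", "compaction-snap-2")   # a maintenance job coexists

stale_rejected = False
try:
    table.commit("broker-snap-1", "broker-snap-3")   # stale parent -> rejected
except CommitConflict:
    stale_rejected = True                            # broker rebases and retries
```

This is what lets a compaction job, a backfill writer, and the broker all target the same table without corrupting it.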
Trade-offs¶
- Catalog dependency in the write hot path. Broker-native registration couples write-path availability to catalog availability. If the REST catalog is unreachable at snapshot-commit time, the broker must either back-pressure the write path or commit to an object-store fallback and reconcile later.
- Broker becomes the catalog-administrator principal. The broker's OIDC / credential principal needs table-create + ACL permissions at the catalog — a larger trust surface than a read-only query engine.
- Schema drift management moves inside the broker. The broker must decide how to translate Kafka-record-schema evolution (Avro / JSON Schema / Protobuf) into Iceberg-spec schema evolution operations (add / rename / delete column). Schema-registry choices affect this translation.
- Vendor-specific feature. Broker-native registration is a Redpanda productisation on the Kafka API; upstream Apache Kafka has no equivalent primitive in the wire protocol as of 2025-04.
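The first trade-off implies a concrete decision at snapshot-commit time. A hedged sketch of the two options named above — back-pressure, or object-store fallback with later reconciliation — with every name hypothetical:

```python
import collections

class CatalogUnavailable(Exception):
    pass

class Broker:
    def __init__(self, catalog_commit, fallback_enabled):
        self.catalog_commit = catalog_commit   # callable: snapshot -> None
        self.fallback_enabled = fallback_enabled
        self.pending = collections.deque()     # snapshots awaiting reconciliation
        self.backpressure = False

    def on_snapshot_ready(self, snapshot):
        try:
            self.catalog_commit(snapshot)
        except CatalogUnavailable:
            if self.fallback_enabled:
                # Commit to the broker-owned object-store catalog and
                # reconcile with the REST catalog once it is reachable.
                self.pending.append(snapshot)
            else:
                # No fallback: stall the write path until the catalog returns.
                self.backpressure = True

    def reconcile(self):
        while self.pending:
            self.catalog_commit(self.pending.popleft())

committed = []
def flaky_commit(snap):
    if flaky_commit.down:
        raise CatalogUnavailable
    committed.append(snap)
flaky_commit.down = True

b = Broker(flaky_commit, fallback_enabled=True)
b.on_snapshot_ready("snap-1")   # catalog down -> buffered in fallback
flaky_commit.down = False
b.reconcile()                   # catalog back -> snapshot registered
```

Either branch is visible to operators: back-pressure shows up as producer latency, the fallback as a reconciliation backlog.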
Seen in¶
- sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect — Sink-connector counterpart framing. Redpanda Connect's 2026-03-05 Iceberg output launch reframes this pattern's "broker-native" specifier as one half of a two-shape architectural split. The Iceberg output sink connector performs the same Iceberg-catalog registration responsibility the broker-native Iceberg Topics does (creating the table on first write, registering snapshots as data lands, handling schema evolution via the REST catalog API) — but from a sink-connector process rather than inside the broker. Explicit launch-post framing: "Your engineers get a single pipeline definition to maintain. No sidecar services, no separate Flink job." This is the sink-connector-altitude peer of the broker-native pattern, canonicalised as patterns/sink-connector-as-complement-to-broker-native-integration.
- sources/2026-01-06-redpanda-build-a-real-time-lakehouse-architecture-with-redpanda-and-databricks — Unity-Catalog-specific instance of the broker-owns-catalog pattern. Full lifecycle disclosed verbatim: "Redpanda integrates directly with Unity Catalog using the Iceberg REST API. Through this integration, Redpanda registers Iceberg tables, manages schema updates, deletes tables when necessary, and handles the full lifecycle of the data." Table-deletion lifecycle is explicit here ("deletes tables when necessary") — prior ingests focused on create + register-snapshot + schema-evolution without naming deletion. Joint-vendor framing with Jason Reed (Databricks) supplying the consumer-side corroboration: "The data shows up already structured, already governed, and already queryable."
- sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available — canonical wiki source. 25.1 GA release promotes broker-native Iceberg catalog registration to a core feature with three named catalog implementations (Unity, Snowflake Open Catalog / Apache Polaris, AWS Glue) and a built-in object-store catalog fallback for minimal deployments.
Related¶
- concepts/iceberg-catalog-rest-sync — the transport mechanism.
- concepts/iceberg-topic — the concept this pattern operationalises.
- concepts/iceberg-snapshot-expiry — the GC loop the broker additionally owns.
- systems/redpanda-iceberg-topics — canonical system instance.
- systems/apache-iceberg · systems/unity-catalog · systems/snowflake · systems/databricks — catalog counterparts.
- patterns/streaming-broker-as-lakehouse-bronze-sink — the broader architectural pattern this catalog-registration pattern instantiates on the integration axis.