Redpanda Iceberg topics¶
Redpanda Iceberg topics are a broker-native feature that makes a single logical entity addressable as both a Kafka-protocol topic and an Apache Iceberg table backed by the same data. The feature was introduced in 2024 as part of Redpanda's data-platform vision and promoted to General Availability in the 25.1 release (2025-04-07) across AWS, Azure, and GCP — framed in Redpanda's launch post as the first Kafka-Iceberg streaming solution GA on multiple clouds.
From the producer side it's a normal Kafka topic — any Kafka client can write to it. The broker then transparently:
- Stores the records in its local log segments (normal topic durability).
- Projects row-oriented records into columnar Parquet files.
- Writes the Parquet files to an external object store (S3 / GCS / ADLS).
- Registers the new snapshot with an external Iceberg REST catalog (Databricks Unity, Snowflake Polaris, AWS Glue, ...) via Iceberg catalog REST sync (OIDC + TLS).
Downstream Iceberg-aware engines — ClickHouse, Snowflake, Databricks, Dremio, Trino, Spark, Flink — can then query the topic's data as an Iceberg table without any ETL job, Connect cluster, or custom code.
Conceptual framing¶
This is the canonical system-level instance of the Iceberg topic concept and the streaming-broker-as-lakehouse-Bronze-sink pattern on this wiki. The patterns/broker-native-iceberg-catalog-registration pattern canonicalises the broker-as-catalog-administrator property that makes the integration shape "zero-ETL" from the operator's perspective.
Key properties (GA, Redpanda 25.1)¶
Core properties (canonical on this feature since launch)¶
- Zero-ETL integration — no Kafka Connect, no Redpanda Connect, no Python-on-Airflow pipeline. Topic→Iceberg is a configuration change, not a code change or a new cluster to operate.
- Kafka-layer metadata preservation — offset, partition, and timestamp are carried across into the Iceberg table as columns, so downstream analytics can join on broker-layer metadata without losing the audit trail.
- Schema optional — tables can be created without a schema (defaulting to key + value + offset + partition + timestamp) or with a typed schema for downstream query ergonomics.
- External REST catalog registration — Redpanda supports standard Iceberg REST catalogs (Databricks Unity, Snowflake Polaris / Open Catalog based on Apache Polaris, AWS Glue) as the integration surface for downstream engines.
- Catalog-agnostic downstream reads — any Iceberg-compatible engine can query the same tables via the standard Iceberg client libraries.
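The default schema-less projection described above (raw key + value bytes plus preserved Kafka-layer metadata) can be sketched as a row type. This is an illustrative model only; the column names are assumptions, not Redpanda's exact identifiers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProjectedRow:
    """One Kafka record as it would appear in the default (schema-less)
    Iceberg projection: opaque key/value bytes plus broker-layer metadata.
    Column names here are illustrative, not Redpanda's exact identifiers."""
    key: Optional[bytes]   # record key, nullable
    value: bytes           # record value, opaque payload bytes
    offset: int            # Kafka log offset, preserved for audit joins
    partition: int         # source topic partition
    timestamp_ms: int      # record timestamp (epoch millis)

# A record produced at offset 42 on partition 0 keeps its broker
# metadata alongside the opaque payload bytes, so downstream analytics
# can still join on offset/partition/timestamp:
row = ProjectedRow(key=b"user-7", value=b'{"clicks": 3}',
                   offset=42, partition=0, timestamp_ms=1712448000000)
```

The preserved metadata columns are what make the "join on broker-layer metadata without losing the audit trail" property work downstream.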
GA-disclosed table-management properties (25.1)¶
Four capabilities canonicalised when Iceberg Topics were promoted from preview to GA (source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available):
- Custom, hierarchical, bucketed partitioning — operator-controllable Iceberg partition transforms (bucket, truncate, year/month/day/hour on timestamps) for fine-tuned table layouts and downstream query-side partition pruning.
- Built-in dead-letter queues — records that fail schema validation during the row-to-Parquet projection are redirected to a DLQ topic rather than dropping the whole batch. The canonical instance of patterns/dead-letter-queue-for-invalid-records on this wiki.
- Seamless Iceberg-spec-compliant schema evolution — full support for field additions, renames, and deletions over time, matching the Iceberg specification's evolution semantics. Retires the wiki's earlier caveat "how Iceberg-topic schema changes interact with Kafka-client serializers... is a source of operational complexity the pedagogy post glosses".
- Automatic snapshot expiry — the broker performs automatic housekeeping of old snapshot metadata as tables age, bounding metadata growth. Retires the wiki's earlier caveat "compaction + GC ownership unclear from the pedagogy post" (for the snapshot-expiry half of the loop; compaction-ownership remains an open question).
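The partition transforms named above follow the Iceberg specification's definitions. A minimal stdlib sketch of two of them (bucket is omitted here because the spec requires Murmur3 hashing; these function names are illustrative, not an API):

```python
def truncate_int(width: int, v: int) -> int:
    # Iceberg's truncate transform for integers: round down to a
    # multiple of `width` (spec: v - (v % W); Python's non-negative
    # modulo makes this correct for negative values too).
    return v - (v % width)

def truncate_str(width: int, s: str) -> str:
    # For strings, truncate keeps the first `width` characters.
    return s[:width]

def day_transform(ts_ms: int) -> int:
    # The year/month/day/hour transforms map a timestamp to an ordinal
    # partition value; `day` is days since the Unix epoch.
    return ts_ms // 86_400_000

truncate_int(10, 37)          # -> 30
truncate_str(3, "redpanda")   # -> "red"
day_transform(1712448000000)  # -> 19820 (2024-04-07 UTC)
```

Because the transforms are pure functions of column values, a query engine can prune partitions by applying the same transform to a predicate's constants, which is what makes the "downstream query-side partition pruning" payoff possible.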
GA-disclosed catalog-integration properties (25.1)¶
Five capabilities disclosed at GA for the catalog-integration surface:
- Secure REST catalog sync via OIDC + TLS against named catalogs (Snowflake Open Catalog, Databricks Unity, AWS Glue). Canonicalised as concepts/iceberg-catalog-rest-sync.
- Transactional writes — the broker uses Iceberg's commit-protocol serialisation, so other clients (Spark, Flink, custom writers) can safely write concurrently to the same Iceberg table without external locking.
- Automatic table discovery — newly Iceberg-configured topics auto-register with the REST catalog on first write; downstream analytics engines attached to the same catalog see the table appear without any CREATE TABLE DDL or client-side configuration. Instantiates patterns/broker-native-iceberg-catalog-registration.
- Built-in object-store catalog fallback — when no external REST catalog is configured, Redpanda ships a broker-owned object-store-based catalog, "suitable for ad hoc access by data engineers when no REST catalog is available."
- Tunable workload management — an explicit operator knob bounding how far the Iceberg snapshot can lag the live topic. Trade-off dial between end-to-end freshness and the broker-CPU budget for Parquet projection + catalog commits. Makes the commit-cadence lag floor an explicit operational parameter, not an implicit one.
Architectural role¶
Iceberg topics collapse what was historically a multi-system integration (producer → Kafka → ETL → lakehouse) into a single broker feature:
┌─────────────┐ ┌──────────────────────┐
│ Producers │────▶│ Redpanda Iceberg │────▶ Iceberg REST catalog
└─────────────┘ │ topic │ (Unity / Polaris /
│ │ AWS Glue)
│ • log segments │ ↑
│ • Parquet files on │──┐ │ OIDC+TLS sync
│ object storage │ │ │
│ • snapshot commits │ │ │
│ • snapshot expiry │ │ │
│ • schema evolution │ │ │
│ • DLQ redirect │ │ ▼
└──────────────────────┘ │ Iceberg-aware
│ │ query engines
▼ │ (ClickHouse,
DLQ topic │ Snowflake,
(invalid records) │ Databricks, ...)
│
└── also read Parquet
via catalog pointer
The Bronze tier of a Medallion Architecture becomes a derived view of topics the business is already producing to — not a separate ingestion pipeline.
Topic-level configuration surface¶
Iceberg Topics expose two topic-level configuration parameters canonicalised on the 2025-05-13 BYOC setup walkthrough (Source: sources/2025-05-13-redpanda-getting-started-with-iceberg-topics-on-redpanda-byoc):
- iceberg_enabled (boolean) — per-topic flag that turns on the Iceberg-projection loop. Setting it to true instructs the broker to project records from that topic to Parquet+Iceberg on Tiered Storage alongside normal Kafka-log durability.
- redpanda.iceberg.mode — selects the schema-projection strategy, one of three values canonicalised as concepts/iceberg-topic-mode: value_schema_id_prefix (Schema-Registry-wire-format producers; typed-column Iceberg table), value_schema_latest (latest-schema projection without a per-record schema-ID prefix), and key_value (schema-less ingestion as BYTES columns + Kafka metadata).
Topic-mode selection is orthogonal to catalog-integration shape (REST catalog vs file-based catalog); the two axes compose.
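The value_schema_id_prefix mode assumes producers frame each value in the standard Schema Registry wire format: one magic byte (0x00) followed by a 4-byte big-endian schema ID, then the serialized payload. A sketch of how a projector could split that prefix from the payload; the function name is illustrative, not Redpanda's implementation:

```python
import struct

MAGIC_BYTE = 0  # Schema Registry wire-format marker

def split_schema_id_prefix(record_value: bytes) -> tuple[int, bytes]:
    """Split a Schema-Registry-framed value into (schema_id, payload).

    Wire format: 1 magic byte (0x00) + 4-byte big-endian schema ID,
    then the serialized payload. In value_schema_id_prefix mode the
    broker resolves the schema ID to typed Iceberg columns; records
    failing this validation are the kind redirected to the DLQ topic.
    """
    if len(record_value) < 5 or record_value[0] != MAGIC_BYTE:
        raise ValueError("not Schema Registry wire format -> DLQ candidate")
    (schema_id,) = struct.unpack(">i", record_value[1:5])
    return schema_id, record_value[5:]

schema_id, payload = split_schema_id_prefix(b"\x00\x00\x00\x00\x2a" + b'{"x":1}')
# schema_id == 42, payload == b'{"x":1}'
```

The other two modes sidestep this framing entirely: value_schema_latest resolves a schema without the per-record prefix, and key_value skips schema resolution altogether.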
BYOC beta (2025-05-13)¶
Five weeks after the 25.1 GA disclosure, Redpanda extended Iceberg Topics to Redpanda BYOC as a beta (Source: sources/2025-05-13-redpanda-getting-started-with-iceberg-topics-on-redpanda-byoc).
The beta-scoped capabilities verbatim:
"Self-service configuration of Iceberg settings at the cluster level via our rpk CLI or Cloud HTTP API. Direct integration with popular REST catalogs like Snowflake Open Catalog, or with Iceberg clients like Google BigQuery via a file-based catalog. Support for secure credential handling (e.g., Iceberg REST catalog secrets)."
Two novel disclosures vs the GA post:
File-based catalog¶
Named as a primary integration option for engines (like BigQuery) that read Iceberg directly from a metadata JSON pointer rather than through a catalog protocol. Reframes what the GA post called the "built-in object-store catalog fallback" from a last-resort to a first-class option; canonicalised as concepts/iceberg-file-based-catalog. The read-side companion pattern is patterns/external-table-over-iceberg-metadata-pointer.
BYOC data ownership¶
Load-bearing property verbatim: "full control of your Iceberg data with zero compromises." Because the BYOC data plane already runs inside the customer's cloud account, Iceberg Parquet + metadata files land directly in the customer's own bucket under the customer's IAM + KMS + lifecycle rules. Canonicalised as concepts/byoc-data-ownership-for-iceberg — the BYOC-specific compound property that extends Data Plane Atomicity to the Iceberg data-output surface.
Trade-offs vs alternatives¶
Redpanda's framing (extended across the 2025-01-21 pedagogy post and the 2025-04-07 GA post) enumerates the two pre-Iceberg-topic alternatives the feature displaces:
- Custom Airflow + Python jobs. Reading from Kafka, converting payloads, writing Parquet, updating the Iceberg catalog. Operational cost: "required specialized talent to write, test, and maintain them, which is error-prone and time-consuming."
- Managed ETL connectors (Redpanda Connect, Kafka Connect with Iceberg sink). Operational cost: "these systems introduce a middleman to the architecture, requiring you to configure and maintain a separate set of clusters for data integration" — and no configuration-only path for an existing topic to become an Iceberg table.
Iceberg topics are intended to win on operational simplicity over either alternative, not on raw throughput or latency.
Costs / caveats¶
- Vendor-specific primitive — Iceberg topics are a Redpanda feature; Apache Kafka has no equivalent native mechanism in the Kafka wire protocol as of 2025-04. Known competing products in the Kafka-to-Iceberg space include Kafka Connect with the Tabular Iceberg sink, Upsolver, and Confluent Tableflow; Redpanda's "first on multiple clouds GA" claim does not explicitly enumerate comparison targets.
- Duplicate storage during retention window — data lives in the broker's log segments and in object-storage Parquet files concurrently until topic retention expires.
- Compaction ownership remains open — the GA release explicitly internalises snapshot expiry as a broker-owned loop but does not explicitly name small-file compaction as broker-owned. Customers writing high-throughput Iceberg topics may still need a separate Spark / Flink compaction job to merge small Parquet files for scan performance. See systems/apache-iceberg for the generic externalisation-cost framing.
- Commit-cadence latency floor — Iceberg snapshot commits happen on a flush interval (seconds-to-minutes by default), not per record. The 25.1 tunable workload-management knob lets the operator set the lag ceiling, but downstream Iceberg-reader latency is bounded below by whatever cadence the operator picks.
- DLQ operational surface not fully specified in the GA post — retention default, envelope-schema shape, replay tooling, monitoring recommendations — all deferred to product documentation at the time of the launch post.
- Transactional-write isolation level not stated — the GA post names "transactional writes" but does not specify the isolation level (serializable? snapshot isolation?), the concurrent-writer conflict-resolution policy, or the recovery behaviour after a half-written commit.
- Catalog availability in write path — broker-native REST catalog sync couples Iceberg-topic write-path availability to catalog availability; the object-store fallback mitigates but introduces a cross-engine metadata coherence gap until the REST catalog reconnects.
- R1 engine coupling — the 2025-01-21 post framed Iceberg topics as a component of Redpanda's broader R1 multi-modal streaming data engine vision; the GA post does not re-invoke that framing, so the wiki defers R1 details until a dedicated source is ingested.
BigLake metastore (Redpanda 25.3, 2025-11-06)¶
Redpanda 25.3 adds Google BigLake metastore to the list of supported REST catalog integrations — the fourth managed catalog alongside Databricks Unity, Snowflake Open Catalog (Polaris), and AWS Glue.
Verbatim from the sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more|25.3 launch post:
"If you're on GCP, your lakehouse life runs through BigLake/Dataplex and BigQuery. With 25.3, Redpanda's native Iceberg integration can automatically register streaming tables to the Google BigLake metastore, so those tables are discoverable, secure, and governed alongside the rest of your GCP analytics estate."
The 25.3 addition completes REST-catalog coverage for the three major hyperscalers (AWS = Glue, Azure/cross = Unity, GCP = BigLake) plus the two independent lakehouse vendors (Databricks Unity, Snowflake Open Catalog). GCP users now have both integration shapes:
- REST catalog (25.3) — BigLake metastore, governed by Dataplex, discoverable in BigQuery without manual DDL.
- File-based catalog (2025-05-13 BYOC beta) — BigQuery reads Iceberg tables directly via CREATE EXTERNAL TABLE … format = 'ICEBERG' pointing at a specific vN.metadata.json in GCS.
The two shapes compose per-topic.
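The file-based-catalog shape hinges on the reader resolving the table's current snapshot from the vN.metadata.json pointer itself, with no catalog RPC. A hedged sketch of that resolution step over a hand-written, drastically simplified metadata document (real Iceberg table metadata carries many more fields, and the paths below are placeholders):

```python
import json

# Minimal stand-in for an Iceberg vN.metadata.json file. Real table
# metadata also carries schemas, partition specs, sort orders, etc.;
# bucket paths here are placeholders.
metadata_json = """
{
  "location": "gs://my-bucket/redpanda/my_topic",
  "current-snapshot-id": 2,
  "snapshots": [
    {"snapshot-id": 1, "manifest-list": "gs://my-bucket/meta/snap-1.avro"},
    {"snapshot-id": 2, "manifest-list": "gs://my-bucket/meta/snap-2.avro"}
  ]
}
"""

def current_manifest_list(metadata: dict) -> str:
    """A file-based-catalog reader resolves the current snapshot by ID,
    then discovers data files through its manifest list; the metadata
    JSON pointer is the only thing the reader needs to be handed."""
    snap_id = metadata["current-snapshot-id"]
    (snapshot,) = [s for s in metadata["snapshots"]
                   if s["snapshot-id"] == snap_id]
    return snapshot["manifest-list"]

meta = json.loads(metadata_json)
current_manifest_list(meta)  # -> "gs://my-bucket/meta/snap-2.avro"
```

This also makes the trade-off concrete: a REST catalog hands out the latest metadata pointer automatically, while a file-based reader is pinned to whichever vN.metadata.json its external-table DDL names.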
Seen in¶
- sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect — Explicit architectural framing as the broker-native counterpart to the new Iceberg output sink connector. Redpanda Connect's 2026-03-05 Iceberg output launch post positions Iceberg Topics as "a zero-ETL path from broker to table that's streamlined for high-speed Kafka streams. Produce to a topic, and Redpanda handles the rest. For many workloads, that's all you need" — and the Iceberg output as filling the gap "where your data arrives from an HTTP webhook, a Postgres CDC stream, or a GCP Pub/Sub subscription. Maybe you need to normalize a payload, drop PII, or split a mixed event stream by type before anything hits the lakehouse." The launch post canonicalises a seven-axis comparison table against the Iceberg output — primary value (zero-ETL convenience vs integration flexibility), data sources (Kafka only vs 300+), schema evolution (registry-driven vs data-driven), routing (1→1 vs multi-table), infrastructure (zero extra vs stateless K8s container), availability (BYOC/EE vs Connect Enterprise tier). The two shapes are explicitly complementary, not competing — each covers workloads the other can't. This is the first wiki source to frame the broker-native-vs-sink-connector architectural split; canonicalised as the pattern patterns/sink-connector-as-complement-to-broker-native-integration.
- sources/2026-01-06-redpanda-build-a-real-time-lakehouse-architecture-with-redpanda-and-databricks — Joint-vendor framing with Databricks (2026-01-06 tech-talk recap). Matt Schumpert (Redpanda) + Jason Reed (Databricks, ex-Netflix data team) co-frame Iceberg Topics as the "stream is the table" primitive that lets streaming data be "analytics-ready by default". Canonical slogan verbatim: "The goal of this partnership is to remove the artificial line between real-time data and analytical data." Unity-Catalog-specific integration disclosure verbatim: "Redpanda integrates directly with Unity Catalog using the Iceberg REST API. Through this integration, Redpanda registers Iceberg tables, manages schema updates, deletes tables when necessary, and handles the full lifecycle of the data." No net-new mechanism; pure joint-vendor-framing + Netflix-origin-of-Iceberg disclosure (Jason Reed).
- sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more — BigLake metastore integration in Redpanda 25.3 (2025-11-06 preview). Adds GCP's managed lakehouse catalog as the fourth REST-catalog integration. Complements the prior file-based-catalog option for BigQuery from the 2025-05-13 BYOC tutorial.
- sources/2025-05-13-redpanda-getting-started-with-iceberg-topics-on-redpanda-byoc — BYOC beta extension of Iceberg Topics (five weeks after GA) with a GCS + BigQuery worked example. Canonicalises three new primitives prior ingests elided: the topic-mode configuration surface (value_schema_id_prefix, value_schema_latest, key_value), the file-based catalog as a first-class alternative to REST catalog sync, and the BYOC-data-ownership compound property. Adjacent secondary disclosure: 2× BYOC partition-density ceiling in 25.1 (Tier 1: 1,000 → 2,000; Tier 5: 22,800 → 45,600) via per-partition memory-efficiency improvements (canonicalised as concepts/broker-partition-density). Tutorial altitude; no latency / throughput / production-scale numbers.
- sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available — GA release disclosure (Redpanda 25.1, multi-cloud availability). Canonicalises the nine new properties listed above — four table-management (custom hierarchical bucketed partitioning, DLQ, Iceberg-spec schema evolution, automatic snapshot expiry) and five catalog-integration (secure REST catalog sync via OIDC+TLS, transactional writes, automatic table discovery, built-in object-store catalog fallback, tunable workload management). Retires the 2025-01-21 wiki caveats about GC ownership and schema-evolution operational complexity (for the snapshot-expiry half and the Iceberg-spec evolution surface specifically).
- sources/2025-01-21-redpanda-implementing-the-medallion-architecture-with-redpanda — pre-GA pedagogy launch. Walks the architectural role (Bronze-tier sink), enumerates the displaced alternatives (custom ETL jobs / Connect clusters), names supported external catalogs, and positions the feature as the mechanism that makes Redpanda serve as the Bronze layer of a lakehouse without an external integration pipeline. Pedagogy altitude; no latency / throughput / cost numbers.
- sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms — Iceberg Topics at vision / backbone altitude — framed as the broker-native realisation of "materialized views of the raw event stream" with Apache Polaris named as the REST-catalog choice for open metadata. Sibling framing alongside Snowpipe Streaming as the proprietary-format ingestion route. Thought-leadership altitude; no mechanism disclosure beyond the GA 25.1 surface already canonicalised above.
Related¶
- systems/redpanda — the broker that hosts the feature.
- systems/redpanda-byoc — the deployment model that enables the 2025-05-13 beta's customer-bucket data ownership.
- systems/apache-iceberg — the open table format the feature writes into.
- systems/apache-parquet — the columnar file format used for the on-disk data.
- systems/unity-catalog · systems/snowflake · systems/databricks · systems/aws-glue · systems/google-biglake — named external Iceberg REST catalog / downstream-engine pairings (Unity Catalog, Snowflake Open Catalog / Polaris, AWS Glue, BigLake metastore).
- systems/clickhouse · systems/google-bigquery — named downstream query engines (BigQuery canonicalised via the 2025-05-13 BYOC tutorial's file-based-catalog demo).
- systems/google-cloud-storage — the object store in the 2025-05-13 worked example.
- concepts/iceberg-topic — the concept this system realises.
- concepts/iceberg-topic-mode — the per-topic schema-projection configuration surface.
- concepts/iceberg-catalog-rest-sync · concepts/iceberg-file-based-catalog — the two catalog-integration shapes the feature supports.
- concepts/iceberg-snapshot-expiry — GA-canonicalised broker-owned metadata-GC loop.
- concepts/byoc-data-ownership-for-iceberg — the BYOC + Iceberg-Topics compound data-ownership property.
- concepts/medallion-architecture · concepts/data-lakehouse · concepts/open-table-format — the architectural context.
- concepts/schema-evolution — the evolution semantics Iceberg Topics now support at Iceberg-spec granularity.
- patterns/streaming-broker-as-lakehouse-bronze-sink — the canonical pattern this feature instantiates.
- patterns/broker-native-iceberg-catalog-registration — the catalog-registration pattern the GA release canonicalises.
- patterns/external-table-over-iceberg-metadata-pointer — the read-side pattern the 2025-05-13 BigQuery demo instantiates.
- patterns/dead-letter-queue-for-invalid-records — the data-quality pattern the built-in DLQ instantiates.
- companies/redpanda — vendor.