Redpanda — Redpanda 25.1: Iceberg Topics now generally available¶
Summary¶
Redpanda's 2025-04-07 product post announces General Availability of Iceberg Topics on the 25.1 release. The post claims this makes Redpanda the first Kafka-API-compatible streaming broker with a broker-native Apache Iceberg integration promoted to GA across all three major clouds (AWS, Azure, GCP). It elaborates the feature surface well beyond the 2025-01-21 pedagogy launch (Medallion-with-Redpanda), disclosing four GA-grade table-management capabilities (custom hierarchical bucketed partitioning, built-in dead-letter queues for invalid records, Iceberg-spec-compliant schema evolution, automatic snapshot expiry for metadata GC) and five catalog-integration capabilities (secure REST catalog sync with OIDC + TLS, transactional writes for concurrent-writer safety, automatic table discovery, built-in object-store catalog when no REST catalog is available, tunable workload management bounding topic↔table lag). The 25.1 release also adds native consumer group lag metrics (Prometheus-exposed, replacing a previously documented PromQL compute), Protobuf schema normalization in the Schema Registry, SASL/PLAIN authentication, unified Console+cluster identity with fine-grained RBAC, and platform-centric versioning for Kubernetes deployments (FluxCD removal). This post is a product announcement, not a first-principles retrospective, but the GA feature disclosure retires three caveats on prior wiki pages (compaction/GC ownership, schema-evolution path, REST-catalog protocol details).
Source claim¶
"Redpanda is the first product in the industry to offer a Kafka-Iceberg streaming data solution that is generally available for use on multiple clouds, AWS, Azure, and GCP." (emphasis in original; source)
And the framing of the GA feature surface:
"This approach significantly reduces data duplication, eliminates mundane data engineering work, slashes compute costs (compared to alternatives like Kafka Connect), minimizes latency between event capture and insight, and simplifies your overall data stack." (source)
Key takeaways¶
- Multi-cloud GA, not AWS-only. Iceberg Topics ship GA on AWS, Azure, and GCP simultaneously (per the post's "first in the industry" claim). Redpanda Cloud support is "coming soon" at publication time but the self-managed product is GA everywhere.
- Custom hierarchical bucketed partitioning. Table partitioning is now operator-controllable. The 2025-01 launch did not disclose how partitioning worked beyond the implicit Kafka-partition-to-Iceberg-column projection; 25.1 adds explicit Iceberg-style transforms (bucket, truncate, year/month/day/hour on timestamps) for query-side pruning.
- Built-in dead-letter queues for invalid records. Records that fail schema validation (wrong type, unparseable, schema mismatch) are redirected to a DLQ topic rather than dropping the whole batch, keeping the Iceberg side's data quality invariant. Canonicalised as patterns/dead-letter-queue-for-invalid-records.
- Seamless Iceberg-spec-compliant schema evolution. The full Iceberg-spec evolution surface (field additions, renames, deletions) is applied safely over time against the Kafka-produced table. Retires the caveat "how Iceberg-topic schema changes interact with Kafka-client serializers is a source of operational complexity the pedagogy post glosses" on concepts/iceberg-topic.
- Automatic snapshot expiry. The broker performs the housekeeping of pruning old snapshot pointers/files as they age, bounding metadata growth. Canonicalised as concepts/iceberg-snapshot-expiry. Retires the caveat "compaction + GC ownership unclear from the pedagogy post" on concepts/iceberg-topic and systems/redpanda-iceberg-topics.
- Secure REST catalog sync + transactional writes. Iceberg Topics now register tables via OIDC + TLS to any Iceberg-REST-compatible catalog (Snowflake Open Catalog based on Apache Polaris™, Databricks Unity, AWS Glue). The transactional writes property lets other clients safely write to the same Iceberg table concurrently — Iceberg's commit protocol is the serialisation primitive. Canonicalised as concepts/iceberg-catalog-rest-sync + patterns/broker-native-iceberg-catalog-registration.
- Built-in object-store catalog fallback for ad-hoc access when no REST catalog is available. This is the "minimum viable integration" shape — Iceberg metadata lives alongside the data in the object store, queryable by any engine that can read an Iceberg metadata pointer directly.
- Automatic table discovery. New topics configured for Iceberg auto-register in the attached catalog, with no manual CREATE TABLE step on any downstream analytics platform.
- Tunable workload management. The broker has an explicit knob for how far behind the Iceberg snapshot can lag the live topic, giving operators a trade-off dial between freshness and broker-CPU budget for the Parquet-projection + catalog-commit work. Explicit confirmation that the commit cadence has a lag floor (retires the implicit framing on concepts/iceberg-topic).
- Native consumer group lag metrics replace the previously documented PromQL compute. Exposed via Prometheus; visible in Grafana, Datadog, and Redpanda Console. Fills the canonical observability story for the Kafka API consumer-side lag signal — canonicalised as concepts/kafka-consumer-lag-metric.
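To make the partitioning takeaway above concrete, here is a minimal Python sketch of how Iceberg-style `bucket`, `truncate`, and time transforms map a column value to a partition value, and how two of them compose into a hierarchical layout. It is illustrative only: the function names and the `event_time` / `user_id` fields are hypothetical, the post does not show Redpanda's configuration syntax, and `zlib.crc32` is a stand-in for the 32-bit Murmur3 hash that the Iceberg spec prescribes for `bucket`, so the bucket numbers here will not match a real Iceberg table.

```python
import zlib
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket(value: str, num_buckets: int) -> int:
    """Hash a value into one of num_buckets partitions.

    ASSUMPTION: crc32 is a stand-in hash; the Iceberg spec uses 32-bit Murmur3.
    """
    return (zlib.crc32(value.encode("utf-8")) & 0x7FFFFFFF) % num_buckets


def truncate_int(value: int, width: int) -> int:
    """Round an integer down to a multiple of width (negatives round toward -inf)."""
    return value - (value % width)


def truncate_str(value: str, width: int) -> str:
    """Keep only the first `width` characters."""
    return value[:width]


def year(ts: datetime) -> int:
    """Years since 1970."""
    return ts.year - 1970


def month(ts: datetime) -> int:
    """Months since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


def day(ts: datetime) -> int:
    """Days since 1970-01-01 (ts must be timezone-aware)."""
    return (ts - EPOCH).days


def hour(ts: datetime) -> int:
    """Hours since 1970-01-01 00:00 UTC (ts must be timezone-aware)."""
    return int((ts - EPOCH).total_seconds() // 3600)


def partition_path(record: dict, num_buckets: int = 16) -> str:
    """Hierarchical layout: first by day, then by a hash bucket of the user id."""
    d = day(record["event_time"])
    b = bucket(record["user_id"], num_buckets)
    return f"event_day={d}/user_bucket={b}"
```

A query that filters on one day and one user then only needs the files under a single `event_day=…/user_bucket=…` prefix, which is the query-side pruning the takeaway refers to.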
Systems and concepts extracted¶
Systems¶
- systems/redpanda-iceberg-topics — the feature promoted to GA across all three major clouds (the subject of the post).
- systems/redpanda — 25.1 release; multiple new features listed.
- systems/apache-iceberg — the open table format target.
- systems/apache-parquet — the columnar file format written.
- systems/snowflake — named REST-catalog + query engine via Open Catalog (Apache Polaris).
- systems/databricks — named REST-catalog + query engine via Unity Catalog.
- systems/unity-catalog — named REST catalog.
- systems/clickhouse — named downstream query engine.
- systems/kafka — the wire protocol Redpanda implements; full compatibility assumed throughout.
- systems/prometheus — canonical export surface for the new consumer group lag metrics.
- systems/grafana + systems/datadog — named visualisation / alerting surfaces downstream of Prometheus.
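For context on the consumer group lag signal exported through Prometheus: the conventional Kafka-API definition is a partition's log end offset minus the group's committed offset, summed across partitions for a group-level figure. The sketch below computes that from plain dictionaries. The function names and the offset dictionaries are hypothetical, and it does not reflect Redpanda's actual metric names, which this page does not record.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition index)


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Per-partition lag = log end offset - committed offset, floored at 0."""
    lags = {}
    for tp, end in end_offsets.items():
        # ASSUMPTION: a partition with no commit yet is treated as offset 0.
        done = committed.get(tp, 0)
        lags[tp] = max(end - done, 0)
    return lags


def group_lag(
    end_offsets: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, int],
) -> int:
    """Total lag for a consumer group across all partitions it reads."""
    return sum(partition_lag(end_offsets, committed).values())
```

A native broker metric saves operators from deriving this subtraction themselves in PromQL from two separate offset series.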
Concepts¶
- concepts/iceberg-topic — the core concept, promoted to GA with disclosed GA-grade property set.
- concepts/iceberg-catalog-rest-sync — new concept; OIDC + TLS sync to Iceberg REST catalogs as the canonical integration surface between a streaming broker and an external analytics catalog.
- concepts/iceberg-snapshot-expiry — new concept; automatic pruning of old Iceberg snapshot metadata as a broker-owned GC loop.
- concepts/kafka-consumer-lag-metric — new concept; native Kafka-API consumer group lag metric as a foundational observability signal on streaming pipelines.
- concepts/schema-evolution — extended by this source; Iceberg topics now support full Iceberg-spec evolution.
- concepts/transactional-write — concept framing of Iceberg's commit-protocol serialisation as the mechanism for safe concurrent multi-writer access.
- concepts/open-table-format · concepts/data-lakehouse · concepts/medallion-architecture — architectural context established in the 2025-01-21 pedagogy ingest; reinforced here.
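The transactional-write entry above rests on Iceberg's optimistic concurrency: a writer prepares a change against the table version it read, and the commit is accepted only if that version is still current; otherwise the writer refreshes and retries. A minimal in-memory sketch of that compare-and-swap loop follows. `InMemoryCatalog`, `CommitConflict`, and `append_with_retry` are hypothetical names for illustration; this is not Redpanda's implementation or any real catalog's API.

```python
import threading
from typing import List, Sequence, Tuple


class CommitConflict(Exception):
    """Raised when the table changed since the writer last read it."""


class InMemoryCatalog:
    """Toy catalog: one table, one version counter, atomic swap under a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._snapshots: List[Tuple[str, ...]] = []

    def load(self) -> Tuple[int, Tuple[Tuple[str, ...], ...]]:
        """Return the current version and an immutable view of the snapshots."""
        with self._lock:
            return self._version, tuple(self._snapshots)

    def commit(self, expected_version: int, new_files: Sequence[str]) -> int:
        """Append a snapshot only if nobody else committed in the meantime."""
        with self._lock:
            if expected_version != self._version:
                raise CommitConflict(
                    f"expected v{expected_version}, table is at v{self._version}"
                )
            self._snapshots.append(tuple(new_files))
            self._version += 1
            return self._version


def append_with_retry(
    catalog: InMemoryCatalog, files: Sequence[str], max_attempts: int = 5
) -> int:
    """Refresh-and-retry loop: re-read the version after every lost race."""
    for _ in range(max_attempts):
        version, _snapshots = catalog.load()
        try:
            return catalog.commit(version, files)
        except CommitConflict:
            continue  # another writer won; reload and try again
    raise CommitConflict(f"gave up after {max_attempts} attempts")
```

Because a stale writer is rejected rather than silently overwriting, the broker and an external engine can both append to the same table without coordinating directly. What the post leaves unspecified (isolation level, conflict policy for non-append changes) is not modelled here.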
Patterns¶
- patterns/streaming-broker-as-lakehouse-bronze-sink — the canonical wiki pattern Iceberg Topics instantiate; GA release strengthens the evidence.
- patterns/dead-letter-queue-for-invalid-records — new pattern; broker-level validation + DLQ redirect as the canonical data-quality boundary between a streaming topic and a downstream table.
- patterns/broker-native-iceberg-catalog-registration: new pattern; the broker doing the CREATE TABLE + keep-table-in-sync work against an external Iceberg REST catalog, so downstream engines see the table appear and update automatically without client-side configuration.
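The dead-letter-queue pattern listed above can be sketched as per-record validation with a redirect, so that one bad record does not sink the rest of its batch. This is a minimal illustration under stated assumptions: records are JSON, the schema is a hypothetical field-to-type mapping named `SCHEMA`, and the DLQ is a plain list. Redpanda's actual validation runs against its Schema Registry and is not described at this level in the post.

```python
import json
from typing import Dict, List, Tuple

# ASSUMPTION: a hypothetical two-field schema, for illustration only.
SCHEMA: Dict[str, type] = {"user_id": int, "event": str}


def validate(raw: bytes, schema: Dict[str, type]) -> dict:
    """Return the decoded record, or raise ValueError describing the failure."""
    try:
        record = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"unparseable: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("not a JSON object")
    for field, expected in schema.items():
        if field not in record:
            raise ValueError(f"missing field {field!r}")
        # Exact type match, so that e.g. True is not accepted as an int.
        if type(record[field]) is not expected:
            raise ValueError(f"field {field!r} has wrong type")
    return record


def route(
    batch: List[bytes], schema: Dict[str, type]
) -> Tuple[List[dict], List[Tuple[bytes, str]]]:
    """Split a batch into rows for the table and (raw bytes, reason) DLQ entries."""
    table_rows: List[dict] = []
    dlq: List[Tuple[bytes, str]] = []
    for raw in batch:
        try:
            table_rows.append(validate(raw, schema))
        except ValueError as exc:
            dlq.append((raw, str(exc)))  # keep the original bytes for replay
    return table_rows, dlq
```

Keeping the raw bytes alongside the failure reason lets an operator inspect or replay rejected records later; it also shows why DLQ entries are effectively schemaless, which is the reader-tooling concern raised in the caveats.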
Operational numbers¶
- Commit cadence / lag floor: explicitly tunable via "Redpanda's tunable workload management" knob; post does not publish a default value or a latency distribution.
- No throughput numbers for the Iceberg-projection path on 25.1 (not disclosed in the post; deferred to future benchmarks).
- No DLQ sizing guidance — built-in DLQ is mentioned but no retention / partition-count / monitoring recommendation.
- No schema-evolution latency impact — "seamless" is asserted but not quantified (schema registration vs Iceberg snapshot commit ordering unstated).
- No Kubernetes migration numbers — the FluxCD removal + new versioning scheme is announced but no upgrade-fleet size is disclosed.
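Since the post names the freshness knob without a value, the trade-off can only be illustrated abstractly. The sketch below assumes a hypothetical `target_lag_s` setting and models a committer that flushes as soon as the oldest uncommitted record has waited that long, so a smaller bound means more frequent, smaller commits. None of these names come from Redpanda, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Iterable, List, Optional


class LagBoundedCommitter:
    """Toy model: commit once the oldest pending record is target_lag_s old."""

    def __init__(self, target_lag_s: float) -> None:
        self.target_lag_s = target_lag_s  # hypothetical knob, not a real setting
        self.pending: List[str] = []
        self.oldest_pending_ts: Optional[float] = None
        self.commits: List[List[str]] = []

    def append(self, record: str, now: float) -> None:
        """Buffer a record and commit if the lag bound has been reached."""
        if self.oldest_pending_ts is None:
            self.oldest_pending_ts = now
        self.pending.append(record)
        self.maybe_commit(now)

    def maybe_commit(self, now: float) -> bool:
        """Flush the buffer as one commit when the oldest record is too stale."""
        if self.oldest_pending_ts is None:
            return False
        if now - self.oldest_pending_ts < self.target_lag_s:
            return False
        self.commits.append(self.pending)
        self.pending = []
        self.oldest_pending_ts = None
        return True


def count_commits(target_lag_s: float, arrival_times: Iterable[float]) -> int:
    """Replay a stream of arrival timestamps and count the resulting commits."""
    committer = LagBoundedCommitter(target_lag_s)
    for t in arrival_times:
        committer.append(f"r{t}", float(t))
    return len(committer.commits)
```

Replaying the same arrival stream with a tighter bound yields more commits, each of which costs Parquet-writing and catalog-commit work on the broker; that cost is the budget side of the dial the post describes.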
Caveats / what this post does not cover¶
- Vendor launch framing throughout — this is a product-release post, not a retrospective or a benchmark. Every claim is phrased as a capability statement, not a measurement.
- "First in the industry" is unqualified and unverified — the post does not explicitly enumerate competing Kafka-to-Iceberg products it compares against. Credible known competitors include Apache Kafka Connect with Tabular Iceberg sink, Upsolver, and Confluent Tableflow; relative timing + multi-cloud-GA framing not compared.
- No integration with prior caveat list — the post doesn't explicitly confirm retirement of the wiki's prior-ingest caveats; the retirement is inferred from the feature list (snapshot expiry retires the GC-ownership caveat; Iceberg-spec evolution retires the schema-compat caveat).
- "Tunable workload management" is named without a specific knob name, valid range, or default — a documentation gap for the canonical GA feature.
- DLQ failure modes not walked — what happens when the DLQ itself overflows? What about schema drift in the DLQ records themselves (they still need a schema for reader tools to work)?
- Transactional writes mentioned but not specified — post does not name the isolation level, the concurrent-writer conflict resolution policy, or the recovery behaviour after a half-written commit.
- SASL/PLAIN is mentioned with a legacy-modernisation framing ("legacy systems with existing applications and CI/CD workflows"). The post names "plaintext credentials over TLS" as the intended mode but gives no explicit warning that PLAIN without TLS exposes credentials on the wire.
- Protobuf normalization is mentioned without walking any actual syntactic-variation examples; "logically equivalent Protobuf variations" is hand-waved.
- Unified identity is announced but the pre-25.1 dual-identity hazard ("access drift between the UI and API surfaces") is asserted, not measured.
- K8s versioning: FluxCD removal is framed as reducing conflict risk with customer FluxCD, not as a robustness or maintenance-overhead decision; the benefits the FluxCD dependency originally provided are not discussed.
- R1 engine (Redpanda's broader "multi-modal streaming data engine" framing from 2025-01-21) is not mentioned by name in this post; the Iceberg Topics pitch stands alone here.
Scope disposition¶
Tier-3 borderline-on-scope. Redpanda is a Tier-3 source (stricter content filter per AGENTS.md). This is a product-launch post, not a retrospective — but the GA feature disclosure is substantive architectural content: the four table-management properties (partitioning transforms, DLQ, schema evolution, snapshot expiry) and the five catalog-integration properties (REST sync, transactional writes, auto-discovery, object-store fallback, tunable workload management) together retire three caveats on prior wiki pages about Iceberg Topics' feature surface. Architecture density ~40% on ~1,900-word body — the Iceberg Topics section is the architecture-dense core; the other 25.1 features (SASL/PLAIN, Protobuf normalization, consumer-group lag metrics, unified identity, FluxCD removal) are brief capability-statements. Passes on GA-feature-disclosure grounds (every named property updates or retires a prior caveat) + novel-vocabulary grounds (snapshot expiry + REST catalog sync + DLQ-for-invalid-records were all gaps the 2025-01-21 ingest flagged). Fails on production-numbers grounds (no benchmarks, no customer case study, no operational numbers beyond the explicit "coming soon" on Redpanda Cloud).
Source¶
- Original: https://www.redpanda.com/blog/redpanda-25-1-iceberg-topics-ga
- Raw markdown: raw/redpanda/2025-04-07-redpanda-251-iceberg-topics-now-generally-available-b9f66a6e.md
Related¶
- systems/redpanda-iceberg-topics — the GA-promoted feature.
- systems/redpanda — 25.1 release.
- systems/apache-iceberg — the target table format.
- concepts/iceberg-topic — the concept this source reinforces.
- concepts/iceberg-catalog-rest-sync · concepts/iceberg-snapshot-expiry · concepts/kafka-consumer-lag-metric — new concepts canonicalised by this source.
- patterns/streaming-broker-as-lakehouse-bronze-sink · patterns/dead-letter-queue-for-invalid-records · patterns/broker-native-iceberg-catalog-registration — patterns instantiated by the GA feature set.
- sources/2025-01-21-redpanda-implementing-the-medallion-architecture-with-redpanda — sibling pedagogy ingest from 2025-01 that established the Iceberg-Topics-as-Bronze-tier framing; this post is the GA follow-up with the disclosed feature surface.
- companies/redpanda — vendor.