Redpanda — Introducing Iceberg output for Redpanda Connect¶
Unsigned Redpanda launch post (~1,000 words, 2026-03-05) announcing the iceberg output connector for Redpanda Connect (shipped in Redpanda Connect v4.80.0, enterprise license). A declarative Apache Iceberg sink that writes streaming data directly to Iceberg tables from a YAML pipeline, using the Iceberg REST Catalog API. Positioned as the non-Kafka-source companion to the pre-existing broker-native Redpanda Iceberg Topics feature — different tool for different shapes.
Summary¶
The post walks a single motivating gap: Iceberg Topics gives zero-ETL
from Kafka protocol → Iceberg table, but customers with non-Kafka
sources (HTTP webhooks, Postgres CDC, GCP Pub/Sub) or who need
in-stream transformations (PII stripping, flattening, type routing)
needed an alternative. The iceberg output fills that gap, plugging
into Redpanda Connect's "300+ inputs and processors" ecosystem.
Three architectural properties are load-bearing:
- Registry-less, data-driven schema evolution — the connector senses new fields in raw JSON and auto-updates the Iceberg table metadata; no Schema Registry required; no manual DDL.
- Data-driven flushing (explicit inversion of timer-driven flushing) — flush only when data is present, avoiding the small-file problem on object storage and idle compute waste on quiet sources.
- Bloblang-interpolated multi-table routing from a single pipeline — the table and namespace fields support Bloblang interpolation (e.g. 'events_${!this.event_type}'), so one pipeline definition routes messages to N tables based on message content, displacing the "configuration hell" of per-table static mappings.
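The first property can be sketched in miniature. The post does not disclose the connector's actual inference rules, so the following is a hypothetical Python illustration of the idea only — infer a column type per raw JSON field, and widen the table schema whenever an unseen field appears — not the connector's implementation:

```python
import json

# Hypothetical illustration of data-driven schema evolution: infer
# column types from raw JSON and add columns as new fields appear.
# The real connector's inference rules are not disclosed in the post.

def infer_type(value):
    """Map a JSON value to an Iceberg-like primitive type name."""
    if isinstance(value, bool):   # check bool before int (bool is an int subclass)
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def evolve(schema, record):
    """Add any previously unseen fields to the table schema."""
    for field, value in record.items():
        schema.setdefault(field, infer_type(value))
    return schema

schema = {}
for raw in ['{"user_id": 7, "event_type": "click"}',
            '{"user_id": 8, "event_type": "view", "latency_ms": 12.5}']:
    evolve(schema, json.loads(raw))

print(schema)
# {'user_id': 'long', 'event_type': 'string', 'latency_ms': 'double'}
```

Note what this toy version leaves open — exactly the gaps the Caveats section flags: type conflicts across records, renames, widening, and tombstones.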
The connector speaks the Iceberg REST Catalog API and integrates with Apache Polaris™, AWS Glue, Databricks Unity Catalog, Snowflake Open Catalog, GCP BigLake, or any REST-speaking catalog.
Key takeaways¶
-
Non-Kafka sources are the filled gap. Verbatim: "But maybe your data arrives from an HTTP webhook, a Postgres CDC stream, or a GCP Pub/Sub subscription. Maybe you need to normalize a payload, drop PII, or split a mixed event stream by type before anything hits the lakehouse. That's the gap this connector fills." The Iceberg output is explicitly positioned against Iceberg Topics' zero-ETL broker-to-table path, not as a replacement.
-
Two-shape comparison table canonicalised verbatim — Iceberg Topics (in-broker, registry-driven, 1 topic → 1 table, zero extra components, Redpanda Cloud BYOC or Self-Managed EE) vs Iceberg output (stateless K8s sink, data-driven schema, multi-table routing, hundreds of non-Kafka sources, Redpanda Connect Enterprise tier). "Primary value: Zero-ETL convenience vs Integration flexibility."
-
Registry-less schema evolution as a first-class feature. Verbatim: "The Iceberg output also uses schema evolution to sense new fields in an incoming JSON stream and automatically updates the Iceberg table metadata. No manual DDL, no registry required, and no ticket for the ops team every time an app update adds a column." Trade-off framing verbatim: "while other connectors can technically evolve a schema, doing so without a schema registry usually forces you into 'maintenance toil' (chaining brittle Kafka Connect SMTs) or leaves you with 'dirty data' (where all columns land as string data types). Redpanda Connect gives you the best of both worlds: the flexibility of raw JSON with the precision of a structured lakehouse." Canonicalised on the wiki as concepts/registry-less-schema-evolution, a fifth axis on concepts/schema-evolution.
-
Data-driven flushing as the small-file-problem mitigation. Verbatim: "Unlike legacy connectors that heartbeat on a fixed timer regardless of activity, Redpanda Connect uses data-driven flushing. It only executes a flush operation when there is actual data to move, preventing the 'small file problem' on object storage and ensuring you aren't wasting compute cycles on empty operations." Canonicalised on the wiki as concepts/data-driven-flushing — the inversion of the timer-based flush common to Kafka Connect-era sinks. Related wiki substrate: concepts/small-file-problem-on-object-storage (new).
-
Bloblang-interpolated multi-table routing from one pipeline. The table and namespace config fields are Bloblang-interpolated — a single pipeline routes messages across N tables based on message content. Worked example verbatim: table: 'events_${!this.event_type}'. Canonicalised on the wiki as patterns/bloblang-interpolated-multi-table-routing. Trade-off framed against "configuration hell" of traditional connectors that need rigid per-table mappings.
-
Iceberg REST Catalog API is the integration surface. Lists Apache Polaris, AWS Glue, Databricks Unity Catalog, Snowflake Open Catalog, GCP BigLake as supported catalogs. Adds one catalog-specific worked example — Polaris with OAuth2 client-credentials — to the wiki's canonical Iceberg catalog REST sync substrate.
-
OAuth2 token exchange + per-tenant REST catalog as the enterprise-isolation substrate. Verbatim: "Redpanda Connect fits into your existing OAuth2 token exchange and per-tenant REST catalog (like Polaris) workflows out of the box. And because Redpanda Connect is so lightweight (runs as low as 0.1 vCPU), you can deploy isolated, high-density pipelines for every tenant or department without blowing your cloud budget." 0.1 vCPU per-pipeline density is the operational-shape claim; no fleet numbers.
-
Append-only only at launch; upserts on the roadmap. Verbatim: "This initial release focuses on high-speed append-only ingestion (with upserts on the roadmap)." A material scope limit for CDC workloads — Postgres CDC feeding UPDATE/DELETE operations cannot cleanly land through this connector in v4.80.0.
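The flushing inversion in the takeaways above can be contrasted in a toy loop. This is a hypothetical sketch of the two strategies, not the connector's actual scheduler: a timer-driven sink commits on every tick and emits empty or tiny files on quiet ticks, while a data-driven sink commits only when its buffer holds records.

```python
# Hypothetical contrast between timer-driven and data-driven flushing.
# Each "tick" is a flush opportunity; the inner list holds whatever
# records arrived since the last tick. Not the connector's scheduler.

def timer_driven(ticks):
    """Flush on every tick, even when nothing arrived."""
    return sum(1 for records in ticks)            # one file per tick

def data_driven(ticks):
    """Flush only on ticks where data is actually buffered."""
    return sum(1 for records in ticks if records)

# A quiet source: data arrives on only 2 of 10 flush opportunities.
activity = [["e1"], [], [], [], ["e2", "e3"], [], [], [], [], []]

print(timer_driven(activity))   # 10 files (8 of them empty or tiny)
print(data_driven(activity))    # 2 files
```

The file-count gap is the whole argument: on object storage every extra file is metadata, listing cost, and read amplification.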
Systems extracted¶
- systems/redpanda-connect — Redpanda Connect v4.80.0 is the launch vehicle. Adds Iceberg output to the ~300 existing outputs.
- systems/redpanda-connect-iceberg-output — new canonical system page covering the Iceberg output connector specifically (distinct from the broker-native Iceberg Topics feature).
- systems/apache-iceberg — the target table format.
- systems/apache-polaris — catalog used in worked example.
- systems/unity-catalog, systems/aws-glue, systems/google-biglake — additional supported REST catalogs (all already canonicalised on the wiki from prior Redpanda + Databricks ingests).
- systems/redpanda-iceberg-topics — the broker-native counterpart the Iceberg output complements (not replaces).
Concepts extracted¶
- concepts/iceberg-catalog-rest-sync — extended with the sink-connector-altitude instance (prior instances were broker-native).
- concepts/schema-evolution — extended with the registry-less / data-driven axis.
- concepts/registry-less-schema-evolution (new) — the property of evolving an Iceberg table's schema from raw JSON without a Schema Registry, framed as the "best of both worlds" between brittle SMT chains and dirty all-string tables.
- concepts/data-driven-flushing (new) — flush-on-data-present rather than heartbeat-on-timer. Mitigates the small-file problem on object storage.
- concepts/small-file-problem-on-object-storage (new) — the pathology that small, frequently-flushed files on object storage create: metadata bloat, read-amp during scan, per-file listing cost.
- concepts/bloblang (new) — Redpanda Connect's declarative mapping language, previously referenced implicitly; canonicalised here as the mechanism behind multi-table routing and in-stream reshaping.
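Back-of-envelope arithmetic (mine, not the post's) ties data-driven-flushing to the small-file problem. Under an assumed fixed commit timer, file count is a function of wall-clock time alone; under data-driven flushing it is a function of source activity:

```python
# Back-of-envelope arithmetic (not from the post) for the small-file
# problem: a timer-driven sink creates one object per interval
# regardless of traffic, and scans pay a per-file listing/open cost.

SECONDS_PER_DAY = 24 * 60 * 60

commit_interval_s = 60                      # hypothetical 1-minute timer
files_per_day_timer = SECONDS_PER_DAY // commit_interval_s

# A bursty source active in only 5% of intervals: data-driven flushing
# commits only on the active intervals.
active_fraction = 0.05
files_per_day_data_driven = int(files_per_day_timer * active_fraction)

print(files_per_day_timer)        # 1440 files/day
print(files_per_day_data_driven)  # 72 files/day
```

The 60 s interval and 5% duty cycle are illustrative assumptions; the post gives no numbers here.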
Patterns extracted¶
- patterns/streaming-broker-as-lakehouse-bronze-sink — extended with the sink-connector-altitude variant (prior instances were broker-native).
- patterns/broker-native-iceberg-catalog-registration — extended with a sink-connector counterpart framing.
- patterns/bloblang-interpolated-multi-table-routing (new) — single YAML pipeline routes messages to N tables via Bloblang interpolation of the table/namespace config fields. Alternative to static per-table mappings common to Kafka Connect sinks.
- patterns/sink-connector-as-complement-to-broker-native-integration (new) — architectural pattern where a product ships both a broker-native integration (zero-ETL from its own protocol) and a sink connector (flexibility + heterogeneous sources), each covering a shape the other doesn't.
Operational numbers¶
- Redpanda Connect v4.80.0 — the release shipping the Iceberg output.
- 0.1 vCPU per-pipeline lower bound cited for high-density per-tenant deployment.
- No throughput numbers, no latency numbers, no fleet numbers, no case studies.
Caveats¶
- Launch-post voice — "Today we're announcing…" opener, "Suffer no more with Redpanda Connect!" marketing register. Zero production incidents, zero customer case studies, zero quantitative disclosures beyond the 0.1 vCPU density datum.
- Append-only at launch — upserts on roadmap. Material scope limit for CDC UPDATE/DELETE workloads in v4.80.0.
- Schema-evolution mechanism depth not disclosed — the post says the connector "senses new fields in an incoming JSON stream and automatically updates the Iceberg table metadata", but doesn't disclose: how type inference is done from raw JSON (string vs number vs nested-object); what happens on type conflicts across records (coerce? quarantine? error?); whether column renames or type-widening are supported; whether deleted fields leave tombstone columns. The "best of both worlds" claim is asserted, not mechanism-shown.
- Data-driven flushing mechanism depth not disclosed — the trigger shape (per-record? per-batch? watermark-based?), flush interval bounds, and interaction with Iceberg snapshot cadence are all elided.
- No benchmark against the "legacy connectors" it foils (Kafka Connect Iceberg Sink, Tabular / Databricks Iceberg sinks).
- No discussion of commit-tuning trade-offs — Iceberg snapshot commits on object storage have a per-commit overhead; commit frequency vs small-file tradeoff is a real operational axis and the post name-checks "commit tuning" only as a docs-reference in passing.
- Enterprise-gated — requires Redpanda Connect Enterprise tier license; Apache 2.0 Redpanda Connect core users cannot access this connector. Contrast with the 2025-06-17 dynamic-plugins launch which was Apache 2.0.
- Unsigned (Redpanda default attribution).
- Partition spec expressions named as a configurable feature but not walked in the post — only the flat events_${!this.event_type} table-routing example is shown.
Cross-source continuity¶
-
Companion to sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available|2025-04-07 Iceberg Topics GA — Iceberg Topics is the broker-native, registry-driven, Kafka-only zero-ETL path; Iceberg output is the sink-connector, data-driven, source-polymorphic alternative. Together the two shapes bracket Redpanda's Iceberg story: one pipeline for Kafka-producer → Iceberg table (Iceberg Topics), another for non-Kafka source → transform → Iceberg table (Iceberg output). Explicit architectural distinction canonicalised on the wiki via patterns/sink-connector-as-complement-to-broker-native-integration.
-
Sequel to sources/2025-05-13-redpanda-getting-started-with-iceberg-topics-on-redpanda-byoc|2025-05-13 Iceberg Topics on BYOC — that post canonicalised BYOC-data-ownership for Iceberg; this post adds the sink-connector-altitude integration surface that complements it for non-Kafka source data landing in the same customer-owned bucket.
-
Sequel to sources/2026-01-06-redpanda-build-a-real-time-lakehouse-architecture-with-redpanda-and-databricks|2026-01-06 joint Databricks lakehouse post — that post framed Unity Catalog as the governance hub for Iceberg Topics; this post names Unity Catalog + Polaris + Glue + Snowflake Open Catalog + BigLake as a single five-catalog matrix the Iceberg output speaks interchangeably via the REST API.
-
Companion to sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more|2025-11-06 Redpanda 25.3 release preview — that post added BigLake as the fourth managed REST catalog for Iceberg Topics; this post covers the catalog-integration matrix from the sink-connector side.
-
Companion to sources/2025-03-18-redpanda-3-powerful-connectors-for-real-time-change-data-capture|2025-03-18 CDC connectors post — that post canonicalised postgres_cdc / mysql_cdc / mongodb_cdc / gcp_spanner_cdc as the source-connector family in Redpanda Connect; this post adds the Iceberg output as a potential terminal sink those CDC streams can feed — with the append-only caveat that UPDATE/DELETE events won't land cleanly until upserts ship.
-
Companion to sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse|2025-12-09 Snowflake + ClickHouse dual-tier — that post canonicalised the generic sql_raw / sql_insert approach to Snowflake/ClickHouse sinks; this post adds the first-class Iceberg sink surface.
-
Companion to sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming|2025-10-02 Snowflake Streaming benchmark — that post benchmarked the snowflake_streaming output connector (14.5 GB/s sustained); this post adds the iceberg output connector for Iceberg-catalog-backed alternatives.
No existing-claim contradictions — the post is strictly additive.
Example pipeline (verbatim from post)¶
```yaml
input:
  redpanda:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topics: ["events"]
    consumer_group: "iceberg-sink"

pipeline:
  processors:
    - mapping: |
        root = this
        root.ingested_at = now()

output:
  iceberg:
    catalog:
      url: https://polaris.example.com/api/catalog
      warehouse: analytics
      auth:
        oauth2:
          client_id: "${CATALOG_CLIENT_ID}"
          client_secret: "${CATALOG_CLIENT_SECRET}"
    namespace: raw.events
    table: 'events_${!this.event_type}'
    storage:
      aws_s3:
        bucket: my-iceberg-data
        region: us-west-2
```
Note the namespace + table Bloblang interpolation (literal string
raw.events for namespace, templated 'events_${!this.event_type}'
for table). The in-pipeline mapping processor adds an
ingested_at timestamp — exemplifying the in-stream-transformation
value proposition before records land in Iceberg.
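A rough Python analogue (not Bloblang, and not Redpanda Connect's resolution code) of how the interpolated table field resolves per message, fanning one stream out to N tables:

```python
import json
import re
from collections import defaultdict

# Hypothetical analogue of Bloblang interpolation in the table field:
# resolve 'events_${!this.event_type}' against each message and bucket
# records by the resulting table name.

TABLE_TEMPLATE = "events_${!this.event_type}"

def resolve(template, message):
    """Replace each ${!this.<field>} with that field's value."""
    return re.sub(
        r"\$\{!this\.([A-Za-z_]+)\}",
        lambda m: str(message[m.group(1)]),
        template,
    )

tables = defaultdict(list)
for raw in ['{"event_type": "click", "user": 1}',
            '{"event_type": "purchase", "user": 2}',
            '{"event_type": "click", "user": 3}']:
    msg = json.loads(raw)
    tables[resolve(TABLE_TEMPLATE, msg)].append(msg)

print(sorted(tables))
# ['events_click', 'events_purchase']
```

Real Bloblang interpolation is far richer than this field-substitution regex (full expressions, methods, coercions); the sketch only shows why a single static pipeline definition can land in a message-dependent set of tables.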
Source¶
- Original: https://www.redpanda.com/blog/redpanda-connect-apache-iceberg-output
- Raw markdown:
raw/redpanda/2026-03-05-introducing-iceberg-output-for-redpanda-connect-5974de86.md
Related¶
- systems/redpanda-connect
- systems/redpanda-connect-iceberg-output — new sink connector system page
- systems/apache-iceberg
- systems/redpanda-iceberg-topics — broker-native counterpart
- systems/apache-polaris, systems/unity-catalog, systems/aws-glue, systems/google-biglake
- concepts/iceberg-catalog-rest-sync
- concepts/schema-evolution
- concepts/registry-less-schema-evolution — new
- concepts/data-driven-flushing — new
- concepts/small-file-problem-on-object-storage — new
- concepts/bloblang — new
- patterns/streaming-broker-as-lakehouse-bronze-sink
- patterns/broker-native-iceberg-catalog-registration
- patterns/bloblang-interpolated-multi-table-routing — new
- patterns/sink-connector-as-complement-to-broker-native-integration — new
- companies/redpanda