
Redpanda Connect Iceberg output

The iceberg output connector for Redpanda Connect is a declarative sink that writes streaming data to Apache Iceberg tables from a YAML pipeline, speaking the Iceberg REST Catalog API. It shipped in Redpanda Connect v4.80.0 (enterprise-gated), per the 2026-03-05 launch post.

The Iceberg output is the non-Kafka-source companion to the pre-existing broker-native Redpanda Iceberg Topics feature — filling the gap where data arrives from HTTP webhooks, Postgres CDC, GCP Pub/Sub, or any of Redpanda Connect's 300+ input connectors, and where in-stream transformations (PII stripping, flattening, type routing) are needed before landing in the lakehouse.

Source: sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect.

Why it exists (vs Iceberg Topics)

Iceberg Topics is the zero-ETL Kafka-protocol → Iceberg table path: any Kafka client produces, and the broker materialises the topic as an Iceberg table transparently. Optimal for high-throughput Kafka-native workloads with Schema-Registry-driven schema evolution and a 1-topic → 1-table shape.

Gaps Iceberg Topics doesn't fill (which the Iceberg output addresses):

  • Data arriving from non-Kafka sources — HTTP webhooks, Postgres CDC, GCP Pub/Sub, MongoDB change streams, CloudWatch logs, file-based inputs — is the primary motivating case cited in the launch post.
  • In-stream transformation — normalising payloads, dropping PII, splitting a mixed event stream by type before landing.
  • Multi-table routing from one pipeline — the broker-native Iceberg Topics is 1-topic → 1-table; the Iceberg output supports one pipeline fanning to N tables via Bloblang-interpolated table and namespace fields.
  • Registry-less, data-driven schema evolution — infers table schema from raw JSON without a Schema Registry. Iceberg Topics is registry-driven (Avro/Protobuf/JSON schemas).

The architectural pattern is canonicalised on the wiki as patterns/sink-connector-as-complement-to-broker-native-integration: a product ships both a broker-native integration (zero-ETL for its own protocol) and a sink connector (flexibility + heterogeneous sources + in-stream transformation), each covering a shape the other doesn't.

Key properties

Registry-less, data-driven schema evolution

The connector infers schema from incoming JSON records and updates the Iceberg table metadata as new fields appear. Verbatim: "The Iceberg output also uses schema evolution to sense new fields in an incoming JSON stream and automatically updates the Iceberg table metadata. No manual DDL, no registry required, and no ticket for the ops team every time an app update adds a column." Canonicalised on the wiki as concepts/registry-less-schema-evolution.

The "best of both worlds" framing verbatim: "the flexibility of raw JSON with the precision of a structured lakehouse" — inverting two named pathologies: (1) Schema-Registry-less Kafka Connect SMT chains ("maintenance toil"), (2) the all-columns-land-as-string default seen on other registry-less connectors ("dirty data").

Mechanism depth undisclosed: the type-inference strategy, type-conflict handling across records, column-rename / type-widening support, and deleted-field semantics are all elided.
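
Because the actual mechanism is undisclosed, the following is only an illustrative sketch of one plausible data-driven inference strategy (per-field type inference, int→double widening on conflict, string fallback) — not the connector's documented algorithm:

```python
# Illustrative only: the launch post does not disclose the connector's
# inference algorithm. This sketches one plausible strategy: infer a column
# type per JSON field, widen on type conflict, fall back to string.

WIDENING = {
    ("int", "double"): "double",
    ("double", "int"): "double",
}

def infer_type(value):
    if isinstance(value, bool):      # check bool before int (bool is an int subclass)
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"

def evolve_schema(schema, record):
    """Merge one JSON record into a column -> type schema dict."""
    for field, value in record.items():
        t = infer_type(value)
        if field not in schema:
            schema[field] = t        # new field: add column, no manual DDL
        elif schema[field] != t:
            # conflicting types across records: widen if possible, else string
            schema[field] = WIDENING.get((schema[field], t), "string")
    return schema

schema = {}
for rec in [{"id": 1, "amount": 10}, {"id": 2, "amount": 10.5, "note": "hi"}]:
    evolve_schema(schema, rec)
# amount widens int -> double; note is added when it first appears
```

A real implementation would additionally map inferred types onto Iceberg's type system and commit the metadata update through the REST catalog; none of that is specified in the post.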

Data-driven flushing

The connector flushes to object storage only when data is present — not on a fixed timer. Verbatim: "Unlike legacy connectors that heartbeat on a fixed timer regardless of activity, Redpanda Connect uses data-driven flushing. It only executes a flush operation when there is actual data to move, preventing the 'small file problem' on object storage and ensuring you aren't wasting compute cycles on empty operations." Canonicalised on the wiki as concepts/data-driven-flushing, with concepts/small-file-problem-on-object-storage as the pathology it mitigates.
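
The distinction can be sketched as a batcher that never commits an empty buffer — an illustrative model of the behaviour described above, with hypothetical threshold names, not the connector's internals:

```python
# Illustrative sketch of data-driven flushing. A timer-based sink commits
# every interval even when idle, producing empty commits and small files;
# a data-driven sink flushes only when records are actually buffered and a
# count/age threshold is crossed. Thresholds here are hypothetical.

import time

class DataDrivenBatcher:
    def __init__(self, max_records=1000, max_age_s=60.0):
        self.buffer = []
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.first_write = None
        self.flushes = 0

    def write(self, record):
        if not self.buffer:
            self.first_write = time.monotonic()
        self.buffer.append(record)
        if self.should_flush():
            self.flush()

    def should_flush(self):
        if not self.buffer:          # data-driven: never flush an empty buffer
            return False
        return (len(self.buffer) >= self.max_records
                or time.monotonic() - self.first_write >= self.max_age_s)

    def flush(self):
        # a real sink would write a data file and commit an Iceberg
        # snapshot here; we just count commits
        self.flushes += 1
        self.buffer.clear()

b = DataDrivenBatcher(max_records=3)
for r in range(7):
    b.write(r)
# 7 records against a threshold of 3: two flushes, one record still buffered
```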

Bloblang-interpolated multi-table routing

Both table and namespace config fields support Bloblang interpolation, so a single pipeline definition routes messages to N tables based on message content. Worked example verbatim: table: 'events_${!this.event_type}'. Canonicalised on the wiki as patterns/bloblang-interpolated-multi-table-routing. Inverts the "configuration hell" of traditional connectors that require rigid per-table static mappings.
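
A minimal fan-out fragment, assuming only the field names shown in the launch-post example (interpolating namespace as well as table, which the post says is supported):

```yaml
# Sketch of content-based fan-out: one output definition, N destination
# tables. The source_system field is a hypothetical message attribute.
output:
  iceberg:
    namespace: 'raw.${!this.source_system}'
    table: 'events_${!this.event_type}'
```

Under this sketch, a message {"source_system": "billing", "event_type": "refund", ...} would land in raw.billing.events_refund.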

Iceberg REST Catalog API

The connector speaks the standard Iceberg REST Catalog protocol, so it integrates with any catalog that implements it; the launch post names specific compatible catalog targets.

OAuth2 token exchange + per-tenant isolation

The connector fits into pre-existing OAuth2 token exchange + per-tenant REST catalog (e.g. Polaris) workflows out of the box. At a density of roughly 0.1 vCPU per pipeline (an operational-shape claim, not a fleet number), isolated per-tenant pipelines are practical without substantial cloud overhead.
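
For orientation, the token exchange each tenant pipeline performs is a standard client-credentials grant against the catalog's token endpoint. The sketch below only builds the request; the endpoint path and scope format are assumptions for illustration, not documented connector behaviour:

```python
# Illustrative sketch of an OAuth2 client-credentials exchange against a
# REST catalog token endpoint. The /v1/oauth/tokens path and the scope
# format are assumptions, not values documented in the launch post.

from urllib.parse import urlencode

def build_token_request(catalog_url, client_id, client_secret, tenant):
    """Return (url, form-encoded body) for a client-credentials grant."""
    url = f"{catalog_url}/v1/oauth/tokens"          # assumed endpoint path
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # scoping the token per tenant is what keeps pipelines isolated
        "scope": f"PRINCIPAL_ROLE:{tenant}",        # assumed scope format
    })
    return url, body

url, body = build_token_request(
    "https://polaris.example.com/api/catalog", "abc", "xyz", "tenant-a")
```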

Deployment shape

Stateless container on Kubernetes. No persistent local state — Iceberg snapshot state lives in the REST catalog + object storage. Contrast with the in-broker Iceberg Topics feature, which has zero-extra-component deployment but couples to the Redpanda broker lifecycle.
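
The shape above might look like the following on Kubernetes — a sketch with assumed names, image, and args, shown only to make the "no persistent local state" point concrete:

```yaml
# Sketch only: names, image tag, and args are assumptions. Note the absence
# of any volume for pipeline state; Iceberg snapshot state lives in the REST
# catalog and object storage, so scaling is just a replica-count change.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iceberg-sink-pipeline
spec:
  replicas: 2
  selector:
    matchLabels: { app: iceberg-sink-pipeline }
  template:
    metadata:
      labels: { app: iceberg-sink-pipeline }
    spec:
      containers:
        - name: redpanda-connect
          image: redpanda-connect:latest   # assumed image
          args: ["run", "/pipeline.yaml"]  # assumed invocation
          resources:
            requests: { cpu: 100m }        # the ~0.1 vCPU-per-pipeline shape
```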

Comparison to Iceberg Topics (verbatim table from launch post)

| Feature | Redpanda Iceberg Topics (in-broker) | Redpanda Connect Iceberg output (sink connector) |
| --- | --- | --- |
| Primary value | Zero-ETL convenience. Automatically write streams to tables. | Integration flexibility. Route, transform, and automate in-stream before landing to tables. |
| Best for | High-throughput, standard Kafka-to-lakehouse. | Complex pipelines, non-Kafka sources, and "set-and-forget" schemas. |
| Data sources | Redpanda Streaming topics only. | Hundreds of sources (HTTP, CDC, SQS, Kinesis, etc.). |
| Schema evolution | Registry-Driven. Evolves automatically as you update Avro/Protobuf/JSON schemas in the Schema Registry. | Data-Driven. Table structure can evolve automatically from raw JSON—no registry required. |
| Routing | Optimized for 1 topic → 1 table. | Multi-table. Route to many tables from one stream. |
| Infrastructure | Zero extra components. | Stateless container (stateless pipeline) on K8s. |
| Availability | Redpanda Cloud BYOC or Self-Managed Enterprise Edition. | Enterprise tier connector for Redpanda Connect (requires a license). |

Limitations (v4.80.0)

  • Append-only at launch. Upserts on roadmap. Material scope limit for CDC UPDATE/DELETE workloads — Postgres CDC feeding DELETE events cannot cleanly land through this connector in v4.80.0 without additional processor-layer handling.
  • Schema-inference mechanism undisclosed — type inference strategy, conflict handling, rename / widening support all undocumented in the launch post.
  • No benchmarks against Kafka Connect Iceberg Sink or Databricks / Tabular Iceberg sinks.
  • Commit-tuning axis (Iceberg snapshot commit cadence vs small-file count trade-off) name-checked in passing but not walked through.
  • Enterprise-gated license — contrast with the 2025-06-17 dynamic-plugins launch which was Apache 2.0.
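
One hedged illustration of the "additional processor-layer handling" the append-only limitation forces for CDC: dropping DELETE events in a mapping before the sink, assuming a Debezium-style envelope (op of "c"/"u"/"d" with before/after images) — an assumption, not a documented recipe:

```yaml
# Sketch only: assumes Debezium-style CDC envelopes. Deletes are dropped
# (Bloblang deleted()) and the after-image is unwrapped before the
# append-only iceberg output; v4.80.0 cannot apply deletes to the table.
pipeline:
  processors:
    - mapping: |
        root = if this.op == "d" { deleted() } else { this.after }
```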

Example pipeline (verbatim from launch post)

input:
  redpanda:
    seed_brokers: ["${REDPANDA_BROKERS}"]
    topics: ["events"]
    consumer_group: "iceberg-sink"

pipeline:
  processors:
    - mapping: |
        root = this
        root.ingested_at = now()

output:
  iceberg:
    catalog:
      url: https://polaris.example.com/api/catalog
      warehouse: analytics
      auth:
        oauth2:
          client_id: "${CATALOG_CLIENT_ID}"
          client_secret: "${CATALOG_CLIENT_SECRET}"
    namespace: raw.events
    table: 'events_${!this.event_type}'
    storage:
      aws_s3:
        bucket: my-iceberg-data
        region: us-west-2
