
Redpanda Connect

Redpanda Connect is Redpanda's Kafka Connect alternative: an integration layer with hundreds of connectors that feeds data into and out of the Redpanda broker (or any Kafka-API-compatible broker) without running a separate Kafka Connect cluster. Originally developed as the open-source Benthos project (which joined Redpanda in 2023), it is positioned as a "fresh alternative to Kafka Connect that's more flexible, scalable, and simpler to deploy".

Stub page — expand as future Redpanda Connect internals sources arrive. Canonical wiki use case is the four CDC input connectors shipped as the flagship source class, introduced in the 2025-03-18 blog post.

CDC input connectors (2025-03)

Four per-database-engine change-data-capture inputs, each riding on the engine's native change log:

  • postgres_cdc — rides on PostgreSQL logical replication; a replication slot exports a consistent snapshot and marks the LSN boundary for transition to the streaming phase. Supports parallel snapshot of large tables — the Redpanda differentiator against Debezium.
  • mysql_cdc — rides on MySQL binlog; takes a global read lock to capture initial snapshot + binlog position, then releases to stream. Requires an external offset store (Redis, SQL database, any datastore) for binlog-position durability. Topology scope limited at 2025-03: "standard MySQL setups and primary-replica configurations, with plans to extend support for high-availability clusters and Global Transaction ID (GTID) environments". No GTID support at time of publication — a significant gap vs Debezium's MySQL connector.
  • mongodb_cdc — rides on MongoDB change streams / oplog; supports parallel snapshot by splitting collections into chunks, and offers "flexible document modes" for update/delete handling (full-document lookups, pre/post image capture). Also requires an external offset store for oplog positions.
  • gcp_spanner_cdc — rides on Google Cloud Spanner change streams; stores progress "transactionally in a configurable spanner table for at least once delivery". Automatically processes partitions that are merged and split — the dynamic-partition-topology property is structurally distinct from the static-shape of the other three engines.

(Source: sources/2025-03-18-redpanda-3-powerful-connectors-for-real-time-change-data-capture)
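
A minimal pipeline sketch for the flagship input, as a shape illustration only: the postgres_cdc field names below (dsn, slot_name, tables) are assumptions, not verified against the connector reference; kafka_franz and its fields are standard Redpanda Connect surface.

```yaml
# Hedged sketch: postgres_cdc field names are illustrative, not confirmed.
input:
  postgres_cdc:
    dsn: postgres://repl_user:secret@db.example.com:5432/app
    slot_name: redpanda_connect_slot    # logical-replication slot per the post
    tables: [ "public.orders" ]

output:
  kafka_franz:
    seed_brokers: [ "localhost:9092" ]
    topic: orders_cdc
```

The shape mirrors the post's framing: the replication slot anchors the snapshot-to-streaming transition, while the output side is an ordinary Kafka-API producer.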

Microsoft SQL Server CDC (Redpanda Connect 4.67.5, 2025-11-06)

Redpanda Connect 4.67.5 (shipped alongside Redpanda 25.3) introduces a fifth per-engine CDC input in the Redpanda Connect family: microsoft_sql_server_cdc — riding on SQL Server's native change tables mechanism.

Verbatim from the [[sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more|25.3 launch post]]:

"Using dedicated change tables, the connector non-invasively captures every single insert, update, and delete from your SQL Server tables in real time and streams them into Redpanda with minimal impact on the source database's performance."

Completes the CDC family to five source-database engines (Postgres / MySQL / MongoDB / Spanner / SQL Server). Fits the same CDC driver ecosystem framing as the prior four.

Vendor benchmark disclosed in the launch post:

  • Redpanda Connect MSSQL CDC: ~40 MB/s ingest, 3:15 initial snapshot on a 5M-row table.
  • Unnamed "alternative hosted Kafka + CDC service": ~14.5 MB/s, 8:04 initial snapshot.
  • Test hardware: 4 vCPUs (1 logical core), 16 GB memory Azure instance; SQL Server co-located.

Enterprise-gated. Not yet disclosed: Always On AG / mirroring / log shipping topology support, parallel-snapshot capability. See systems/redpanda-connect-mssql-cdc for the full system page.

Oracle CDC (Redpanda Connect 4.83.0, 2026-04-09)

Redpanda Connect 4.83.0 (shipped 2026-04-09) introduces the sixth per-engine CDC input in the Redpanda Connect family: oracledb_cdc — riding on Oracle LogMiner, the Oracle Enterprise Edition redo-log-mining utility.

Verbatim from the launch post:

"Starting with Redpanda Connect v4.83.0, the oracledb_cdc input captures changes directly from Oracle, including: inserts, updates, and deletes. The connector then routes them downstream as structured events. No JVM, no Kafka Connect cluster, no separate workers. Just Redpanda Connect doing what it does best."

Completes the CDC family to six source-database engines: Postgres / MySQL / MongoDB / Spanner / MSSQL / Oracle.

Three architectural properties canonicalised in the launch post:

  1. LogMiner as the change-capture substrate. Rides on Oracle Enterprise Edition's built-in LogMiner utility; no additional Oracle licensing required (contrast Oracle GoldenGate).
  2. In-source checkpointing. Progress is stored in a checkpoint table inside Oracle itself — no external Redis or SQL offset store required. Canonicalised as concepts/in-source-cdc-checkpointing, the fourth offset-durability class across the CDC family.
  3. Automatic schema tracking via ALL_TAB_COLUMNS. Queries Oracle's data dictionary for precision-aware column metadata; emits integer columns (NUMBER(p, 0)) as int64 and decimal columns (NUMBER(p, s) with s > 0) as json.Number. Composes with schema_registry_encode for typed Avro encoding. New columns detected mid-stream; dropped columns reflected after connector restart. Canonicalised as concepts/precision-aware-type-mapping.

Oracle Wallet supported via the wallet_path config field for regulated environments — canonicalised as the wiki's first instance of concepts/file-based-credential-store. Two wallet formats: cwallet.sso (auto-login, no password) and ewallet.p12 (PKCS#12, password via wallet_password config field, which is redacted from logs and config dumps). SSL enabled automatically.
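
In config terms the wallet surface might look like the sketch below: only wallet_path and wallet_password are named in the launch post; every other field is a placeholder. The ${VAR} environment-variable interpolation is standard Redpanda Connect config syntax.

```yaml
input:
  oracledb_cdc:
    # Only wallet_path / wallet_password are confirmed by the launch post;
    # remaining fields are illustrative placeholders.
    wallet_path: /etc/oracle/wallet        # holds cwallet.sso or ewallet.p12
    wallet_password: ${ORACLE_WALLET_PW}   # needed for ewallet.p12; redacted from logs
```

With cwallet.sso (auto-login) the password field would be omitted entirely; SSL is enabled automatically per the post.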

Multi-table routing via Bloblang interpolation on the output: topic: ${! meta("table_name").lowercase() } routes each CDC event to its own per-table Kafka topic. Canonical second-instance of the same pattern (first being the 2026-03-05 Iceberg-output sink-side instance).
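
Wired into an output, the interpolation from the post might look like this; kafka_franz and its seed_brokers/topic fields are standard Redpanda Connect surface, and the table_name metadata key comes from the CDC input per the post.

```yaml
output:
  kafka_franz:
    seed_brokers: [ "localhost:9092" ]
    # One pipeline definition, N per-table topics: the topic is resolved
    # per message from the CDC input's table_name metadata.
    topic: ${! meta("table_name").lowercase() }
```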

Enterprise-gated. See systems/redpanda-connect-oracle-cdc for the full connector system page.

Canonical differentiator vs Debezium

The 2025-03-18 post's load-bearing competitive claim verbatim:

"Redpanda's PostgreSQL and MongoDB CDC connectors can also parallelise reads for large tables. That means tables or collections with millions of records can be split into smaller chunks and read in parallel. Debezium (Kafka Connect) does not do this today."

Other CDC tools (including Debezium) parallelise across tables or collections — one stream per table. Redpanda Connect's Postgres and MongoDB connectors parallelise within a single table or collection — chunks read concurrently during the snapshot phase. Canonicalised on the wiki as concepts/parallel-snapshot-cdc.

Chunk-splitting algorithm not disclosed in the post. Open questions: how are chunk boundaries picked (primary-key ranges? internal page ranges?), and is snapshot consistency preserved across parallel readers (a single transaction, or independent transactions with reconciled boundaries)?

Positioning vs Kafka Connect

Redpanda Connect is marketed as the in-Redpanda-product integration layer — no separate connector cluster, no separate operator, no separate offset topic. The Debezium + Kafka Connect CDC pipeline shape is the competitive reference: Debezium ships as source connectors running on a Kafka Connect cluster, with offsets stored in Kafka topics, feeding Kafka topics as output. Redpanda Connect collapses this by running connectors as first-class Redpanda components.

Per-engine connector family structure mirrors Debezium — canonical CDC driver ecosystem instance at the Redpanda-ecosystem altitude.

MCP tool surface via rpk connect mcp-server (2025-04)

As of the 2025-04-03 Redpanda Agents SDK launch, Redpanda Connect doubles as the tool-surface layer for enterprise AI agents: the rpk connect mcp-server subcommand exposes any of the ~300 pre-built connectors as a Model Context Protocol tool with a simple configuration change (see info.csv for the full connector catalog). The canonical instantiation of MCP as centralized integration proxy.

Two programming surfaces are load-bearing in the MCP-server role:

  • Bloblang — Redpanda Connect's declarative per-field mapping / filtering / transformation language. Content filtering at the MCP-tool call shape is authored in Bloblang.
  • Starlark — a Python subset embedded as a code-extension language for cases where Bloblang isn't expressive enough. "Effectively Python without imports, but more importantly, it is all Python so no need to learn a new configuration language." Also usable as a replacement for authoring Redpanda Connect YAML (Python-native pipeline config).
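
A hedged sketch of Bloblang-authored content filtering in the MCP-tool role: the mapping processor is standard Redpanda Connect surface, but the record fields (ssn, email) and the masking policy are hypothetical.

```yaml
pipeline:
  processors:
    - mapping: |
        # Strip sensitive fields before the record is exposed through an
        # MCP tool result (field names are illustrative, not from the post).
        root = this
        root.ssn = deleted()
        root.email = deleted()
```

A Starlark processor would be the escape hatch when the policy needs real control flow that a declarative mapping like this cannot express.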

The combined shape — Bloblang declarative + Starlark escape hatch — is the mechanism behind dynamic content filtering in the MCP pipeline, the fine-grain-ACL future Alex Gallego describes in the 2025-04-03 post:

"For Redpanda Connect specifically, it is the ability to leverage full programming languages via custom code extensions to give engineers the speed of iteration while letting the security team sleep at night, knowing they can enforce overriding global policies for ultra-fine-grain access to any of the ~300 connectors in a declarative fashion." (Source: sources/2025-04-03-redpanda-autonomy-is-the-future-of-infrastructure)

Dynamic plugins (2025-06, Beta, Apache 2.0)

Redpanda Connect v4.56.0 (2025-06-17) introduced dynamic plugins — a Beta, Apache-2.0-licensed plugin framework that escapes the previous "compiled plugins" model (Go-only, built into the binary). Dynamic plugins are external executables launched as subprocesses, communicating with the host Redpanda Connect process over gRPC on a Unix domain socket. The gRPC service "closely mirrors the existing interfaces defined for plugins within Redpanda Connect's core engine, Benthos" (Source: sources/2025-06-17-redpanda-introducing-multi-language-dynamic-plugins-for-redpanda-connect).

Three new built-in plugins — BatchInput, BatchProcessor, BatchOutput — ship in the host binary as dispatch shims that load and communicate with external plugin executables. Only batch component types are exposed across the boundary: batch-only by design, to amortize the cross-process IPC cost. Go and Python SDKs ship at launch; any gRPC-capable language could write its own.

Two structural properties the post calls out verbatim:

  • Crash containment. "Plugins run in separate processes, so crashes won't take down the main Redpanda Connect engine." Canonicalized as concepts/subprocess-plugin-isolation.
  • Language agnosticism. "Write plugins in virtually any language that supports gRPC."

Explicit architectural guidance positions dynamic plugins as additive, not a replacement:

"For performance-critical workloads where every microsecond counts, the best approach remains using native Go plugins compiled directly into the Redpanda Connect binary. Dynamic plugins shine for flexibility and language choice, while compiled plugins offer maximum performance." (Source: sources/2025-06-17-redpanda-introducing-multi-language-dynamic-plugins-for-redpanda-connect)

Canonicalized as [[patterns/compiled-vs-dynamic-plugin-tradeoff]] and as an instance of [[patterns/grpc-over-unix-socket-language-agnostic-plugin]]. The headline language target is Python — explicitly bridging the streaming substrate to the PyTorch / TensorFlow / Hugging Face / LangChain ecosystem — for which the motivating use case is a Python processor plugin running a pre-trained BERT model for sentiment analysis on streaming customer feedback.

See systems/redpanda-connect-dynamic-plugins for the full architecture, developer surface, and known gaps.

Snowpipe Streaming output connector (2025-10 benchmark)

The snowflake_streaming output connector — based on Snowflake's Snowpipe Streaming API — is the Redpanda-side surface for landing streaming data into Snowflake tables at low latency. Disclosed in the [[sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming|2025-10-02 Redpanda+Snowflake benchmark]]: 3.8 billion 1 KB AVRO messages at 14.5 GB/s into a single Snowflake table with P50 ≈ 2.18 s / P99 ≈ 7.49 s end-to-end latency — exceeds Snowflake's documented 10 GB/s per-table ceiling by 45%.

Of the tuning knobs disclosed in the benchmark, the decisive scaling dimension was intra-node input/output parallelism via the broker primitive — running many parallel kafka_franz → snowflake_streaming pipelines within a single Connect process to fully saturate the 48-core nodes. Canonicalised as patterns/intra-node-parallelism-via-input-output-scaling.

86% of the P99 end-to-end latency (~6.44 s of 7.49 s) lives in the Snowflake upload/register/commit path — the analytical-sink commit is the dominant latency contributor, not the broker read or transport hop. Transport was over the public internet in the benchmark; AWS PrivateLink would reduce it further.
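
The intra-node parallelism shape might be expressed with the input broker's copies field (a standard Benthos/Connect primitive); the exact benchmark configuration was not published, so the numbers and names below are illustrative.

```yaml
input:
  broker:
    copies: 24            # many parallel readers inside one Connect process
    inputs:
      - kafka_franz:
          seed_brokers: [ "redpanda:9092" ]
          topics: [ "events" ]
          consumer_group: snowflake_loader
```

Each copy runs its own consumer inside the same process, which is how a single Connect instance can saturate a many-core node against one topic.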

Iceberg output connector (v4.80.0, 2026-03-05, Enterprise)

Redpanda Connect v4.80.0 (shipped 2026-03-05, enterprise-gated) introduces the iceberg output — a declarative sink that writes streaming data to Apache Iceberg tables via the Iceberg REST Catalog API. Positioned as the non-Kafka-source companion to the broker-native Redpanda Iceberg Topics feature: zero-ETL Kafka → Iceberg is Iceberg Topics; sink-connector with 300+ upstream sources + in-stream transforms + multi-table routing is the Iceberg output. Canonicalised on the wiki as patterns/sink-connector-as-complement-to-broker-native-integration.

Three architectural properties are load-bearing (Source: sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect):

  1. Registry-less, data-driven schema evolution — infers table schema from raw JSON; no Schema Registry required. Framed verbatim as the "best of both worlds" between chained SMT brittleness ("maintenance toil") and all-string dirty-data tables.
  2. Data-driven flushing — flush only when data is present; inverts the Kafka-Connect-era timer-driven default. Mitigates the small-file problem on object storage and quiet-source compute waste.
  3. Bloblang-interpolated multi-table routing — table and namespace config fields support Bloblang interpolation ('events_${!this.event_type}'). One pipeline definition, N destination tables.

The connector speaks any REST-compliant Iceberg catalog — Apache Polaris, AWS Glue, Unity Catalog, GCP BigLake, Snowflake Open Catalog. OAuth2 client-credentials is the idiomatic auth; per-tenant REST catalog isolation is explicitly supported.
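
Putting the properties together, a sketch: only the table/namespace interpolation, REST-catalog targeting, and OAuth2 client-credentials auth are described in the post; the field names below (url, oauth2 and friends) are assumptions.

```yaml
output:
  iceberg:
    # Field names are illustrative placeholders, not the documented schema.
    catalog:
      url: https://polaris.example.com/api/catalog
      oauth2:
        client_id: ${ICEBERG_CLIENT_ID}
        client_secret: ${ICEBERG_CLIENT_SECRET}
    namespace: analytics
    table: events_${! this.event_type }   # one pipeline, N destination tables
```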

Scope limitations (v4.80.0): append-only only at launch (upserts on roadmap — material for CDC UPDATE/DELETE workloads); schema-inference mechanism depth undisclosed; enterprise-gated license (contrast: 2025-06-17 dynamic-plugins launch was Apache 2.0); no published benchmarks. Two-shape comparison against Iceberg Topics ("Zero-ETL convenience vs Integration flexibility") canonicalised in the launch post's feature matrix.

See systems/redpanda-connect-iceberg-output for the full system page.

ClickHouse output via sql_raw / sql_insert (2025-12)

There is no dedicated ClickHouse output connector in Redpanda Connect as of the 2025-12-09 disclosure. The canonical workaround is to use the generic SQL processors — sql_raw and sql_insert — against ClickHouse's SQL interface. These components are available as input, processor, and output types, giving three wiring options. Verbatim from the Redpanda blog:

"While there isn't a dedicated ClickHouse connector for Redpanda Connect yet, the sql_raw and sql_insert components allow you to stream execute commands or stream data from Redpanda into ClickHouse. They're available as input, processor, and output types, so you've got flexibility in how you wire things up." (Source: sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse)

The same post names dual-write to both Snowflake and ClickHouse simultaneously as a canonical composition — pair sql_raw / sql_insert (ClickHouse leg) with the first-class snowflake_streaming (Snowflake leg), either using a broker output to fan out every record to both sinks, or multiplexing to route per-record by rule. Canonicalised as the pattern patterns/clickhouse-plus-snowflake-dual-storage-tier.

Gap vs Snowpipe Streaming: the ClickHouse leg is a generic-SQL-sink hop, not a columnar-native bulk-insert path. No per-channel exactly-once offset tokens, no schema-evolution helper, no built-in parallelism knob matching channel_prefix × max_in_flight. Downstream exactly-once on ClickHouse requires idempotent inserts (e.g. ReplacingMergeTree or UUID primary keys).
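
A minimal sql_insert leg against ClickHouse might look like the sketch below; sql_insert and its driver/dsn/table/columns/args_mapping fields are standard Redpanda Connect surface and clickhouse is a supported driver, while the table and column names are illustrative. Idempotency (e.g. ReplacingMergeTree) is handled on the ClickHouse side.

```yaml
output:
  sql_insert:
    driver: clickhouse
    dsn: clickhouse://default:@clickhouse:9000/analytics
    table: events
    columns: [ id, ts, payload ]
    # Bloblang mapping from the in-flight message to the insert arguments.
    args_mapping: root = [ this.id, this.ts, this.payload.string() ]
```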

Output multiplexing and broker fan-out

The Redpanda Connect output layer supports two parallel-output shapes that compose across connectors of any kind:

  • broker — fan out the same message to every configured output. Each output gets an independent copy; a failure on one doesn't block the others. Canonical shape for "write this event to Snowflake and ClickHouse"-style dual-tier pipelines.
  • Multiplexing (output switch with per-record rules) — route each message to exactly one output based on Bloblang predicates. Canonical shape for "route logs to cold storage, errors to alerting, events to real-time"-style rule-based pipelines.

The 2025-12-09 post frames these as the two composition primitives for the ClickHouse + Snowflake dual-tier pattern; both work with the heterogeneous mix of snowflake_streaming (first-class) + sql_insert (generic).
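
The multiplexing shape might be sketched as follows: the switch output with per-case Bloblang check predicates is standard Redpanda Connect surface, while the routing rules and topic names are illustrative. A case without a check acts as the catch-all default.

```yaml
output:
  switch:
    cases:
      - check: this.level == "error"     # Bloblang predicate, per record
        output:
          kafka_franz:
            seed_brokers: [ "localhost:9092" ]
            topic: alerts
      - output:                          # no check: default route
          kafka_franz:
            seed_brokers: [ "localhost:9092" ]
            topic: events
```

Swapping switch for broker (pattern: fan_out) turns the same two outputs into the dual-write shape, where every record goes to both legs.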

GitOps deployment (2025-12-02 tutorial)

The 2025-12-02 unsigned Redpanda tutorial sources/2025-12-02-redpanda-operationalize-redpanda-connect-with-gitops canonicalises the end-to-end Argo CD + Helm + Kustomize deployment shape for Redpanda Connect on Kubernetes. The tutorial walks through both deployment modes side by side:

  • Standalone mode — single pipeline with config baked into Helm values.yaml. Deployed via Argo CD multi-source Application — chart from charts.redpanda.com pinned at targetRevision: 3.1.0, values from the customer's own repo referenced via $values/standalone/standalone-mode.yaml. Scaling is deployment.replicaCount via Git commit.

  • Streams mode — multiple pipelines loaded from Kubernetes ConfigMaps at runtime. Deployed via Kustomize wrapping the Helm chart — configMapGenerator turns each pipeline YAML into a hash-suffixed ConfigMap; helmCharts inflates the chart inline; precondition is kustomize.buildOptions: --enable-helm --load-restrictor LoadRestrictionsNone on Argo CD. Config changes trigger rolling restart via ConfigMap hash rollout.
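
The Streams-mode Kustomize shape might be sketched as below; configMapGenerator and helmCharts are standard Kustomize fields, the chart pin matches the tutorial's targetRevision: 3.1.0, and the pipeline/values file names are assumptions.

```yaml
# kustomization.yaml (sketch; file and release names are illustrative)
configMapGenerator:
  - name: connect-pipelines
    files:
      - pipelines/orders.yaml
      - pipelines/payments.yaml     # each becomes a hash-suffixed ConfigMap
helmCharts:
  - name: connect
    repo: https://charts.redpanda.com
    version: 3.1.0
    releaseName: connect
    valuesFile: streams-values.yaml
```

The hash suffix on each generated ConfigMap is what turns a Git-committed pipeline change into a rolling restart, and inflating the chart this way is why Argo CD needs --enable-helm in its build options.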

The Streams-mode REST API (/version, /ready, /streams, /metrics) introduces the runtime-API vs GitOps source-of-truth tension — GitOps-compatible when automation derives state from Git, anti-pattern when humans mutate pipelines through the API without Git commit.

Observability deployed as parallel Argo CD Application — kube-prometheus-stack (Prometheus + Alertmanager + Grafana + K8s dashboards) + Prometheus service monitor + Redpanda Connect Grafana dashboard. Redpanda Connect exposes Prometheus-compatible metrics natively "without custom exporters or sidecars". Companion GitHub repo: redpanda-data-blog/redpanda-connect-the-gitops-way.

Licensing

At 2025-03 the four CDC input connectors are gated behind an Enterprise license in Redpanda Cloud and Self-Managed deployments. Not a free-tier feature.
