Redpanda Connect¶
Redpanda Connect is Redpanda's Kafka-Connect alternative: a hundreds-of-connectors integration layer that feeds data into and out of the Redpanda broker (or any Kafka-API-compatible broker) without running a separate Kafka Connect cluster. Originally developed as the open-source Benthos project (joined Redpanda in 2023), it is positioned as a "fresh alternative to Kafka Connect that's more flexible, scalable, and simpler to deploy".
Stub page — expand on future Redpanda Connect internals sources. Canonical wiki use case is the four CDC input connectors shipped as the flagship source class, introduced in the 2025-03-18 blog post.
CDC input connectors (2025-03)¶
Four per-database-engine change-data-capture inputs, each riding on the engine's native change log:
- `postgres_cdc` — rides on PostgreSQL logical replication; a replication slot exports a consistent snapshot and marks the LSN boundary for the transition to the streaming phase. Supports parallel snapshot of large tables — the Redpanda differentiator against Debezium.
- `mysql_cdc` — rides on the MySQL binlog; takes a global read lock to capture the initial snapshot + binlog position, then releases it to stream. Requires an external offset store (Redis, a SQL database, any datastore) for binlog-position durability. Topology scope limited at 2025-03: "standard MySQL setups and primary-replica configurations, with plans to extend support for high-availability clusters and Global Transaction ID (GTID) environments". No GTID support at time of publication — a significant gap vs Debezium's MySQL connector.
- `mongodb_cdc` — rides on MongoDB change streams / the oplog; supports parallel snapshot by splitting collections into chunks, and offers "flexible document modes" for update/delete handling (full-document lookups, pre/post image capture). Also requires an external offset store for oplog positions.
- `gcp_spanner_cdc` — rides on Google Cloud Spanner change streams; stores progress "transactionally in a configurable spanner table for at least once delivery". Automatically processes partitions that are merged and split — the dynamic-partition-topology property is structurally distinct from the static shape of the other three engines.
(Source: sources/2025-03-18-redpanda-3-powerful-connectors-for-real-time-change-data-capture)
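A minimal sketch of the pipeline shape these inputs imply, using `postgres_cdc` (field names are illustrative assumptions, not verified against the connector reference):

```yaml
# Hypothetical postgres_cdc pipeline: consistent snapshot + logical-replication
# stream into a Redpanda topic. Field names are assumptions for illustration.
input:
  postgres_cdc:
    dsn: postgres://repl_user:secret@db.example.com:5432/appdb
    slot_name: rpcn_orders_slot   # replication slot marking the snapshot/stream LSN boundary
    schema: public
    tables:
      - orders

output:
  kafka_franz:
    seed_brokers: [ "redpanda:9092" ]
    topic: orders_cdc
```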
Microsoft SQL Server CDC (Redpanda Connect 4.67.5, 2025-11-06)¶
Redpanda Connect 4.67.5 (shipped alongside Redpanda 25.3) introduces a fifth per-engine CDC input in the Redpanda Connect family: `microsoft_sql_server_cdc` — riding on SQL Server's native change tables mechanism.
Verbatim from the sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more (25.3 launch) post:
"Using dedicated change tables, the connector non-invasively captures every single insert, update, and delete from your SQL Server tables in real time and streams them into Redpanda with minimal impact on the source database's performance."
Completes the CDC family to five source-database engines (Postgres / MySQL / MongoDB / Spanner / SQL Server). Fits the same CDC driver ecosystem framing as the prior four.
Vendor benchmark disclosed in the launch post:
- Redpanda Connect MSSQL CDC: ~40 MB/s ingest, 3:15 initial snapshot on a 5M-row table.
- Unnamed "alternative hosted Kafka + CDC service": ~14.5 MB/s, 8:04 initial snapshot.
- Test hardware: 4 vCPUs (1 logical core), 16 GB memory Azure instance; SQL Server co-located.
Enterprise-gated. Not yet disclosed: Always On AG / mirroring / log shipping topology support, parallel-snapshot capability. See systems/redpanda-connect-mssql-cdc for the full system page.
Oracle CDC (Redpanda Connect 4.83.0, 2026-04-09)¶
Redpanda Connect 4.83.0 (shipped 2026-04-09) introduces the sixth per-engine CDC input in the Redpanda Connect family: `oracledb_cdc` — riding on Oracle LogMiner, the Oracle Enterprise Edition redo-log-mining utility.
Verbatim from the launch post:
"Starting with Redpanda Connect v4.83.0, the oracledb_cdc input captures changes directly from Oracle, including: inserts, updates, and deletes. The connector then routes them downstream as structured events. No JVM, no Kafka Connect cluster, no separate workers. Just Redpanda Connect doing what it does best."
Completes the CDC family to six source-database engines: Postgres / MySQL / MongoDB / Spanner / MSSQL / Oracle.
Three architectural properties canonicalised in the launch post:
- LogMiner as the change-capture substrate. Rides on Oracle Enterprise Edition's built-in LogMiner utility; no additional Oracle licensing required (contrast Oracle GoldenGate).
- In-source checkpointing. Progress is stored in a checkpoint table inside Oracle itself — no external Redis or SQL offset store required. Canonicalised as concepts/in-source-cdc-checkpointing, the fourth offset-durability class across the CDC family.
- Automatic schema tracking via `ALL_TAB_COLUMNS`. Queries Oracle's data dictionary for precision-aware column metadata; emits integer columns (NUMBER(p, 0)) as `int64` and decimal columns (NUMBER(p, s) with s > 0) as `json.Number`. Composes with `schema_registry_encode` for typed Avro encoding. New columns are detected mid-stream; dropped columns are reflected after a connector restart. Canonicalised as concepts/precision-aware-type-mapping.
Oracle Wallet is supported via the `wallet_path` config field for regulated environments — canonicalised as the wiki's first instance of concepts/file-based-credential-store. Two wallet formats: `cwallet.sso` (auto-login, no password) and `ewallet.p12` (PKCS#12, password via the `wallet_password` config field, which is redacted from logs and config dumps). SSL is enabled automatically.
Multi-table routing via Bloblang interpolation on the output: `topic: ${! meta("table_name").lowercase() }` routes each CDC event to its own per-table Kafka topic. Canonical second instance of the same pattern (the first being the 2026-03-05 Iceberg-output sink-side instance).
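Putting the wallet auth and the per-table routing together, a hypothetical `oracledb_cdc` pipeline might look like the following. Only `wallet_path`, `wallet_password`, and the `topic` interpolation are attested in the post; all other field names are assumptions:

```yaml
# Hypothetical oracledb_cdc pipeline with Oracle Wallet auth and
# Bloblang-interpolated per-table topic routing.
input:
  oracledb_cdc:
    dsn: oracle://cdc_user@orcl.example.com:1521/ORCLPDB1
    wallet_path: /etc/oracle/wallet      # directory holding cwallet.sso or ewallet.p12
    tables:
      - APP.ORDERS
      - APP.CUSTOMERS

output:
  kafka_franz:
    seed_brokers: [ "redpanda:9092" ]
    # Quoted in the launch post: route each CDC event to its own per-table topic.
    topic: ${! meta("table_name").lowercase() }
```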
Enterprise-gated. See systems/redpanda-connect-oracle-cdc for the full connector system page.
Canonical differentiator vs Debezium¶
The 2025-03-18 post's load-bearing competitive claim verbatim:
"Redpanda's PostgreSQL and MongoDB CDC connectors can also parallelise reads for large tables. That means tables or collections with millions of records can be split into smaller chunks and read in parallel. Debezium (Kafka Connect) does not do this today."
Other CDC tools (including Debezium) parallelise across tables or collections — one stream per table. Redpanda Connect's Postgres and MongoDB connectors parallelise within a single table or collection — chunks read concurrently during the snapshot phase. Canonicalised on the wiki as concepts/parallel-snapshot-cdc.
The chunk-splitting algorithm is not disclosed in the post. Open questions: how are chunk boundaries picked (primary-key ranges? internal page ranges?), and is snapshot consistency preserved across parallel readers (a single transaction? independent transactions with reconciled boundaries?)?
Positioning vs Kafka Connect¶
Redpanda Connect is marketed as the in-Redpanda-product integration layer — no separate connector cluster, no separate operator, no separate offset topic. The Debezium + Kafka Connect CDC pipeline shape is the competitive reference: Debezium ships as source connectors running on a Kafka Connect cluster, with offsets stored in Kafka topics, feeding Kafka topics as output. Redpanda Connect collapses this by running connectors as first-class Redpanda components.
Per-engine connector family structure mirrors Debezium — canonical CDC driver ecosystem instance at the Redpanda-ecosystem altitude.
MCP tool surface via rpk connect mcp-server (2025-04)¶
As of the 2025-04-03 Redpanda Agents
SDK launch, Redpanda Connect doubles as the tool-surface layer
for enterprise AI agents: the rpk connect mcp-server subcommand
exposes any of the ~300 pre-built connectors as a
Model Context Protocol tool with
a simple configuration change (see
info.csv
for the full connector catalog). The canonical instantiation of
MCP as centralized
integration proxy.
Two programming surfaces are load-bearing in the MCP-server role:
- Bloblang — Redpanda Connect's declarative per-field mapping / filtering / transformation language. Content filtering at the MCP-tool call shape is authored in Bloblang.
- Starlark — a Python subset embedded as a code-extension language for cases where Bloblang isn't expressive enough. "Effectively Python without imports, but more importantly, it is all Python so no need to learn a new configuration language." Also usable as a replacement for authoring Redpanda Connect YAML (Python-native pipeline config).
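A minimal sketch of what declarative Bloblang content filtering looks like inside a Connect pipeline (the field names and rules are invented for illustration):

```yaml
# Hypothetical mapping processor: redact a field and drop messages by rule.
pipeline:
  processors:
    - mapping: |
        root = this
        root.user.email = deleted()                      # strip a sensitive field
        root = if this.action == "debug" { deleted() }   # drop whole messages matching a policy
```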
The combined shape — Bloblang declarative + Starlark escape hatch — is the mechanism behind dynamic content filtering in the MCP pipeline, the fine-grain-ACL future Alex Gallego describes in the 2025-04-03 post:
"For Redpanda Connect specifically, it is the ability to leverage full programming languages via custom code extensions to give engineers the speed of iteration while letting the security team sleep at night, knowing they can enforce overriding global policies for ultra-fine-grain access to any of the ~300 connectors in a declarative fashion." (Source: sources/2025-04-03-redpanda-autonomy-is-the-future-of-infrastructure)
Dynamic plugins (2025-06, Beta, Apache 2.0)¶
Redpanda Connect v4.56.0 (2025-06-17) introduced dynamic plugins — a Beta, Apache-2.0-licensed plugin framework that escapes the previous "compiled plugins" model (Go-only, built into the binary). Dynamic plugins are external executables launched as subprocesses, communicating with the host Redpanda Connect process over gRPC on a Unix domain socket. The gRPC service "closely mirrors the existing interfaces defined for plugins within Redpanda Connect's core engine, Benthos" (Source: sources/2025-06-17-redpanda-introducing-multi-language-dynamic-plugins-for-redpanda-connect).
Three new built-in plugins — BatchInput, BatchProcessor, BatchOutput — ship in the host binary as dispatch shims that load and communicate with external plugin executables. Only batch component types are exposed across the boundary: batch-only by design, to amortize the cross-process IPC cost. Go and Python SDKs ship at launch; any gRPC-capable language could write its own.
Two structural properties the post calls out verbatim:
- Crash containment. "Plugins run in separate processes, so crashes won't take down the main Redpanda Connect engine." Canonicalized as concepts/subprocess-plugin-isolation.
- Language agnosticism. "Write plugins in virtually any language that supports gRPC."
Explicit architectural guidance positions dynamic plugins as additive, not a replacement:
"For performance-critical workloads where every microsecond counts, the best approach remains using native Go plugins compiled directly into the Redpanda Connect binary. Dynamic plugins shine for flexibility and language choice, while compiled plugins offer maximum performance." (Source: sources/2025-06-17-redpanda-introducing-multi-language-dynamic-plugins-for-redpanda-connect)
Canonicalized as [[patterns/compiled-vs-dynamic-plugin-tradeoff]] and as an instance of [[patterns/grpc-over-unix-socket-language-agnostic-plugin]]. The headline language target is Python — explicitly bridging the streaming substrate to the PyTorch / TensorFlow / Hugging Face / LangChain ecosystem — for which the motivating use case is a Python processor plugin running a pre-trained BERT model for sentiment analysis on streaming customer feedback.
See systems/redpanda-connect-dynamic-plugins for the full architecture, developer surface, and known gaps.
Snowpipe Streaming output connector (2025-10 benchmark)¶
The `snowflake_streaming` output connector — based on Snowflake's Snowpipe Streaming API — is the Redpanda-side surface for landing streaming data into Snowflake tables at low latency. Disclosed in the [[sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming|2025-10-02 Redpanda+Snowflake benchmark]]: 3.8 billion 1 KB AVRO messages at 14.5 GB/s into a single Snowflake table with P50 ≈ 2.18 s / P99 ≈ 7.49 s end-to-end latency — exceeding Snowflake's documented 10 GB/s per-table ceiling by 45%.
Three canonical tuning knobs disclosed in the benchmark:
- `channel_prefix` × `max_in_flight` — together control the number of Snowpipe Streaming channels opened against the target table. Hard ceiling of 10,000 channels per table; exceeding it surfaces as "the Snowpipe API screaming at us."
- `build_parallelism` — thread count for batch serialisation / commit preparation. Tuned to (cores − small reserve); the benchmark set it to 40 on 48-core m7gd.12xlarge nodes. Canonical concepts/build-parallelism-for-ingest-serialization instance.
- Count-based batching — preferred over `byte_size` batching on the hot produce path; less trigger-evaluation CPU overhead. Canonical patterns/count-over-bytesize-batch-trigger instance.
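A sketch of how the three knobs compose in a `snowflake_streaming` output block (connection fields are placeholders, and the exact field spellings should be checked against the connector reference):

```yaml
# Hypothetical snowflake_streaming output echoing the benchmark's tuning shape.
output:
  snowflake_streaming:
    account: myorg-myaccount
    user: INGEST_USER
    database: ANALYTICS
    schema: PUBLIC
    table: EVENTS
    channel_prefix: pipeline_a   # channel_prefix x max_in_flight channels opened (<= 10,000 per table)
    max_in_flight: 50
    build_parallelism: 40        # cores minus a small reserve (40 on 48-core nodes in the benchmark)
    batching:
      count: 10000               # count-based trigger, preferred over byte_size on the hot path
```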
The benchmark's decisive scaling dimension was intra-node input/output parallelism via the `broker` primitive — running many parallel `kafka_franz` → `snowflake_streaming` pipelines within a single Connect process to fully saturate the 48-core nodes. Canonicalised as patterns/intra-node-parallelism-via-input-output-scaling.
86% of the P99 end-to-end latency (~6.44 s of 7.49 s) lives in the Snowflake upload/register/commit path — the analytical-sink commit is the dominant latency contributor, not the broker read or the transport hop. Transport in the benchmark was over the public internet; AWS PrivateLink would reduce it further.
Iceberg output connector (v4.80.0, 2026-03-05, Enterprise)¶
Redpanda Connect v4.80.0 (shipped 2026-03-05, enterprise-gated) introduces the `iceberg` output — a declarative sink that writes streaming data to Apache Iceberg tables via the Iceberg REST Catalog API. Positioned as the non-Kafka-source companion to the broker-native Redpanda Iceberg Topics feature: zero-ETL Kafka → Iceberg is Iceberg Topics; a sink connector with 300+ upstream sources + in-stream transforms + multi-table routing is the Iceberg output. Canonicalised on the wiki as patterns/sink-connector-as-complement-to-broker-native-integration.
Three architectural properties are load-bearing (Source: sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect):
- Registry-less, data-driven schema evolution — infers table schema from raw JSON; no Schema Registry required. Framed verbatim as the "best of both worlds" between chained SMT brittleness ("maintenance toil") and all-string dirty-data tables.
- Data-driven flushing — flush only when data is present; inverts the Kafka-Connect-era timer-driven default. Mitigates the small-file problem on object storage and quiet-source compute waste.
- Bloblang-interpolated multi-table routing — the `table` and `namespace` config fields support Bloblang interpolation (`'events_${!this.event_type}'`). One pipeline definition, N destination tables.
The connector speaks any REST-compliant Iceberg catalog — Apache Polaris, AWS Glue, Unity Catalog, GCP BigLake, Snowflake Open Catalog. OAuth2 client-credentials is the idiomatic auth; per-tenant REST catalog isolation is explicitly supported.
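A hypothetical shape for the `iceberg` output; only the `table` / `namespace` interpolation is attested in the launch post, and the catalog-connection field names here are assumptions:

```yaml
# Hypothetical iceberg output: REST catalog plus Bloblang-interpolated routing.
output:
  iceberg:
    url: https://catalog.example.com/api/catalog   # any REST-compliant Iceberg catalog
    namespace: prod
    table: 'events_${!this.event_type}'            # one pipeline definition, N destination tables
```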
Scope limitations (v4.80.0): append-only at launch (upserts on the roadmap — material for CDC UPDATE/DELETE workloads); schema-inference mechanism depth undisclosed; enterprise-gated license (contrast: the 2025-06-17 dynamic-plugins launch was Apache 2.0); no published benchmarks. Two-shape comparison against Iceberg Topics ("Zero-ETL convenience vs Integration flexibility") canonicalised in the launch post's feature matrix.
See systems/redpanda-connect-iceberg-output for the full system page.
ClickHouse output via sql_raw / sql_insert (2025-12)¶
There is no dedicated ClickHouse output connector in Redpanda Connect as of the 2025-12-09 disclosure. The canonical workaround is to use the generic SQL components — `sql_raw` and `sql_insert` — against ClickHouse's SQL interface. These components are available as input, processor, and output types, giving three wiring options. Verbatim from the Redpanda blog:
"While there isn't a dedicated ClickHouse connector for Redpanda Connect yet, the sql_raw and sql_insert components allow you to stream execute commands or stream data from Redpanda into ClickHouse. They're available as input, processor, and output types, so you've got flexibility in how you wire things up." (Source: sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse)
The same post names dual-write to both Snowflake and ClickHouse simultaneously as a canonical composition — pair `sql_raw` / `sql_insert` (the ClickHouse leg) with the first-class `snowflake_streaming` (the Snowflake leg), either using a `broker` output to fan out every record to both sinks, or multiplexing to route per-record by rule. Canonicalised as the pattern patterns/clickhouse-plus-snowflake-dual-storage-tier.
Gap vs Snowpipe Streaming: the ClickHouse leg is a generic-SQL-sink hop, not a columnar-native bulk-insert path. No per-channel exactly-once offset tokens, no schema-evolution helper, no built-in parallelism knob matching `channel_prefix` × `max_in_flight`. Downstream exactly-once on ClickHouse requires idempotent inserts (e.g. ReplacingMergeTree or UUID primary keys).
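The dual-write composition can be sketched as a `broker` fan-out. The `broker` pattern and the `sql_insert` fields follow the generic component references; DSNs, tables, and column names below are placeholders:

```yaml
# Hypothetical fan-out: every record lands in both Snowflake and ClickHouse.
output:
  broker:
    pattern: fan_out               # independent copy per output; one failing leg doesn't block the other
    outputs:
      - snowflake_streaming:
          account: myorg-myaccount
          user: INGEST_USER
          database: ANALYTICS
          schema: PUBLIC
          table: EVENTS
      - sql_insert:                # generic SQL leg: no columnar-native bulk path
          driver: clickhouse
          dsn: clickhouse://ingest:secret@clickhouse:9000/analytics
          table: events
          columns: [ id, ts, payload ]
          args_mapping: root = [ this.id, this.ts, this.payload.string() ]
```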
Output multiplexing and broker fan-out¶
The Redpanda Connect output layer supports two parallel-output shapes that compose across connectors of any kind:
- `broker` — fan out the same message to every configured output. Each output gets an independent copy; a failure on one doesn't block the others. Canonical shape for "write this event to Snowflake and ClickHouse"-style dual-tier pipelines.
- Multiplexing (output `switch` with per-record rules) — route each message to exactly one output based on Bloblang predicates. Canonical shape for "route logs to cold storage, errors to alerting, events to real-time"-style rule-based pipelines.
The 2025-12-09 post frames these as the two composition primitives for the ClickHouse + Snowflake dual-tier pattern; both work with the heterogeneous mix of `snowflake_streaming` (first-class) + `sql_insert` (generic).
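The multiplexing shape, sketched with the `switch` output and invented routing rules:

```yaml
# Hypothetical rule-based routing: each message matches exactly one case.
output:
  switch:
    cases:
      - check: this.level == "error"
        output:
          kafka_franz:
            seed_brokers: [ "redpanda:9092" ]
            topic: alerts
      - check: this.kind == "log"
        output:
          aws_s3:
            bucket: cold-logs
            path: 'logs/${! timestamp_unix() }.json'
      - output:                    # empty check matches everything else
          kafka_franz:
            seed_brokers: [ "redpanda:9092" ]
            topic: events
```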
GitOps deployment (2025-12-02 tutorial)¶
The 2025-12-02 (unsigned) Redpanda tutorial sources/2025-12-02-redpanda-operationalize-redpanda-connect-with-gitops canonicalises the end-to-end Argo CD + Helm + Kustomize deployment shape for Redpanda Connect on Kubernetes. The tutorial walks through both deployment modes side by side:
- Standalone mode — a single pipeline with config baked into Helm `values.yaml`. Deployed via an Argo CD multi-source Application — chart from `charts.redpanda.com` pinned at `targetRevision: 3.1.0`, values from the customer's own repo referenced via `$values/standalone/standalone-mode.yaml`. Scaling is `deployment.replicaCount` via Git commit.
- Streams mode — multiple pipelines loaded from Kubernetes ConfigMaps at runtime. Deployed via Kustomize wrapping the Helm chart — `configMapGenerator` turns each pipeline YAML into a hash-suffixed ConfigMap; `helmCharts` inflates the chart inline; the precondition is `kustomize.buildOptions: --enable-helm --load-restrictor LoadRestrictionsNone` on Argo CD. Config changes trigger a rolling restart via ConfigMap hash rollout.
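The Streams-mode Kustomize wrapper can be sketched as a `kustomization.yaml`. The chart repo and version match the tutorial; the chart, release, and file names are illustrative:

```yaml
# Hypothetical kustomization.yaml: pipeline ConfigMaps + inline Helm inflation.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
  - name: connect-pipelines
    files:
      - pipelines/orders.yaml      # each pipeline file becomes a hash-suffixed ConfigMap

helmCharts:
  - name: connect
    repo: https://charts.redpanda.com
    version: 3.1.0
    releaseName: connect
    valuesFile: values-streams.yaml
```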
The Streams-mode REST API (`/version`, `/ready`, `/streams`, `/metrics`) introduces the runtime-API vs GitOps source-of-truth tension — GitOps-compatible when automation derives state from Git, an anti-pattern when humans mutate pipelines through the API without a Git commit.
Observability is deployed as a parallel Argo CD Application — kube-prometheus-stack (Prometheus + Alertmanager + Grafana + K8s dashboards) + a Prometheus service monitor + the Redpanda Connect Grafana dashboard. Redpanda Connect exposes Prometheus-compatible metrics natively "without custom exporters or sidecars". Companion GitHub repo: redpanda-data-blog/redpanda-connect-the-gitops-way.
Licensing¶
At 2025-03 the four CDC input connectors are gated behind an Enterprise license in Redpanda Cloud and Self-Managed deployments. Not a free-tier feature.
Seen in¶
- sources/2026-04-09-redpanda-oracle-cdc-now-available-in-redpanda-connect — sixth-engine Oracle CDC addition (Redpanda Connect 4.83.0, 2026-04-09). Completes the Redpanda Connect CDC family to six source-database engines (Postgres / MySQL / MongoDB / Spanner / MSSQL / Oracle). Three architectural canonicalisations: LogMiner as CDC substrate; in-source checkpointing as the fourth offset-durability class ("no external cache required, no re-snapshot, and no gaps"); precision-aware `NUMBER` mapping via `ALL_TAB_COLUMNS` data-dictionary metadata feeding `schema_registry_encode` into typed Avro. Plus the canonical first wiki instance of Oracle Wallet as the regulated-environment auth substrate. Second canonical instance of Bloblang-interpolated multi-table routing at the CDC-source-to-topic-per-table position. See systems/redpanda-connect-oracle-cdc for the full connector page.
- sources/2025-12-09-redpanda-streaming-iot-and-event-data-into-snowflake-and-clickhouse — canonical wiki disclosure that Redpanda Connect has no dedicated ClickHouse output connector (as of 2025-12-09). Generic `sql_raw` / `sql_insert` processors are the canonical workaround, composed with the first-class `snowflake_streaming` connector via dual-write to both tiers — using `broker` fan-out or multiplexing to orchestrate the heterogeneous output mix. Also discloses specific Snowpipe Streaming batch tuning recommendations (500–1,000 records for time-series; `byte_size: 0`; `period` 10–30 s for real-time).
- sources/2025-12-02-redpanda-operationalize-redpanda-connect-with-gitops — canonical Redpanda Connect GitOps deployment tutorial (2025-12-02). Walks through deploying Standalone + Streams modes via Argo CD with Helm chart 3.1.0 from `charts.redpanda.com` and Kustomize for ConfigMap generation + hash-driven rolling restart. Canonicalises Standalone vs Streams mode as the deployment-shape decision, the Argo CD multi-source Application for Standalone, and Kustomize-wraps-Helm for Streams. The Streams REST API introduces the runtime-API vs GitOps source-of-truth anti-pattern discipline.
- sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more — MSSQL CDC addition (Redpanda Connect 4.67.5, 2025-11-06). Extends the Redpanda Connect CDC family to five source-database engines (Postgres / MySQL / MongoDB / Spanner / SQL Server). Vendor benchmark: ~40 MB/s ingest + a 3:15 initial snapshot on a 5M-row table vs ~14.5 MB/s / 8:04 for an unnamed alternative. See systems/redpanda-connect-mssql-cdc for the connector system page.
- sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming — canonical wiki disclosure of the `snowflake_streaming` output connector and the 14.5 GB/s Redpanda → Snowflake benchmark exceeding Snowflake's documented 10 GB/s single-table ceiling by 45%. Four canonical tuning insights from the run: AVRO over JSON (~20% uplift, patterns/binary-format-for-broker-throughput); count-based over byte-size batch triggers (patterns/count-over-bytesize-batch-trigger); `build_parallelism` tuned to (cores − small reserve); channel-count scaling via `channel_prefix` × `max_in_flight` (concepts/snowpipe-streaming-channel). The `broker` input/output primitive is named as the decisive throughput-scaling dimension — intra-node input/output parallelism matters more than node count, canonicalised as patterns/intra-node-parallelism-via-input-output-scaling.
- sources/2025-06-17-redpanda-introducing-multi-language-dynamic-plugins-for-redpanda-connect — 2025-06-17 launch of dynamic plugins in v4.56.0 (Beta, Apache 2.0): subprocess + gRPC-over-Unix-socket + batch-only component types + Go & Python SDKs. Frames compiled plugins as the performance path, dynamic plugins as the flexibility / language-choice path. Canonical wiki instance of patterns/grpc-over-unix-socket-language-agnostic-plugin + patterns/compiled-vs-dynamic-plugin-tradeoff.
- sources/2025-04-03-redpanda-autonomy-is-the-future-of-infrastructure — 2025-04-03 founder-voice launch positions Redpanda Connect as the tool-surface layer of the Redpanda Agents SDK: `rpk connect mcp-server` exposes connectors as MCP tools; Bloblang + Starlark enable dynamic content filtering per tool call. Canonical "Ruby-on-Rails for agents" glue.
- sources/2025-03-18-redpanda-3-powerful-connectors-for-real-time-change-data-capture — canonical wiki introduction of Redpanda Connect as a Kafka-Connect alternative + the four per-engine CDC input connectors + parallel-snapshot-of-large-tables as the differentiator vs Debezium.
Related¶
- systems/redpanda
- systems/redpanda-byoc
- systems/redpanda-agents-sdk
- systems/redpanda-connect-dynamic-plugins
- systems/model-context-protocol
- systems/grpc
- systems/kafka-connect
- systems/debezium
- systems/postgresql
- systems/mysql
- systems/mongodb-server
- systems/cloud-spanner
- systems/argocd
- systems/helm
- systems/kustomize
- systems/kubernetes
- systems/prometheus
- systems/grafana
- concepts/change-data-capture
- concepts/parallel-snapshot-cdc
- concepts/external-offset-store
- concepts/autonomy-enterprise-agents
- concepts/subprocess-plugin-isolation
- concepts/batch-only-component-for-ipc-amortization
- concepts/gitops
- concepts/standalone-vs-streams-mode
- concepts/configmap-hash-rollout
- concepts/runtime-api-vs-gitops-source-of-truth
- patterns/debezium-kafka-connect-cdc-pipeline
- patterns/cdc-driver-ecosystem
- patterns/mcp-as-centralized-integration-proxy
- patterns/dynamic-content-filtering-in-mcp-pipeline
- patterns/wrap-cli-as-mcp-server
- patterns/grpc-over-unix-socket-language-agnostic-plugin
- patterns/compiled-vs-dynamic-plugin-tradeoff
- patterns/argocd-multi-source-helm-plus-values
- patterns/kustomize-wraps-helm-chart
- companies/redpanda