
Redpanda — Oracle CDC now available in Redpanda Connect

Summary

Redpanda (2026-04-09) announces the oracledb_cdc input connector in Redpanda Connect v4.83.0 (enterprise-gated), adding Oracle as the sixth source-database engine in Redpanda's per-engine CDC family (after Postgres / MySQL / MongoDB / Spanner / MSSQL, canonicalised in prior wiki ingests). Short (~900-word) launch-voice tutorial post. Four load-bearing architectural disclosures: (1) rides on Oracle LogMiner, the Oracle Enterprise Edition redo-log-mining substrate, as the native change-log mechanism; a new per-engine CDC mechanism on the wiki. (2) In-source checkpointing: "Restarts resume from a checkpointed position stored in Oracle itself, with no external cache required"; a fourth canonical offset-durability class alongside server-owned (Postgres slot), consumer-owned external (MySQL/MongoDB), and transactional-row (Spanner). (3) Automatic schema tracking via ALL_TAB_COLUMNS: the connector queries Oracle's data-dictionary view and attaches a precision-aware column schema to each message as metadata (integers as int64, decimals as json.Number), from which schema_registry_encode registers and serialises Avro; new columns are detected mid-stream, dropped columns on restart. (4) Oracle Wallet auth: cwallet.sso (auto-login) and ewallet.p12 (PKCS#12 with password) as the enterprise-grade file-based credential-store substitute for plain-text connection-string passwords, with SSL enabled automatically. Canonical competitive framing: single Go binary vs JVM + Kafka Connect cluster + Debezium ("we wanted to make that significantly simpler"). Tier-3 borderline include on vocabulary-canonicalisation grounds: fills gaps the prior five-engine CDC ingests left open (Oracle's redo-log mechanism, in-source checkpointing, precision-aware type mapping, file-based credential store).

Key takeaways

  1. Oracle is the sixth source-database engine in Redpanda Connect's CDC family. Shipped in Redpanda Connect v4.83.0 as the oracledb_cdc input; enterprise-gated. The family now spans Postgres / MySQL / MongoDB / Spanner / MSSQL / Oracle. Verbatim: "Starting with Redpanda Connect v4.83.0, the oracledb_cdc input captures changes directly from Oracle, including: inserts, updates, and deletes. The connector then routes them downstream as structured events. No JVM, no Kafka Connect cluster, no separate workers. Just Redpanda Connect doing what it does best." (Source: sources/2026-04-09-redpanda-oracle-cdc-now-available-in-redpanda-connect)

  2. The connector rides on Oracle LogMiner, Oracle Enterprise Edition's built-in redo-log-mining utility that reconstructs row-level change events from the redo stream. Verbatim: "because it uses Oracle LogMiner, which ships with Oracle Enterprise Edition, there's no additional Oracle licensing required." Canonical new wiki mechanism — sibling to Postgres logical replication, MySQL binlog, MongoDB change streams, Spanner change streams. See concepts/oracle-logminer-cdc.

  3. Snapshot → stream transition, with in-source checkpointing. Verbatim: "On first run, the connector takes a consistent snapshot of your tables and emits each existing row as a read event. Once the snapshot finishes, it transitions to streaming mode, tailing Oracle's redo logs from that point forward and picking up insert, update, and delete events as they land. Restarts resume from a checkpointed position stored in Oracle itself, with no external cache required, no re-snapshot, and no gaps." Canonicalised as in-source CDC checkpointing, the fourth canonical offset-durability class across the per-engine CDC family:

| Engine | Offset ownership | Storage location |
| --- | --- | --- |
| PostgreSQL | Server | Replication slot (confirmed_flush_lsn) |
| MySQL | Consumer | External store (Redis / SQL) |
| MongoDB | Consumer | External store (resume token) |
| Spanner | Consumer | Transactional row in source DB |
| Oracle | Consumer | Checkpoint table in source DB |
| MSSQL | (undisclosed at 2025-11) | (undisclosed at 2025-11) |

The Oracle and Spanner shapes are both source-DB-resident but differ: Spanner stores progress transactionally with each data row, while Oracle keeps progress in a separate checkpoint table that the connector advances. Oracle's shape retires the external-offset-store requirement that MySQL and MongoDB carry.
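The operational difference the table encodes is visible in pipeline config. A MySQL CDC pipeline pairs the input with an external cache resource for offsets, while the Oracle input needs none. A hedged sketch (the mysql_cdc field names `dsn` and `checkpoint_cache`, and the hostnames, are assumptions drawn from the MySQL connector's documentation, not this post):

```yaml
# MySQL: consumer-owned offsets in an external store (Redis here)
input:
  mysql_cdc:
    dsn: cdc_user:pass@tcp(mysql.internal:3306)/shop
    checkpoint_cache: cdc_offsets

cache_resources:
  - label: cdc_offsets
    redis:
      url: redis://redis.internal:6379

# Oracle: no cache_resources block at all; the checkpointed
# position lives in a table inside Oracle itself.
# input:
#   oracledb_cdc:
#     connection_string: oracle://cdc_user:pass@oracle-db.internal:1521/ORCL
```

The practical consequence is one fewer piece of infrastructure to provision, back up, and keep consistent with the pipeline.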

  4. Automatic schema tracking via ALL_TAB_COLUMNS with precision-aware NUMBER mapping. Verbatim: "The oracledb_cdc connector handles it automatically. It queries Oracle's ALL_TAB_COLUMNS catalog and attaches a full column schema to each message as metadata, with precision-aware NUMBER mapping (integers as int64, decimals as json.Number). The schema_registry_encode processor reads that schema directly, registers it, and encodes the payload as Avro. Your consumers get typed, schema-tracked events from day one." Canonicalised as concepts/precision-aware-type-mapping. Schema-drift behaviour:
     • New columns added mid-stream → detected automatically.
     • Dropped columns → reflected after a connector restart.

Canonical verbatim anti-pattern framing: "Schema drift is the thing that silently corrupts your downstream data until someone notices a null where they expected a number (usually in production, usually days after the column was added, usually not by you). Most CDC setups leave this problem to you."

  5. Oracle Wallet auth for regulated environments. Canonical new file-based credential store on the wiki. Verbatim: "Oracle Wallet is the standard answer: a file-based credential store provisioned by the DBA that the client uses instead of a username and password. Point wallet_path at the directory your DBA provisioned, and the connector handles the rest. SSL is enabled automatically." Two wallet formats:
     • cwallet.sso: auto-login wallet, no password required.
     • ewallet.p12: PKCS#12 wallet, password required via the wallet_password config field, "treated as a secret field and will be redacted from logs and config dumps". Fits the file-based credential store concept — canonical compliance substrate for regulated workloads (audit trail for secrets, no plain-text passwords in config).

  6. Multi-table routing via Bloblang interpolation. Verbatim: "The table_name metadata field flows through the pipeline, and Bloblang interpolation routes each event to its own topic automatically. One pipeline config, any number of tables." Canonical CDC-topic-per-table instance of patterns/bloblang-interpolated-multi-table-routing, the same Bloblang-interpolation shape canonicalised on the Iceberg-output ingest, now in CDC-topic-routing role:

    output:
      redpanda:
        topic: ${! meta("table_name").lowercase() }
    
    The include config field takes regex patterns (SALES\.ORDERS matches exactly, SALES\..* matches every table in the schema).
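A hedged sketch of how the include patterns compose with the fields from the post's worked examples (the INVENTORY schema name is hypothetical, used only to illustrate the exact-match vs whole-schema forms):

```yaml
input:
  oracledb_cdc:
    connection_string: oracle://cdc_user:pass@oracle-db.internal:1521/ORCL
    include:
      - SALES\.ORDERS    # exact match: captures one table
      - INVENTORY\..*    # wildcard: captures every table in the INVENTORY schema
```

Because the patterns are regexes, the dot between schema and table must be escaped, as the post's own examples do.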

  7. Competitive framing: "One binary. No JVM. No Connect cluster." Canonical launch-post positioning against Debezium on Kafka Connect. Verbatim: "Debezium is a solid project. If your team is already running Kafka Connect for other connectors, adding Oracle CDC on top is a reasonable lift. But if you're not already running Kafka Connect, you're standing up a significant amount of infrastructure — dedicated workers, connector offsets, a JVM heap to size, its own monitoring surface — for what should be a data pipeline. Redpanda Connect is a single Go binary." Fits the wiki's CDC driver ecosystem framing: two separate consumer-side ecosystems (Kafka-Connect-hosted Debezium vs single-binary Redpanda Connect), each writing per-engine drivers against the source DB's native change log.

Canonical verbatim claims

  • LogMiner as substrate: "because it uses Oracle LogMiner, which ships with Oracle Enterprise Edition, there's no additional Oracle licensing required."
  • In-source checkpointing: "Restarts resume from a checkpointed position stored in Oracle itself, with no external cache required, no re-snapshot, and no gaps."
  • Precision-aware NUMBER mapping: "precision-aware NUMBER mapping (integers as int64, decimals as json.Number)."
  • Auto schema tracking: "New columns added to a captured table are detected automatically mid-stream. Dropped columns are reflected after a connector restart. Schema drift is handled, not ignored."
  • Wallet secret redaction: "It's treated as a secret field and will be redacted from logs and config dumps. The kind of thing that makes security reviewers happy, auditors quiet, and nobody paging you at midnight about a credential in a log file."
  • Single-binary framing: "Redpanda Connect is a single Go binary. The oracledb_cdc input, schema registration, routing, and output all run in a single process with a single config file."

Worked configuration examples

The post walks two end-to-end pipeline examples:

Example 1 — Two-table SALES schema → Avro + Redpanda, per-table topic routing:

input:
  oracledb_cdc:
    connection_string: oracle://cdc_user:your_password@oracle-db.internal:1521/ORCL
    stream_snapshot: true
    include:
      - SALES\.ORDERS
      - SALES\.ORDER_ITEMS
pipeline:
  processors:
    - schema_registry_encode:
        url: http://schema-registry:8081
        subject: ${! meta("table_name") }
output:
  redpanda:
    seed_brokers:
      - redpanda-broker:9092
    topic: ${! meta("table_name").lowercase() }
    compression: lz4
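The "typed, schema-tracked events from day one" claim can be exercised from the consumer side with a small read-back pipeline. A sketch, not from the post: it assumes Redpanda Connect's `redpanda` input and `schema_registry_decode` processor (both documented connectors), and a hypothetical `orders` topic produced by Example 1's lowercased table_name routing:

```yaml
input:
  redpanda:
    seed_brokers:
      - redpanda-broker:9092
    topics:
      - orders            # lowercased SALES.ORDERS topic from Example 1
    consumer_group: cdc_readers

pipeline:
  processors:
    # Resolve the registered Avro schema and decode the payload back to structured data
    - schema_registry_decode:
        url: http://schema-registry:8081

output:
  stdout: {}
```

Because the producing pipeline registered the connector-derived schema, the consumer needs no hand-written schema of its own.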

Example 2 — Oracle Wallet auth (regulated environments):

input:
  oracledb_cdc:
    connection_string: oracle://host:1521/ORCL
    wallet_path: /opt/oracle/wallet
    wallet_password: "${WALLET_PASSWORD}"  # only needed for ewallet.p12 wallets
    include:
      - SALES\..*

Systems, concepts, and patterns canonicalised

New canonical wiki pages:

  • concepts/oracle-logminer-cdc
  • concepts/precision-aware-type-mapping

Wiki pages extended:

  • patterns/bloblang-interpolated-multi-table-routing (now in CDC-topic-routing role)

Operational numbers disclosed

None. Launch-voice post with no throughput / latency / fleet / snapshot-duration figures. Vendor competitive framing against Debezium is qualitative only (no benchmark numbers, unlike the 2025-11-06 MSSQL CDC launch which disclosed ~40 MB/s vs ~14.5 MB/s).

Caveats

  • Launch-voice: product announcement register with "significantly simpler" framing. No production retrospective, no customer case study, no measured operational numbers (throughput, latency, snapshot duration, redo-log-rate sustainable, checkpoint-advance-cadence).
  • No benchmark against Debezium: competitive framing is qualitative only (single-binary vs JVM + Kafka Connect). The 2025-11-06 MSSQL CDC launch canonicalised a specific benchmark (~40 MB/s vs ~14.5 MB/s on 5M-row table); this post does not.
  • LogMiner mechanism depth undisclosed: redo log mining is known to have operational caveats (LogMiner supplemental logging requirements, DBMS_LOGMNR package APIs, continuous-mining vs ad-hoc mode, archive-log-generation rate, LogMiner performance overhead on the source primary); the post doesn't walk any of these.
  • Snapshot consistency model undisclosed: is the initial snapshot a single FLASHBACK AS OF SCN snapshot, or per-table cursors? How does the snapshot-to-stream transition pick its SCN boundary?
  • Checkpoint-table mechanism undisclosed: name, schema, write cadence, permissions model all elided. "Checkpointed position stored in Oracle itself" is the full disclosure.
  • Schema-evolution mechanism depth undisclosed: "new columns detected automatically mid-stream" verbatim, but the detection cadence (per-record? per-batch? per-minute?) and performance cost of per-record ALL_TAB_COLUMNS lookup both elided. Column rename / type change semantics not addressed.
  • No Oracle topology scope enumeration: Oracle RAC, Oracle Data Guard, Oracle Active Data Guard, Multitenant / Pluggable Databases, Oracle Standard Edition vs Enterprise Edition not enumerated. LogMiner is an Enterprise Edition feature; Standard Edition users cannot use this connector.
  • No parallel-snapshot-of-large-table claim: the 2025-03-18 Postgres + MongoDB CDC launch named intra-table parallel snapshot as Redpanda's differentiator vs Debezium. The Oracle post does not claim this capability — absence is not evidence of lack, but the differentiator claim is not made.
  • No UPDATE/DELETE semantic depth: emission claimed ("includes: inserts, updates, and deletes") but before/after image availability, primary-key change handling, cross-table UPDATE semantics all elided.
  • Undisclosed: JSON_TABLE-analogue for JSON columns (Oracle 12c+ JSON support undisclosed), LONG/LONG RAW deprecation treatment, CLOB/BLOB LOB handling, XMLTYPE handling.
  • Enterprise-gated: oracledb_cdc requires a Redpanda Enterprise Edition license (consistent with the rest of the Redpanda Connect CDC family; contrast the Apache-2.0 dynamic-plugin framework from 2025-06-17).
  • Unsigned: Redpanda default attribution.

Cross-source continuity

  • Sixth engine in the Redpanda Connect CDC family. Extends the prior five-engine ingestion (Postgres / MySQL / MongoDB / Spanner from the [[sources/2025-03-18-redpanda-3-powerful-connectors-for-real-time-change-data-capture|2025-03-18 CDC connectors post]], plus MSSQL from the [[sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more|2025-11-06 25.3 launch post]]) to six. Canonical pattern CDC driver ecosystem now spans six consumer-side drivers in Redpanda Connect alone.
  • Companion to [[sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect|2026-03-05 Iceberg output launch]] — both introduce new connector shapes in Redpanda Connect (v4.80.0 Iceberg output / v4.83.0 Oracle CDC input) and both canonicalise enterprise-grade auth/compliance substrates (OAuth2 token exchange for REST catalogs / Oracle Wallet for source auth). Together they canonicalise Redpanda Connect's connector-catalog growth at both ends of a CDC pipeline (Oracle source + Iceberg sink).
  • Companion to [[sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms|2025-06-24 streaming-as-backbone essay]] — canonicalised the CDC fan-out single-stream-to-many-consumers pattern; Oracle CDC is the sixth source-database engine that can feed the fan-out shape.
  • Complements [[sources/2025-05-20-redpanda-implementing-fips-compliance-in-redpanda|2025-05-20 FIPS compliance post]] — that post canonicalised Redpanda's broker-level FIPS-validated cryptographic module substrate (OpenSSL 3.0.9); this post canonicalises Oracle Wallet as the CDC-client-side file-based credential store for regulated environments. Both address the compliance substrate at different layers of the pipeline.

Source
