PATTERN Cited by 2 sources

CDC fan-out from a single stream to many consumers

Problem

A service's database has many downstream consumers of its change stream: a full-text search index, an analytical warehouse, a feature store, a vector index, and a reactive-agent trigger. The naïve shape is for each consumer to read the source database directly — either via direct query, or by each running its own CDC reader against the WAL / binlog / oplog.

This naïve shape has three structural problems:

  1. Capacity-planning explosion on the source DB. Each additional consumer adds load to the primary — or requires a dedicated read replica with its own replication lag and cost ("complex capacity planning", verbatim from Redpanda's framing).
  2. WAL cleanup pressure. Each independent CDC reader holds a replication slot (Postgres) or binlog-retention pin (MySQL) open. N consumers = N slots, each able to stall WAL truncation independently. Verbatim: "CDC streams can strain databases (e.g., by delaying WAL cleanup)."
  3. Reactivity requires application-layer redesign. Triggering an agent when a business event occurs (e.g. user downgrades their plan) requires the application to publish a separate event — a redesign of the write path — unless the CDC stream is already available as a shared substrate.
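Point 2 can be made concrete with a toy model. Postgres may only truncate WAL below the minimum confirmed position across all replication slots, so with N independent readers the slowest one pins the log for everyone. A minimal sketch — slot names and LSN values are made up, and LSNs are simplified to plain integers:

```python
# Toy model of problem 2: WAL truncation is bounded by the slowest
# replication slot. Slot names and positions are illustrative only.

def wal_truncation_point(slot_lsns: dict[str, int]) -> int:
    """Postgres may only discard WAL below the minimum confirmed LSN."""
    return min(slot_lsns.values())

# N independent CDC readers: each consumer pins the WAL on its own.
many_slots = {"search": 9_500, "warehouse": 9_990, "features": 2_000}
print(wal_truncation_point(many_slots))  # 2000 — one laggard stalls cleanup

# Fan-out shape: a single CDC reader holds the only slot; consumer lag
# moves to the broker's retention, not the source database's WAL.
one_slot = {"cdc-connector": 9_990}
print(wal_truncation_point(one_slot))  # 9990
```

The point of the model: under the naïve shape, WAL retention is a max-over-N-laggards problem; under fan-out it is a single-reader problem.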

Pattern

Run exactly one CDC reader against the source database. Publish its output to a streaming topic. Every downstream consumer — search index, analytics DB, reactive agent, vector indexer — subscribes to that topic.

┌─────────────┐   single       ┌──────────┐   ┌──────────────┐
│ Source DB   │  ──────────►  │ CDC      │──►│ Stream topic │
│ (WAL/binlog/│   CDC reader   │ connector│   │ (Redpanda /  │
│ oplog)      │                │          │   │ Kafka)       │
└─────────────┘                └──────────┘   └──────────────┘
                                                     ├─► full-text search
                                                     ├─► analytics warehouse
                                                     ├─► vector / embedding index
                                                     ├─► feature store
                                                     └─► reactive agent trigger
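The essential property of the topic in this diagram is that every consumer tracks its own read position against one shared, append-only log — as Kafka / Redpanda consumer groups do. A minimal in-memory model, with all names illustrative (this is not a broker API):

```python
# Minimal in-memory model of the fan-out: one append-only topic,
# each consumer advances its own offset independently.

class Topic:
    def __init__(self):
        self.log = []            # the shared, replayable change log
        self.offsets = {}        # consumer name -> next offset to read

    def publish(self, event):    # called by the single CDC reader
        self.log.append(event)

    def poll(self, consumer):
        """Return unread events for one consumer and advance its offset."""
        start = self.offsets.get(consumer, 0)
        events = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return events

topic = Topic()
topic.publish({"table": "user_plans", "op": "u"})

# Each downstream system reads the same log without touching the source DB.
assert topic.poll("search-indexer") == [{"table": "user_plans", "op": "u"}]
assert topic.poll("agent-trigger") == [{"table": "user_plans", "op": "u"}]
assert topic.poll("search-indexer") == []   # offsets are per-consumer
```

Adding a consumer is just a new name in `offsets` — no new load on the source DB and no new replication slot.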

Verbatim from Redpanda's framing:

"Using change data capture (CDC) to stream database changes into your streaming engine. This enables reactive consumers and keeps auxiliary systems (like full-text search or analytics databases) in sync without complicating your application logic." "While CDC streams can strain databases (e.g., by delaying WAL cleanup) a single stream feeding a fan-out system simplifies architecture and improves reliability. It avoids complex capacity planning and makes it easy to add features or reactivity to your application layer." (Source: sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms)

Worked example: reactive agent on plan downgrade

Verbatim: "triggering an agent when a user downgrades their plan can be done via the CDC stream on the user_plans table, without redesigning the application layer to support such reactivity."

  • Before: application code that calls UPDATE user_plans SET tier='free' WHERE user_id=... has to separately publish("plan.downgraded", {...}) on an event bus. Every new reactive behaviour requires a write-path code change.
  • After: the application just updates the table. CDC emits the user_plans change to a topic. The agent-trigger consumer subscribes to that topic and filters on old.tier != new.tier. New reactive behaviours are consumer-side additions, not write-path modifications.
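The consumer-side filter is small. A sketch, assuming the change event carries before/after row images in the style of Debezium's envelope (op, before, after); the tier names and their ordering are hypothetical:

```python
# Consumer-side downgrade detector for the user_plans stream.
# Event shape (op / before / after) follows Debezium's envelope
# convention; the tier ranking below is a made-up example.

TIER_RANK = {"free": 0, "pro": 1, "enterprise": 2}   # hypothetical tiers

def is_plan_downgrade(event: dict) -> bool:
    """True when an UPDATE lowered the row's tier."""
    if event.get("op") != "u":          # inserts/deletes are not downgrades
        return False
    old, new = event["before"]["tier"], event["after"]["tier"]
    return TIER_RANK[new] < TIER_RANK[old]

event = {
    "op": "u",
    "before": {"user_id": 42, "tier": "pro"},
    "after":  {"user_id": 42, "tier": "free"},
}
assert is_plan_downgrade(event)         # fire the agent trigger
assert not is_plan_downgrade({"op": "c", "before": None,
                              "after": {"user_id": 7, "tier": "pro"}})
```

Note that this is exactly the `old.tier != new.tier` filter described above, tightened to fire only on downgrades, and it lives entirely in the consumer — the write path is untouched.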

Trade-offs

Wins

  • Source DB sees one CDC reader regardless of consumer count.
  • WAL retention / replication-slot footprint is a single-consumer problem, not an N-consumer problem.
  • Consumer fleet is elastic: add or remove search / analytics / agent-trigger consumers without touching the source.
  • Replay semantics are unified — any consumer can rewind to offset 0 and rebuild from the stream instead of re-snapshotting the source. Composes with concepts/stream-replayability-for-iterative-pipelines.
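The replay win can be sketched as a fold over the change log: a consumer that starts from offset 0 recovers its full derived view without a fresh snapshot of the source. Event fields below loosely follow a CDC envelope and are illustrative:

```python
# Sketch of the replay win: a consumer rebuilds its derived state
# from offset 0 instead of re-snapshotting the source database.

def rebuild_index(change_log):
    """Fold the full change stream into a fresh key -> row view."""
    index = {}
    for ev in change_log:
        key = ev["key"]
        if ev["op"] == "d":
            index.pop(key, None)       # deletes remove the row
        else:
            index[key] = ev["after"]   # creates/updates overwrite it
    return index

log = [
    {"op": "c", "key": 1, "after": {"tier": "pro"}},
    {"op": "u", "key": 1, "after": {"tier": "free"}},
    {"op": "c", "key": 2, "after": {"tier": "pro"}},
    {"op": "d", "key": 2, "after": None},
]
assert rebuild_index(log) == {1: {"tier": "free"}}
```

This assumes the topic retains the full history (or a compacted equivalent); retention policy is what makes "rewind to offset 0" meaningful.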

Costs

  • Adds a streaming broker as infrastructure. For organisations without existing streaming capability, the broker + CDC connector + schema registry stack is a new operational surface.
  • The single CDC reader becomes the SPOF for downstream synchronisation — its failure pauses every consumer. HA of the CDC reader matters (see concepts/ha-cdc-coupling).
  • Schema evolution in the source DB now has to be negotiated with all consumers through the broker / registry — it centralises coordination that was previously distributed per-consumer.
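The schema-evolution cost is concretely the kind of gate a registry enforces before the CDC connector may publish a new schema version. A deliberately simplified sketch — real registries (e.g. Confluent's) support several compatibility modes, and the field names here are made up:

```python
# Toy version of the coordination the broker / registry centralises:
# a backward-compatibility gate on new source schemas. Simplified;
# field names and the schema encoding are illustrative.

def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Existing consumers must still find every old required field."""
    for field, spec in old_schema.items():
        if spec.get("required") and field not in new_schema:
            return False
    return True

v1 = {"user_id": {"required": True}, "tier": {"required": True}}
v2_ok = {**v1, "downgraded_at": {"required": False}}   # additive change
v2_bad = {"user_id": {"required": True}}               # drops "tier"

assert is_backward_compatible(v1, v2_ok)
assert not is_backward_compatible(v1, v2_bad)
```

The trade-off in one line: the naïve shape lets each consumer break independently; fan-out makes every breaking source change a negotiation with the whole consumer fleet up front.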

Seen in

  • sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise — Akidau talk-recap frames CDC fan-out as the substrate for agent context-building and maintenance (axis 1 of his eight-axis checklist): "Building and maintaining data for your agent is a classic streaming Extract, Transform, Load (ETL) use case. You want to create datasets that are useful for your agents, whether you're building a knowledge base that connects to a vector database like Pinecone or performing change data capture (CDC) and pulling that data into an Online Analytical Processing (OLAP) database for analytical queries." Canonicalises the agent audience for the fan-out pattern — RAG / vector DB / OLAP analytics all as fan-out consumers of the same upstream CDC stream.
  • sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms — canonical wiki instance with the user_plans downgrade-trigger worked example + explicit WAL-cleanup trade-off naming + "single stream feeding a fan-out system simplifies architecture and improves reliability" framing.