Skip to content

PATTERN Cited by 1 source

Streaming CDC to relational buffer

Streaming CDC to relational buffer is the pattern of landing a change-data-capture stream from a source-of-truth system directly into a PostgreSQL (or other relational) database using JSONB as the ingest buffer, fronted by a connection-pool + failover proxy to absorb burst traffic, and then selectively normalizing the fields the downstream query workload actually touches.

The pattern fills a gap between two stereotypical CDC architectures:

  • Stream → data lake (Kafka → Iceberg / Hudi) is the modern default but adds a full stream-processor + lakehouse stack. Appropriate when downstream consumers are predominantly batch-analytical.
  • Stream → dashboard straight from event store works for narrow time-series dashboards but struggles with ad-hoc analytics that need relational joins / indexed lookups.

Stream → relational buffer lands the events somewhere a BI tool can query with regular SQL — JSONB blobs for flexible ingest, materialized / indexed columns for the fields dashboards actually touch. A good middle ground when:

  • The downstream consumers are BI dashboards needing real-time relational queries.
  • The team doesn't want to stand up a full lakehouse for a single use case.
  • The query workload is known and bounded (selective indexing covers it).

Shape

Source system
    │ CDC stream (e.g. NDJSON over HTTPS)
Ingress tier (optional: NLB + NAT router for cross-VPC ingress)
Connection-pool + failover proxy  (e.g. RDS Proxy with IAM auth)
Relational DB (Aurora PostgreSQL, Multi-AZ)
    │  INSERT INTO events (op, table, ts, data)
    │  VALUES ('update', 'sales_orders', now(), $jsonb);
    │  Expression indexes:
    │    CREATE INDEX ON events ((data->>'customer_id'));
    │    CREATE INDEX ON events ((data->>'region'));
    │  Optional materialized views for hot aggregates.
BI tier (QuickSight, SPICE cache over aggregates)

Why JSONB as the buffer

See concepts/jsonb-streaming-buffer for the detail. The short version:

  • Schema tolerance: arriving CDC events evolve with upstream schema changes without blocking ingest.
  • Selective indexing: only the fields dashboards query get indexed — no upfront commitment to a complete normalized schema.
  • Relational queryability: standard SQL + JSONB path expressions from the BI layer, no special streaming-query engine required.

Why a connection pool / failover proxy

Streaming CDC is a high-frequency continuous write workload. Direct connections from the ingress tier can:

  • Exhaust Aurora's native max_connections under burst.
  • Fail open on writer failover, dropping events.

A proxy like RDS Proxy:

  • Pools a large number of short-lived client connections onto a smaller pool of long-lived backend connections.
  • Transparently redirects traffic to the promoted replica on writer failure — CDC ingest keeps flowing.
  • Supports IAM auth so credentials aren't in the NAT tier's environment.

Canonical production instance

sources/2026-04-21-aws-oldcastle-infor-aurora-quicksight-real-time-analytics — Oldcastle APG lands Infor Cloud ERP CDC events (NDJSON over HTTPS) into Aurora PostgreSQL JSONB, fronted by RDS Proxy, behind the patterns/nat-router-for-static-ip-ingress ingress shape. QuickSight dashboards query via relational SQL with SPICE caching for hot aggregates. 50+ dashboards; 100+ concurrent users; millions of transactions/day processed in real time.

Trade-offs

  • Not a replacement for a lakehouse. Analytical workloads that scan petabytes belong on Iceberg/Hudi/Delta with a columnar engine — not JSONB on Postgres.
  • Schema drift is a monitoring burden. JSONB absorbs whatever upstream emits; auditing for silent schema changes is the application's responsibility (see concepts/schema-evolution).
  • Retention policy matters. JSONB grows unboundedly without TTL; a CDC stream landed into a relational buffer needs an archival / deletion policy or the table becomes a cost liability.
  • Delivery semantics inherit from the source. If the source CDC stream is at-least-once, the relational buffer must be idempotent or de-duplicate.
Last updated · 476 distilled / 1,218 read