PATTERN
Streaming CDC to relational buffer¶
Streaming CDC to relational buffer is the pattern of landing a change-data-capture (CDC) stream from a source-of-truth system directly into a PostgreSQL (or other relational) database: JSONB serves as the ingest buffer, a connection-pool + failover proxy fronts the database to absorb burst traffic, and only the fields the downstream query workload actually touches get selectively normalized.
The pattern fills a gap between two stereotypical CDC architectures:
- Stream → data lake (Kafka → Iceberg / Hudi) is the modern default but adds a full stream-processor + lakehouse stack. Appropriate when downstream consumers are predominantly batch-analytical.
- Stream → dashboard straight from event store works for narrow time-series dashboards but struggles with ad-hoc analytics that need relational joins / indexed lookups.
Stream → relational buffer lands the events somewhere a BI tool can query with regular SQL — JSONB blobs for flexible ingest, materialized / indexed columns for the fields dashboards actually touch. A good middle ground when:
- The downstream consumers are BI dashboards needing real-time relational queries.
- The team doesn't want to stand up a full lakehouse for a single use case.
- The query workload is known and bounded (selective indexing covers it).
Shape¶
Source system
│ CDC stream (e.g. NDJSON over HTTPS)
▼
Ingress tier (optional: NLB + NAT router for cross-VPC ingress)
▼
Connection-pool + failover proxy (e.g. RDS Proxy with IAM auth)
▼
Relational DB (Aurora PostgreSQL, Multi-AZ)
│
│ INSERT INTO events (op, source_table, ts, data)
│ VALUES ('update', 'sales_orders', now(), $jsonb);
│
│ Expression indexes:
│ CREATE INDEX ON events ((data->>'customer_id'));
│ CREATE INDEX ON events ((data->>'region'));
│
│ Optional materialized views for hot aggregates.
▼
BI tier (QuickSight, SPICE cache over aggregates)
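The shape above can be sketched end to end in a few lines. The sketch below uses Python's sqlite3 module with SQLite's JSON functions as a local stand-in for Aurora PostgreSQL JSONB (json_extract in place of ->>); every table, column, and index name here is illustrative, not taken from the source architecture.

```python
import sqlite3

# Relational-buffer table: raw CDC payload lands as a JSON text blob.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        op           TEXT NOT NULL,  -- CDC operation: insert/update/delete
        source_table TEXT NOT NULL,  -- upstream table the change came from
        ts           TEXT NOT NULL,  -- event timestamp
        data         TEXT NOT NULL   -- raw change payload (JSON blob)
    )
""")

# Expression indexes over only the fields dashboards query — the SQLite
# analogue of CREATE INDEX ON events ((data->>'customer_id')) in Postgres.
conn.execute("CREATE INDEX idx_customer ON events (json_extract(data, '$.customer_id'))")
conn.execute("CREATE INDEX idx_region   ON events (json_extract(data, '$.region'))")

# Land one NDJSON line from the CDC stream.
ndjson_line = '{"customer_id": "C42", "region": "EMEA", "total": 129.5}'
conn.execute(
    "INSERT INTO events (op, source_table, ts, data) VALUES (?, ?, datetime('now'), ?)",
    ("update", "sales_orders", ndjson_line),
)

# BI-style relational query over the indexed JSON field.
rows = conn.execute(
    "SELECT json_extract(data, '$.total') FROM events "
    "WHERE json_extract(data, '$.region') = 'EMEA'"
).fetchall()
print(rows)  # [(129.5,)]
```

The point of the sketch is the division of labor: the blob column keeps ingest schema-free, while the expression indexes make the handful of dashboard fields cheap to filter on.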
Why JSONB as the buffer¶
See concepts/jsonb-streaming-buffer for the details. The short version:
- Schema tolerance: arriving CDC events evolve with upstream schema changes without blocking ingest.
- Selective indexing: only the fields dashboards query get indexed — no upfront commitment to a complete normalized schema.
- Relational queryability: standard SQL + JSONB path expressions from the BI layer, no special streaming-query engine required.
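The schema-tolerance point can be made concrete: when upstream adds a field mid-stream, both event shapes land without any DDL change, and path expressions over the missing field return NULL rather than erroring. A minimal sketch, again using sqlite3's json_extract as a stand-in for Postgres JSONB; the field names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (data TEXT NOT NULL)")

# Upstream adds discount_code mid-stream; both shapes ingest unchanged.
conn.execute("INSERT INTO events VALUES (?)", ('{"customer_id": "C1"}',))
conn.execute("INSERT INTO events VALUES (?)",
             ('{"customer_id": "C2", "discount_code": "SPRING"}',))

# Path expressions tolerate the absent field: NULL, not an ingest failure.
rows = conn.execute(
    "SELECT json_extract(data, '$.discount_code') FROM events ORDER BY rowid"
).fetchall()
print(rows)  # [(None,), ('SPRING',)]
```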
Why a connection pool / failover proxy¶
Streaming CDC is a high-frequency continuous write workload. Direct connections from the ingress tier can:
- Exhaust Aurora's native max_connections under burst.
- Break during writer failover, dropping in-flight events.
A proxy like RDS Proxy:
- Pools a large number of short-lived client connections onto a smaller pool of long-lived backend connections.
- Transparently redirects traffic to the promoted replica on writer failure — CDC ingest keeps flowing.
- Supports IAM auth so credentials aren't in the NAT tier's environment.
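To illustrate the failure mode the proxy absorbs: without it, the ingest tier has to carry reconnect-and-retry logic itself. A hypothetical client-side sketch of that burden (all names invented; RDS Proxy does the equivalent server-side, behind a stable endpoint):

```python
import time

class FailoverError(Exception):
    """Stand-in for the connection error a writer failover raises."""

def insert_with_retry(connect, event, retries=3, backoff=0.05):
    # Without a proxy, every ingest worker needs this loop: reconnect and
    # retry until the promoted writer accepts the event. A failover proxy
    # hides the reconnect behind one stable endpoint.
    for attempt in range(retries):
        conn = connect()
        try:
            conn.insert(event)
            return True
        except FailoverError:
            time.sleep(backoff * (2 ** attempt))  # brief backoff, then reconnect
    return False

# Fake connection that fails once (mid-failover), then succeeds.
class FakeConn:
    failovers_left = 1
    landed = []
    def insert(self, event):
        if FakeConn.failovers_left:
            FakeConn.failovers_left -= 1
            raise FailoverError
        FakeConn.landed.append(event)

ok = insert_with_retry(FakeConn, {"op": "update", "table": "sales_orders"})
print(ok, FakeConn.landed)
```

The retry loop here is exactly what the pattern pushes out of the application: the proxy keeps CDC ingest flowing through a failover without every writer re-implementing this.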
Canonical production instance¶
sources/2026-04-21-aws-oldcastle-infor-aurora-quicksight-real-time-analytics — Oldcastle APG lands Infor Cloud ERP CDC events (NDJSON over HTTPS) into Aurora PostgreSQL JSONB, fronted by RDS Proxy, behind the patterns/nat-router-for-static-ip-ingress ingress shape. QuickSight dashboards query via relational SQL with SPICE caching for hot aggregates. 50+ dashboards; 100+ concurrent users; millions of transactions/day processed in real time.
Trade-offs¶
- Not a replacement for a lakehouse. Analytical workloads that scan petabytes belong on Iceberg/Hudi/Delta with a columnar engine — not JSONB on Postgres.
- Schema drift is a monitoring burden. JSONB absorbs whatever upstream emits; auditing for silent schema changes is the application's responsibility (see concepts/schema-evolution).
- Retention policy matters. JSONB grows unboundedly without TTL; a CDC stream landed into a relational buffer needs an archival / deletion policy or the table becomes a cost liability.
- Delivery semantics inherit from the source. If the source CDC stream is at-least-once, the relational buffer must be idempotent or de-duplicate.
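The at-least-once point lends itself to a sketch: de-duplicate at insert time on a unique event id. Below, SQLite's INSERT OR IGNORE stands in for Postgres INSERT ... ON CONFLICT (event_id) DO NOTHING, and the event_id column is an assumption about what the CDC stream carries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id TEXT PRIMARY KEY,   -- dedup key carried by the CDC stream
        data     TEXT NOT NULL
    )
""")

def land(event_id, payload):
    # Duplicate deliveries hit the primary-key conflict and are dropped,
    # making the buffer idempotent under at-least-once delivery.
    conn.execute("INSERT OR IGNORE INTO events VALUES (?, ?)", (event_id, payload))

land("e1", '{"total": 10}')
land("e1", '{"total": 10}')   # redelivery of e1 from the source
land("e2", '{"total": 20}')

count = conn.execute("SELECT count(*) FROM events").fetchone()[0]
print(count)  # 2
```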