Skip to content

CONCEPT Cited by 1 source

JSONB streaming buffer

JSONB streaming buffer is the two-phase schema discipline where a streaming-ingest pipeline lands arriving records into a PostgreSQL JSONB column — schemaless ingest — and then selectively normalizes only the fields the downstream query workload actually touches, either via expression indexes on JSONB paths or via materialized columns extracted at read time.

The pattern solves a tension that streaming CDC from third-party sources surfaces:

  • Ingest must be schema-tolerant. Upstream ERP / SaaS tables evolve without the downstream system's control. A rigid relational schema breaks on the next upstream column addition.
  • Query must be fast. Dashboards and ad-hoc analytics need indexed fields; scanning JSONB blobs per query is expensive.

JSONB resolves this by decoupling write-side flexibility from read-side performance: writes always succeed (the blob absorbs any upstream shape); reads can be backed by indexes that target only the subset of fields actually queried.

Mechanisms inside Postgres

  • Store the full event as JSONB. A single wide data JSONB column per event (or per record type) captures whatever upstream emits. Deletes are logical — op = 'delete' with a snapshot of the deleted row in the blob.
  • Index the hot fields. PostgreSQL supports:
  • Expression indexes on specific JSONB paths: CREATE INDEX ON events ((data->>'customer_id')).
  • GIN indexes on the whole JSONB column with jsonb_path_ops or the default jsonb_ops opclass, enabling containment queries (data @> '{"status": "shipped"}').
  • Generated columns that extract a typed value at write time (GENERATED ALWAYS AS ((data->>'ts')::timestamp) STORED) and can be indexed as regular columns.
  • Partial normalization — the most-queried fields get promoted to dedicated columns; everything else stays in JSONB.

Trade-offs

  • Schema-drift auditing is a new problem. JSONB absorbs any shape; detecting that upstream silently renamed a field or changed a type requires application-level monitoring (see concepts/schema-evolution).
  • Blob write amplification. Each JSONB value is TOAST-compressed and stored separately when > 2 KiB. Small-field updates may rewrite the whole TOASTed value.
  • Query-plan surprises. Optimizer statistics on JSONB fields extracted via expressions can be weaker than on regular columns; some join strategies may degrade.

Seen in

  • sources/2026-04-21-aws-oldcastle-infor-aurora-quicksight-real-time-analytics — canonical wiki instance. Oldcastle lands Infor Data Fabric CDC events into Aurora PostgreSQL JSONB columns, "storing it in JSONB columns for flexible querying while using the native JSON functions of Aurora PostgreSQL when we need to parse and normalize specific fields", with indexes "on frequently queried fields to maintain query performance as our data volume grows."
Last updated · 476 distilled / 1,218 read