Zalando Postgres-sourced Event Streams

What it is

Zalando's low-code Postgres-sourced event-streaming platform — a shared internal capability that lets Zalando builders declare an event stream sourced from any Postgres table and have the platform provision a dedicated micro-application that publishes row-level change events downstream. Disclosed in Zalando's 2023-11-08 engineering post (sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver).

Architecture at the one-stream altitude

Each declared event stream gets its own micro-application that:

  1. Embeds Debezium Engine — the embedded-library mode of Debezium, hosting Debezium's Postgres source connector in the application's JVM.
  2. Talks to the upstream Postgres server via pgjdbc's logical-replication API, owning its own replication slot on the primary.
  3. Accepts arbitrary event transformations via AWS Lambda in the processing path.
  4. Publishes the transformed events to a configurable downstream — Kafka topics, direct HTTP sinks, SQS, or other internal event buses.
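
The four steps above can be modelled as a small pipeline. This is a minimal sketch for illustration only, not Zalando's implementation: `StreamSpec`, `run_stream`, the field names, and the event layout are all hypothetical, the transform stands in for the AWS Lambda step, and the sink stands in for whichever downstream is configured. The real micro-application runs Debezium Engine on the JVM; Python is used here only to show the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A row-level change event, loosely modelled on what a CDC connector emits.
ChangeEvent = dict


@dataclass
class StreamSpec:
    """Hypothetical declaration of one event stream (not Zalando's real schema)."""
    table: str        # source Postgres table
    slot_name: str    # replication slot owned by this stream alone
    transform: Callable[[ChangeEvent], Optional[ChangeEvent]]  # stands in for the Lambda step
    sink: Callable[[ChangeEvent], None]                        # stands in for the downstream


def run_stream(spec: StreamSpec, changes) -> int:
    """Push each change for spec.table through the transform, then into the sink."""
    published = 0
    for change in changes:
        if change["table"] != spec.table:
            continue  # a stream only handles changes for its own table
        event = spec.transform(change)
        if event is None:
            continue  # the transform may drop an event
        spec.sink(event)
        published += 1
    return published


# Usage: one declared stream on a hypothetical "orders" table.
outbox = []
spec = StreamSpec(
    table="orders",
    slot_name="orders_stream_slot",
    transform=lambda c: {"order_id": c["after"]["id"], "op": c["op"]},
    sink=outbox.append,
)
changes = [
    {"table": "orders", "op": "c", "after": {"id": 1}},
    {"table": "payments", "op": "c", "after": {"id": 9}},
    {"table": "orders", "op": "u", "after": {"id": 1}},
]
count = run_stream(spec, changes)
```

Each declared stream carries its own slot name, transform, and sink, which is what gives every stream an independent lifecycle.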

Scale at publication time (2023-11): hundreds of Postgres-sourced event streams in production.

Why the runaway-WAL-growth failure mode surfaces here

The platform's architectural shape — many independent low-code streams, each with its own replication slot, many attached to tables with low write traffic — is exactly the shape that exposes the runaway WAL growth bug in Postgres logical replication. A large fleet of low-traffic slots, each occasionally sharing a Postgres server with high-traffic tables owned by other streams or other applications, guarantees the asymmetric-table failure mode will occur somewhere in the fleet. The 2023-11 post is Zalando's retrospective on diagnosing and fixing that fleet-wide pain via an upstream pgjdbc patch (PR #2941).
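
A toy simulation makes the asymmetry concrete. It is a sketch under stated assumptions, not Postgres or pgjdbc code: WAL positions are plain integers, a slot's confirmed position advances only when its own table is written (the assumed pre-patch behaviour), and the `ack_on_keepalive` flag models an idle consumer confirming the server's current position instead (the assumed effect of the fix, simplified to one keepalive per WAL record). `Slot` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    table: str
    confirmed_lsn: int = 0  # last WAL position the consumer has acknowledged


def simulate(writes, slots, ack_on_keepalive: bool) -> int:
    """Return the WAL retained (abstract units) after replaying `writes`.

    writes: sequence of table names, one per WAL record on the shared server.
    ack_on_keepalive: if True, every slot confirms the current position after
    each record (assumed patched behaviour); if False, a slot advances only
    when its own table is written (assumed pre-patch behaviour).
    """
    current_lsn = 0
    for table in writes:
        current_lsn += 1
        for slot in slots:
            if slot.table == table or ack_on_keepalive:
                slot.confirmed_lsn = current_lsn
    # The server must keep all WAL newer than the slowest slot's position.
    oldest = min(slot.confirmed_lsn for slot in slots)
    return current_lsn - oldest


def fresh_slots():
    return [Slot("busy_slot", "busy"), Slot("quiet_slot", "quiet")]


# One quiet table sharing a server with a busy one.
writes = ["busy"] * 1000 + ["quiet"] + ["busy"] * 5000

retained_before = simulate(writes, fresh_slots(), ack_on_keepalive=False)
retained_after = simulate(writes, fresh_slots(), ack_on_keepalive=True)
```

In this model the quiet slot pins every WAL record written after its last change, so retention grows with the busy table's traffic, while the keepalive-acknowledging variant retains nothing.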

Platform shape as a systems-design primitive

The shape ("declare a stream, get a micro-app") is a canonical example of per-stream isolation as a deployment choice: one replication slot, one JVM, one Debezium Engine instance, one independent lifecycle. Contrast this with a centralised Kafka-Connect-Debezium cluster hosting N connectors in a shared worker pool. The trade-off:

  • Win: blast-radius isolation; one stream's failure does not affect other streams.
  • Cost: N independent JVMs with N independent transitive dependency trees (see concepts/transitive-dependency-override-build — a fleet-wide upgrade of pgjdbc requires rebuilding every stream's Docker image).

Seen in

  • sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver: canonical wiki introduction of Zalando's Postgres-sourced event-streaming platform. The primary framing of the post is Zalando's upstream pgjdbc patch for runaway-WAL growth, but the platform context establishes the systems-design shape: low-code event streams → per-stream micro-app → Debezium Engine → pgjdbc → Postgres logical replication → WAL slot pinning at scale.