Zalando Postgres-sourced Event Streams
What it is
Zalando's low-code Postgres-sourced event-streaming platform — a shared internal capability that lets Zalando builders declare an event stream sourced from any Postgres table and have the platform provision a dedicated micro-application that publishes row-level change events downstream. Disclosed in Zalando's 2023-11-08 engineering post (sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver).
Architecture at the one-stream altitude
Each declared event stream gets its own micro-application that:
- Embeds Debezium Engine — the embedded-library mode of Debezium, hosting Debezium's Postgres source connector in the application's JVM.
- Talks to the upstream Postgres server via pgjdbc's logical-replication API, owning its own replication slot on the primary.
- Accepts arbitrary event transformations via AWS Lambda in the processing path.
- Publishes the transformed events to a configurable downstream — Kafka topics, direct HTTP sinks, SQS, or other internal event buses.
Scale at publication time (2023-11): hundreds of Postgres-sourced event streams in production.
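The per-stream shape described above can be sketched as a small Python model. This is an illustration under stated assumptions, not Zalando's implementation: `StreamDeclaration`, `run_stream`, and the slot-naming rule are hypothetical, the `transform` callable stands in for the Lambda step, and the `sink` callable stands in for the downstream publisher. The property keys in `debezium_properties` are standard Debezium Postgres connector settings; the values are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

Event = Dict[str, object]


@dataclass
class StreamDeclaration:
    """Hypothetical low-code declaration of one event stream."""
    name: str
    table: str                           # source table, e.g. "public.orders"
    transform: Callable[[Event], Event]  # stand-in for the Lambda transformation
    sink: Callable[[Event], None]        # stand-in for the Kafka/SQS/HTTP publisher

    def debezium_properties(self) -> Dict[str, str]:
        # One replication slot per stream; the naming rule is an assumption.
        slot = "slot_" + self.name.replace("-", "_")
        return {
            "name": self.name,
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "slot.name": slot,
            "table.include.list": self.table,
            "topic.prefix": self.name,
        }


def run_stream(decl: StreamDeclaration, changes: Iterable[Event]) -> int:
    """Push each row-level change through transform, then sink; return the count."""
    published = 0
    for change in changes:
        decl.sink(decl.transform(change))
        published += 1
    return published


# Usage: one declared stream, two change events, an in-memory sink.
outbox = []
orders = StreamDeclaration(
    name="orders-stream",
    table="public.orders",
    transform=lambda e: {**e, "source": "orders-stream"},
    sink=outbox.append,
)
props = orders.debezium_properties()
count = run_stream(orders, [{"op": "c", "id": 1}, {"op": "u", "id": 1}])
```

Because each declaration carries its own slot name, transform, and sink, nothing in one stream's runtime is shared with another's, which is the isolation property discussed later on this page.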
Why the runaway-WAL-growth failure mode surfaces here
The platform's architectural shape (many independent low-code streams, each with its own replication slot, many attached to tables with low write traffic) is exactly the shape that exposes the runaway WAL growth bug in Postgres logical replication. A slot on a quiet table rarely receives a change whose position its consumer can confirm, so the slot stops advancing while writes to busy tables on the same server keep producing WAL that Postgres must retain for it. A large fleet of low-traffic slots, each occasionally sharing a Postgres server with high-traffic tables owned by other streams or other applications, makes it all but certain that this asymmetric-table failure mode will occur somewhere in the fleet. The 2023-11 post is Zalando's retrospective on diagnosing and fixing that fleet-wide pain via an upstream pgjdbc patch (PR #2941).
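The asymmetric-table failure mode can be illustrated with a toy simulation. It is a sketch under loud assumptions: LSNs are modelled as plain integers, there are exactly two slots on one server, and the `confirm_on_keepalive` switch is a hypothetical stand-in for a consumer that acknowledges the server's position even when it has no changes of its own. It is not a description of what PR #2941 actually changes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slot:
    name: str
    confirmed_lsn: int = 0  # last position the consumer has acknowledged


def retained_wal(server_lsn: int, slots: List[Slot]) -> int:
    """WAL the server must keep: everything after the oldest confirmed position."""
    return server_lsn - min(s.confirmed_lsn for s in slots)


def simulate(writes: List[Tuple[str, int]], confirm_on_keepalive: bool) -> int:
    """Replay (table, nbytes) writes against one server with two slots.

    The "busy" slot consumes table "busy"; the "quiet" slot consumes "quiet".
    """
    server_lsn = 0
    busy, quiet = Slot("busy"), Slot("quiet")
    for table, nbytes in writes:
        server_lsn += nbytes
        owner = busy if table == "busy" else quiet
        # The owning consumer receives its change and acknowledges it.
        owner.confirmed_lsn = server_lsn
        if confirm_on_keepalive:
            # Assumed behaviour: idle consumers acknowledge the keepalive position.
            for slot in (busy, quiet):
                slot.confirmed_lsn = server_lsn
    return retained_wal(server_lsn, [busy, quiet])


# One small write to the quiet table, then sustained traffic on the busy one.
writes = [("quiet", 10)] + [("busy", 100)] * 50
pinned = simulate(writes, confirm_on_keepalive=False)
released = simulate(writes, confirm_on_keepalive=True)
```

With these inputs the quiet slot stays at position 10 while the server advances to 5010, so `pinned` is 5000 units of retained WAL and it keeps growing with every busy-table write; `released` is 0 because the idle slot's position tracks the server.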
Platform shape as a systems-design primitive
The shape — "declare a stream, get a micro-app" — is a canonical per-stream isolation deployment choice: one replication slot, one JVM, one Debezium Engine instance, one independent lifecycle. Contrast with a centralised Kafka-Connect-Debezium cluster hosting N connectors in a shared worker pool. The trade-off:
- Win: blast-radius isolation; one stream's failure does not affect other streams.
- Cost: N independent JVMs with N independent transitive dependency trees (see concepts/transitive-dependency-override-build — a fleet-wide upgrade of pgjdbc requires rebuilding every stream's Docker image).
Seen in
- sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver — canonical wiki introduction of Zalando's Postgres-sourced event-streaming platform. The primary framing of the post is Zalando's upstream pgjdbc patch for runaway-WAL growth, but the platform context establishes the systems-design shape: low-code event streams → per-stream micro-app → Debezium Engine → pgjdbc → Postgres logical replication → WAL slot pinning at scale.
Related
- companies/zalando — the owner.
- systems/postgresql — the source database.
- systems/debezium-engine — the CDC engine embedded in each stream's micro-app.
- systems/debezium — the parent Debezium project.
- systems/pgjdbc-postgres-jdbc-driver — the JDBC driver load-bearing in the platform stack.
- systems/aws-lambda — Lambda-hosted event transformations.
- systems/kafka — one of the possible downstream sinks.
- concepts/change-data-capture — the category.
- concepts/logical-replication — the Postgres mechanism each stream rides on.
- concepts/runaway-wal-growth — the failure mode the platform's architectural shape surfaces at fleet scale.