Zalando Postgres-sourced Event Streams
What it is
Zalando's low-code Postgres-sourced event-streaming platform — a shared internal capability that lets Zalando builders declare an event stream sourced from any Postgres table and have the platform provision a dedicated micro-application that publishes row-level change events downstream. Disclosed in Zalando's 2023-11-08 engineering post (sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver).
Architecture at the one-stream altitude
Each declared event stream gets its own micro-application that:
- Embeds Debezium Engine — the embedded-library mode of Debezium, hosting Debezium's Postgres source connector in the application's JVM.
- Talks to the upstream Postgres server via pgjdbc's logical-replication API, owning its own replication slot on the primary.
- Accepts arbitrary event transformations via AWS Lambda in the processing path.
- Publishes the transformed events to a configurable downstream — Kafka topics, direct HTTP sinks, SQS, or other internal event buses.
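As a rough illustration of the "declare a stream, get a micro-app" step, the sketch below maps one stream declaration to the Debezium Engine properties a dedicated micro-application might be started with. The `StreamDeclaration` fields, the `debezium_properties` helper, and the slot-naming rule are hypothetical and are not taken from Zalando's platform. The property keys are standard Debezium Postgres connector settings. Credentials and sink configuration are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamDeclaration:
    """Hypothetical shape of one declared stream (not Zalando's actual schema)."""
    name: str
    host: str
    database: str
    table: str          # schema-qualified, e.g. "public.orders"
    port: int = 5432


def debezium_properties(decl: StreamDeclaration) -> dict[str, str]:
    """Build Debezium Engine properties for the one micro-app serving `decl`."""
    # Assumed naming rule: one slot per stream. Postgres slot names may only
    # contain lowercase letters, digits and underscores.
    slot = "stream_" + decl.name.lower().replace("-", "_")
    return {
        "name": decl.name,
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": decl.host,
        "database.port": str(decl.port),
        "database.dbname": decl.database,
        "table.include.list": decl.table,
        "topic.prefix": decl.name,
        "slot.name": slot,
        # Offsets are kept in memory only, so the slot is the durable position.
        "offset.storage": "org.apache.kafka.connect.storage.MemoryOffsetBackingStore",
    }


if __name__ == "__main__":
    decl = StreamDeclaration("order-events", "db.internal", "shop", "public.orders")
    for key, value in debezium_properties(decl).items():
        print(f"{key}={value}")
```

Because every stream gets its own property set and its own slot name, nothing in one stream's configuration refers to any other stream, which is the isolation property the rest of this page relies on.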
Scale at publication time (2023-11): hundreds of Postgres-sourced event streams in production.
Why the runaway-WAL-growth failure mode surfaces here
The platform's architectural shape — many independent low-code streams, each with its own replication slot, many attached to tables with low write traffic — is exactly the shape that exposes the runaway WAL growth bug in Postgres logical replication. A large fleet of low-traffic slots, each occasionally sharing a Postgres server with high-traffic tables owned by other streams or other applications, guarantees the asymmetric-table failure mode will occur somewhere in the fleet. The 2023-11 post is Zalando's retrospective on diagnosing and fixing that fleet-wide pain via an upstream pgjdbc patch (PR #2941).
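The asymmetry can be made concrete with a toy model. The sketch below is a deliberate simplification, not pgjdbc's or Postgres's actual logic: WAL positions are plain integers, the busy table writes a fixed amount per tick, and the quiet stream's slot only advances when its client acknowledges a position. The `keepalive_flush` flag models the assumed behaviour of the keepalive-flush fix, namely that a client with nothing pending acknowledges the server's current position when it receives a keepalive. The function name and parameters are hypothetical.

```python
def retained_wal(ticks: int, busy_bytes_per_tick: int,
                 quiet_every: int, keepalive_flush: bool) -> int:
    """Return the WAL bytes pinned by the quiet stream's slot after `ticks` ticks.

    Each tick the busy table (owned by someone else) appends
    `busy_bytes_per_tick` to the shared WAL. The quiet table changes once
    every `quiet_every` ticks, modelled as 1 byte of WAL.
    """
    server_lsn = 0        # current end of WAL on the server
    confirmed_flush = 0   # position acknowledged on the quiet stream's slot
    for tick in range(1, ticks + 1):
        server_lsn += busy_bytes_per_tick
        if tick % quiet_every == 0:
            server_lsn += 1
            confirmed_flush = server_lsn   # event delivered and acknowledged
        elif keepalive_flush:
            confirmed_flush = server_lsn   # nothing pending: ack the keepalive
    # Everything between the slot's position and the end of WAL must be kept.
    return server_lsn - confirmed_flush


if __name__ == "__main__":
    # Quiet table never changes during the window; busy table writes 1 MB per tick.
    print(retained_wal(1000, 1_000_000, 10_000, keepalive_flush=False))
    print(retained_wal(1000, 1_000_000, 10_000, keepalive_flush=True))
```

In this model the slot without keepalive acknowledgement pins every byte the busy table has written since the quiet table's last change, while the slot with it pins nothing. Multiplying the first case across a fleet of low-traffic slots is what makes the failure likely to appear somewhere.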
Platform shape as a systems-design primitive
The shape — "declare a stream, get a micro-app" — is a canonical per-stream isolation deployment choice: one replication slot, one JVM, one Debezium Engine instance, one independent lifecycle. Contrast with a centralised Kafka-Connect-Debezium cluster hosting N connectors in a shared worker pool. The trade-off:
- Win: blast-radius isolation; one stream's failure does not affect other streams.
- Cost: N independent JVMs with N independent transitive dependency trees (see concepts/transitive-dependency-override-build — a fleet-wide upgrade of pgjdbc requires rebuilding every stream's Docker image).
Seen in
- sources/2025-12-18-zalando-contributing-to-debezium-fixing-logical-replication-at-scale — second canonical wiki ingest, two-year retrospective with updated scale numbers. Platform is now running at "hundreds of event streams" processing "hundreds of thousands of events per second" across Zalando's 100+ Kubernetes clusters at peak, with billions of events processed since 2018. The post discloses two architectural choices that characterise the platform structurally: (a) MemoryOffsetBackingStore — ephemeral by design — making the Postgres replication slot the sole authoritative position-tracking source; (b) Patroni-managed Postgres HA with replication slot management that ensures slots survive failovers since day one of logical replication rollout in late 2018. Together these make Zalando's deployment the canonical instance of slot-authoritative position tracking — a posture the 2025-12 Debezium 3.4 contributions (lsn.flush.mode + offset.mismatch.strategy) make explicitly opt-in for deployments with matching invariants. The post also canonicalises the platform's pre-2.7.4 + pgjdbc-42.7.2 production run — nearly two years, billions of events, zero detected data loss — as the empirical basis for Zalando continuing to advocate for the pgjdbc keepalive-flush feature even after Debezium disabled it globally.
- sources/2023-11-08-zalando-patching-the-postgresql-jdbc-driver — canonical wiki introduction of Zalando's Postgres-sourced event-streaming platform. The primary framing of the post is Zalando's upstream pgjdbc patch for runaway-WAL growth, but the platform context establishes the systems-design shape: low-code event streams → per-stream micro-app → Debezium Engine → pgjdbc → Postgres logical replication → WAL slot pinning at scale.
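Slot-authoritative position tracking, as described in the first entry above, can be sketched with a toy model. The classes below are hypothetical stand-ins and do not reflect Debezium's or Postgres's internals. They only show the invariant: the application's offset lives in memory and is lost on restart, so the position stored on the replication slot is the one that decides where streaming resumes.

```python
from typing import Optional


class ReplicationSlot:
    """Toy stand-in for the server-side slot; its state survives app restarts."""

    def __init__(self) -> None:
        self.confirmed_flush_lsn = 0


class StreamApp:
    """Toy micro-app whose offset store is in-memory only."""

    def __init__(self, slot: ReplicationSlot) -> None:
        self.slot = slot
        self.memory_offset: Optional[int] = None   # empty after every restart

    def start_position(self) -> int:
        # With no durable client-side offset, the slot is the only source of truth.
        if self.memory_offset is None:
            return self.slot.confirmed_flush_lsn
        return self.memory_offset

    def publish(self, lsn: int) -> None:
        # Acknowledge to the slot only after the event has been published.
        self.memory_offset = lsn
        self.slot.confirmed_flush_lsn = lsn


if __name__ == "__main__":
    slot = ReplicationSlot()
    app = StreamApp(slot)
    app.publish(42)
    restarted = StreamApp(slot)        # simulates a pod restart
    print(restarted.start_position())
```

The design consequence in this model is that the slot must never be lost, which is why the failover-safe slot management mentioned in the same entry is a precondition rather than an optimisation.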
Related
- companies/zalando — the owner.
- systems/postgresql — the source database.
- systems/debezium-engine — the CDC engine embedded in each stream's micro-app.
- systems/debezium — the parent Debezium project.
- systems/pgjdbc-postgres-jdbc-driver — the JDBC driver load-bearing in the platform stack.
- systems/aws-lambda — Lambda-hosted event transformations.
- systems/kafka — one of the possible downstream sinks.
- concepts/change-data-capture — the category.
- concepts/logical-replication — the Postgres mechanism each stream rides on.
- concepts/runaway-wal-growth — the failure mode the platform's architectural shape surfaces at fleet scale.