# Zalando Stream Consumer

## Definition
The Stream Consumer is the microservice inside the Zalando Communication Platform that sits at the Nakadi → RabbitMQ boundary. It is the single entry point through which events enter the platform: it consumes events from Nakadi, performs lightweight processing, and publishes the resulting work to RabbitMQ for downstream microservices (renderers, consent / preference / blocklist checkers, eligibility checkers, template stores) to consume.
Each Nakadi event type is handled by its own Event Listener instance inside the Stream Consumer. The Stream Consumer is also the admission-control chokepoint where the platform implements load-shedding via AIMD.
## The three internal components
The Stream Consumer's load-shedding design, as described in the 2024 post (Source: sources/2024-04-22-zalando-enhancing-distributed-system-load-shedding-with-tcp-congestion-control-algorithm), is a three-component observer-pattern pipeline:
### 1. Statistics Collector
A cron-scheduled job that samples RabbitMQ publish-side observables:
- P50 publish latency — the round-trip time from producer `publish()` to broker ack.
- Publish exception count — the number of failed publishes in the sampling window.
It pushes these to the Congestion Detector.
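The collector's shape can be sketched as follows. This is a minimal illustration, not code from the post: the class and method names (`record_publish`, `flush`, `on_stats`) and the in-memory windowing are assumptions.

```python
import statistics


class StatisticsCollector:
    """Cron-scheduled sampler of RabbitMQ publish-side observables.

    Illustrative sketch: names and interfaces are assumptions, not
    identifiers from the Zalando post.
    """

    def __init__(self, detector):
        # `detector` is any object exposing on_stats(p50_ms, error_count).
        self.detector = detector
        self._latencies_ms = []
        self._exceptions = 0

    def record_publish(self, latency_ms):
        # Round-trip time from producer publish() to broker ack.
        self._latencies_ms.append(latency_ms)

    def record_exception(self):
        # A failed publish in the current sampling window.
        self._exceptions += 1

    def flush(self):
        """Run at each cron tick: push the window's stats, then reset."""
        p50 = statistics.median(self._latencies_ms) if self._latencies_ms else 0.0
        self.detector.on_stats(p50, self._exceptions)
        self._latencies_ms, self._exceptions = [], 0
```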
### 2. Congestion Detector
Compares the collector's output against configured thresholds and emits a binary "congested / not congested" decision. Uses the observer pattern to notify all Throttle instances of the decision simultaneously — one shared signal, broadcast.
This two-signal input (latency + errors) is the canonical realization of patterns/multi-metric-throttling in the Communication-Platform instance: publish latency captures slow-path saturation including RabbitMQ's own flow-control mechanism, while the exception count captures hard-limit failures.
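A minimal sketch of the two-threshold decision and the observer-pattern broadcast. The threshold values and method names here are invented for illustration; the post does not publish the production thresholds.

```python
class CongestionDetector:
    """Turns the collector's two signals into one binary decision and
    broadcasts it to every registered Throttle. Illustrative sketch;
    thresholds are made-up values, not Zalando's configuration.
    """

    def __init__(self, latency_threshold_ms, exception_threshold):
        self.latency_threshold_ms = latency_threshold_ms
        self.exception_threshold = exception_threshold
        self._observers = []

    def register(self, observer):
        # Observers expose on_congestion_signal(congested: bool).
        self._observers.append(observer)

    def on_stats(self, p50_latency_ms, exception_count):
        # Multi-metric decision: either signal crossing its threshold
        # marks the window as congested.
        congested = (p50_latency_ms > self.latency_threshold_ms
                     or exception_count > self.exception_threshold)
        # One shared signal, broadcast to all Throttles simultaneously.
        for obs in self._observers:
            obs.on_congestion_signal(congested)
```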
### 3. Throttle (per Event Listener)

One Throttle instance per Nakadi event type. Each holds its current batch-size state variable and is seeded with priority-specific coefficients on construction. On each Congestion Detector notification:
- Not-congested tick: `batch_size += increase[priority]` (additive increase).
- Congested tick: `batch_size *= decrease[priority]` (multiplicative decrease).
Coefficients from the 2024 post:
| Priority | Increase | Decrease |
|---|---|---|
| P1 | +15 | ×0.80 |
| P2 | +10 | ×0.60 |
| P3 | +5 | ×0.40 |
No coordination between Throttle instances — each uses only its local state. The 2024 post is explicit: "there is no coordination between different throttles!". The prioritization emerges from the shared congestion signal + asymmetric coefficients, not from a central scheduler.
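The per-priority AIMD update can be sketched with the coefficients from the table above. The initial batch size, the maximum cap, and the integer rounding are assumptions added for the sketch; only the coefficients come from the post.

```python
# Per-priority AIMD coefficients from the 2024 post: (additive increase,
# multiplicative decrease).
COEFFICIENTS = {
    "P1": (15, 0.80),
    "P2": (10, 0.60),
    "P3": (5, 0.40),
}


class Throttle:
    """One instance per Nakadi event type. Purely local state — no
    coordination with other Throttles. Initial/max batch sizes are
    illustrative assumptions.
    """

    def __init__(self, priority, initial_batch_size=100, max_batch_size=1000):
        self.increase, self.decrease = COEFFICIENTS[priority]
        self.batch_size = initial_batch_size
        self.max_batch_size = max_batch_size

    def on_congestion_signal(self, congested):
        if congested:
            # Multiplicative decrease: higher priorities shrink less.
            self.batch_size = max(1, int(self.batch_size * self.decrease))
        else:
            # Additive increase: higher priorities recover faster.
            self.batch_size = min(self.max_batch_size,
                                  self.batch_size + self.increase)
```

Because every Throttle sees the same broadcast signal, a congested window shrinks a P3 consumer to 40% of its batch size while a P1 consumer keeps 80%, and recovery widens the gap further: prioritization emerges from the asymmetric coefficients alone.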
## Nakadi-specific implementation detail
Nakadi's subscription API does not natively support changing consumption batch size at runtime. The Stream Consumer simulates dynamic rate control by pausing and resuming consumption to approximate the Throttle's current target rate. This is a practical quirk of Nakadi / Kafka-client semantics that other brokers (e.g. Kinesis, pull-based consumers) wouldn't inherit.
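One way to approximate a dynamic batch size on a consumer whose fetch size is fixed at subscription time is to consume up to the window's target and then pause for the remainder of the window. This is purely illustrative; the post does not describe Zalando's pause/resume mechanics, and every name and parameter below is an assumption.

```python
import time


def consume_with_pause_resume(consume_batch, target_batch_size, window_s=1.0):
    """Approximate a target rate on a fixed-batch-size consumer.

    Consumes fixed-size fetches until the window's target is met, then
    pauses (sleeps) until the window ends, resuming on the next window.
    Illustrative sketch of pause/resume rate control, not Nakadi client code.
    """
    consumed = 0
    start = time.monotonic()
    while consumed < target_batch_size:
        consumed += consume_batch()  # fixed-size fetch from the stream
    # Pause: hold off consumption for the rest of the window.
    remaining = window_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return consumed
```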
## Why this shape works
- Scoped change. The entire admission-control change fits inside one microservice — no downstream service modification, no new cross-service contract, no shared database.
- Observer pattern + local state. Broadcast the decision; let each throttle decide locally. A classic control-plane / data-plane split inside a single service.
- Scales with event types. Adding a new event type means adding a new Throttle with priority-coefficient assignment; no capacity re-computation.
## Seen in
- Zalando — Enhancing Distributed System Load Shedding with TCP Congestion Control Algorithm (2024-04-22) — canonical architecture post describing the Statistics Collector + Congestion Detector + per-event-type Throttle design with observer-pattern wiring, and the per-priority coefficient table. The Stream Consumer's pausing/resuming approximation of rate control is called out as a concrete consequence of Nakadi not supporting dynamic batch sizes natively.
## Related
- systems/zalando-communication-platform — host system.
- systems/nakadi — ingress (consumption) side.
- systems/rabbitmq — egress (publish) side.
- concepts/additive-increase-multiplicative-decrease-aimd
- concepts/publish-latency-as-congestion-signal
- concepts/per-priority-aimd-coefficients
- concepts/load-shedding-at-ingestion
- patterns/aimd-ingestion-rate-control
- patterns/multi-metric-throttling
- patterns/priority-differentiated-load-shedding
- patterns/source-queue-as-overflow-buffer
- companies/zalando