PATTERN Cited by 1 source
Source queue as overflow buffer¶
Problem¶
When an ingestion layer throttles to protect a downstream from overload, the un-consumed work has to go somewhere. Three bad places to put it:
- The ingestion service's own memory. In-process buffering under sustained load leads to OOM or GC collapse.
- An auxiliary spill queue. A separate queue introduced to absorb overflow adds operational surface, a new failure domain, and a second retention policy to manage.
- The internal work broker (RabbitMQ, DB queue). Moving backlog into a broker whose performance degrades with queue depth (RabbitMQ is the canonical example; most broker-operational guidance says "keep queues short") directly contradicts the reason for throttling.
A fourth option — drop the work — is either unacceptable (the work matters) or acceptable only in emergencies.
Solution¶
Don't consume at all when throttled. Let un-consumed work sit on the durable source — the upstream event bus, message queue, or log — that the ingestion layer reads from. The source was already designed to hold events durably; its retention policy covers the backlog; it already has the monitoring, replication, and operational story. No new infrastructure.
The ingestion layer's job becomes "decide when and how fast to read" rather than "buffer what we couldn't process".
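A minimal sketch of what that job can look like, assuming a Kafka-style source and a hypothetical `Throttle` (for example, an AIMD-controlled permit budget) standing in for whatever decides downstream capacity; the class and method names are illustrative rather than taken from the Zalando post:

```java
// Sketch: gate consumption on a throttle; when throttled, pause fetching so
// un-consumed events simply stay on the durable source.
import java.time.Duration;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public final class ThrottledIngestLoop {

    interface Throttle {
        int permitsAvailable();   // how many records downstream can take right now
        void acquire();           // spend one permit, blocking if none are left
    }

    private final KafkaConsumer<String, byte[]> consumer;
    private final Throttle throttle;

    ThrottledIngestLoop(KafkaConsumer<String, byte[]> consumer, Throttle throttle) {
        this.consumer = consumer;
        this.throttle = throttle;
    }

    void run() {
        while (true) {
            Set<TopicPartition> assigned = consumer.assignment();
            if (throttle.permitsAvailable() == 0) {
                // Throttled: stop fetching. Un-consumed events stay on the source;
                // pause() keeps group membership without pulling any data.
                consumer.pause(assigned);
            } else {
                consumer.resume(assigned);
            }
            // poll() still has to run for heartbeats/rebalances; with every partition
            // paused it returns no records, so nothing piles up in this process.
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(200));
            for (ConsumerRecord<String, byte[]> record : records) {
                throttle.acquire();          // pace the hand-off to downstream
                forwardDownstream(record);   // e.g. publish to the internal broker
            }
            consumer.commitSync();           // position stays durable on the source
        }
    }

    private void forwardDownstream(ConsumerRecord<String, byte[]> record) {
        // hand the event to the internal broker / processing pipeline
    }
}
```

The property this buys: if the throttle stays closed for an hour, this process does nothing and holds nothing; the backlog exists only as offset lag on the source.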
Why durable event buses enable this¶
- Kafka / Nakadi / Kinesis / SQS all have retention policies (hours to days) that cover short-to-medium load episodes naturally.
- Offset / cursor semantics let the consumer pause without losing position; resuming is idempotent.
- Partitioned log structure means backlog growth is a disk-space concern, not a performance concern — a Kafka partition holding 100 GB of backlog still accepts new publishes at line rate.
- Retention is priced in disk, per GB, which is cheap compared to operating an extra broker.
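Seen purely in source-side terms, the backlog is just offset lag: committed cursor versus log end offset, per partition. A short sketch assuming a Kafka-style consumer; the helper name is an assumption:

```java
// Sketch: measure the overflow as consumer lag on the source, rather than as
// queue depth in any new piece of infrastructure.
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

final class SourceBacklog {
    /** Total un-consumed events across the given partitions, as seen from the source. */
    static long totalBacklog(KafkaConsumer<?, ?> consumer, Set<TopicPartition> partitions) {
        Map<TopicPartition, Long> ends = consumer.endOffsets(partitions);
        Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(partitions);
        long backlog = 0;
        for (TopicPartition tp : partitions) {
            OffsetAndMetadata pos = committed.get(tp);
            long consumedUpTo = (pos == null) ? 0 : pos.offset();
            backlog += ends.getOrDefault(tp, 0L) - consumedUpTo;  // events waiting on the source
        }
        return backlog;
    }
}
```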
HTTP ingress does not provide this property: the "source" is the client, and there's no offset the server can pause against. HTTP ingress shedding therefore requires 429 + client-retry, which is a different pattern.
Zalando's canonical instance¶
The 2024 Zalando communications-platform post (Source: sources/2024-04-22-zalando-enhancing-distributed-system-load-shedding-with-tcp-congestion-control-algorithm) is the canonical realization:
- Source: Nakadi (Kafka-backed, durable, per-event-type subscriptions) carrying 1,000+ event types.
- Ingest layer: Stream Consumer with per-event-type Throttle instances.
- Internal broker: RabbitMQ.
- Overflow location: Nakadi partitions / subscriptions themselves — un-consumed events stay there until the Throttle speeds back up.
The post is explicit about the alternative not being chosen:
"It also helped us to avoid using the RabbitMQ cluster as a storage for millions of messages — with a smaller queue size in RabbitMQ we follow best practices."
The ~300K-message Nakadi backlog screenshot with near-empty RabbitMQ queues is the direct operational demonstration.
Emergency shedding gets easier¶
A load-shedding story isn't complete without a plan for "we're going to be overloaded for longer than we can catch up from — discard the lowest-priority work." When backlog sits on the durable source:
- Discard = cursor advance (or retention-truncate, or subscription reset). One source-side configuration change per priority class (see the sketch below).
- No distributed delete across internal queues.
- No compensating work to clean up partial state — nothing has been processed yet.
Contrast: if the backlog had been drained into RabbitMQ queues, emergency discard means "delete N messages from each of M queues spanning K microservices."
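A hedged sketch of what "discard = cursor advance" can look like for one low-priority class, assuming a Kafka-style source and that this runs as a one-off source-side operation while consumers of that class are paused; nothing here is taken from the Zalando post:

```java
// Sketch: emergency discard of a low-priority backlog by committing the cursor
// at the current log end, so everything before it is skipped.
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

final class EmergencyDiscard {
    /** Skip every un-consumed event on the given low-priority partitions. */
    static void discardBacklog(KafkaConsumer<?, ?> consumer, Set<TopicPartition> lowPriority) {
        // Current head of each partition = "everything before this is discarded".
        Map<TopicPartition, Long> ends = consumer.endOffsets(lowPriority);

        Map<TopicPartition, OffsetAndMetadata> newCursor = new HashMap<>();
        ends.forEach((tp, end) -> newCursor.put(tp, new OffsetAndMetadata(end)));

        // One commit per priority class; no deletes inside internal queues and no
        // compensating cleanup, because nothing was ever processed.
        consumer.commitSync(newCursor);
    }
}
```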
Prerequisites¶
- Durable source with retention ≥ worst-case shedding episode. If the backlog exceeds retention, events are silently lost. Retention sizing becomes part of the shedding design (see the sizing sketch after this list).
- Resumable consumption. The source must support pausing + resuming a consumer without losing position. Standard for Kafka / Nakadi / Kinesis; harder for broadcast-style (UDP multicast, SNS fanout) sources.
- Backpressure-aware consumer. The consumer must be able to not consume — not all client libraries make this easy; some poll constantly and require explicit throttling logic.
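For the retention-sizing prerequisite, a back-of-the-envelope sketch; all rates and durations below are invented inputs, not measurements:

```java
// Sketch: the source must retain events for the shedding episode itself plus the
// time needed to drain the accumulated backlog once capacity returns.
final class RetentionSizing {
    static double requiredRetentionHours(double produceRatePerSec,
                                         double drainRatePerSec,
                                         double episodeHours) {
        // Backlog accumulated while shedding (events).
        double backlog = produceRatePerSec * episodeHours * 3600;
        // Time to drain it afterwards, at the surplus consume rate (hours).
        double catchUpHours = backlog / ((drainRatePerSec - produceRatePerSec) * 3600);
        return episodeHours + catchUpHours;
    }

    public static void main(String[] args) {
        // e.g. 500 ev/s produced, 800 ev/s drain capacity, 6 h episode
        // -> 6 h + 10 h = 16 h required, so 24 h retention leaves headroom; 12 h would not.
        System.out.println(requiredRetentionHours(500, 800, 6));
    }
}
```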
Variants¶
- Single source, many partitions (Kafka / Nakadi default) — each partition can be paused independently; priority is expressed via subscription configuration.
- Source as durable relay (transactional-outbox pattern) — the source is populated by an upstream commit-then-publish mechanism, with the source itself becoming the durable boundary for downstream. Pairs with patterns/transactional-outbox.
- SQS-as-source — similar retention properties but pull-based + per-message visibility timeout rather than log-partitioned offsets.
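A sketch of the SQS-as-source variant, assuming the AWS SDK for Java v2; here "don't consume" is simply "don't call ReceiveMessage", and the queue URL, `Throttle` interface, and `forwardDownstream` helper are illustrative:

```java
// Sketch: pull-based not-consuming. Messages that are never received stay in the
// queue within its retention period; received-but-unacked messages come back via
// the visibility timeout.
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

final class SqsThrottledConsumer {
    interface Throttle { int permitsAvailable(); }

    static void runOnce(SqsClient sqs, String queueUrl, Throttle throttle) {
        int permits = throttle.permitsAvailable();
        if (permits == 0) {
            return;   // throttled: no ReceiveMessage call, the backlog stays in SQS
        }
        var resp = sqs.receiveMessage(ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .maxNumberOfMessages(Math.min(permits, 10))   // SQS caps a batch at 10
                .waitTimeSeconds(5)                           // long poll
                .build());
        for (Message m : resp.messages()) {
            forwardDownstream(m);
            // Delete only after a successful hand-off; otherwise the visibility
            // timeout returns the message to the queue for redelivery.
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .receiptHandle(m.receiptHandle())
                    .build());
        }
    }

    private static void forwardDownstream(Message m) { /* hand to the internal broker */ }
}
```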
Anti-patterns¶
- Adding an overflow queue between source and consumer. A "spill queue" introduces a second retention policy, a second monitoring setup, and a new failure domain. The source was already durable.
- Buffering in the consumer process. In-memory buffering turns a backpressure problem into an OOM problem. Even bounded in-memory buffers (ring buffers, disk-backed queues) add operational surface without adding the source's existing durability guarantees.
- Moving backlog into the internal broker (the pattern's inversion). If RabbitMQ queues grow, the system has already lost the benefit of this pattern.
Seen in¶
- Zalando — Enhancing Distributed System Load Shedding with TCP Congestion Control Algorithm (2024-04-22) — canonical realization. Stream Consumer's AIMD throttles leave un-consumed events on Nakadi; the post explicitly contrasts this with the alternative of pushing them into RabbitMQ. The 300K-message P3-backlog on Nakadi + small RabbitMQ queues is the operational demonstration. Emergency shedding is framed as "Messages of lower priority can be discarded in case of emergency." — a source-side operation.
Related¶
- concepts/load-shedding-at-ingestion — the principle this pattern operationalizes.
- concepts/additive-increase-multiplicative-decrease-aimd — the control loop that decides when to not-consume.
- concepts/backpressure — what the not-consuming signal expresses.
- concepts/event-driven-architecture — the design style this pattern is native to.
- patterns/aimd-ingestion-rate-control — canonical composition partner.
- patterns/priority-differentiated-load-shedding — what allows per-class emergency discard.
- systems/nakadi — the canonical source instance.
- systems/rabbitmq — the canonical broker that benefits from staying un-overflowed.
- systems/kafka — the durable-log substrate that makes this pattern possible.