
Load shedding at ingestion

Definition

Load shedding at ingestion is the architectural principle of refusing (or slowing) work at the boundary where it enters the system, rather than admitting it and then dropping or delaying it deep inside. The boundary is whichever component translates an external trigger (an HTTP request, an event from a bus, a scheduled job) into internal work — gateway, broker consumer, scheduler.

Why at the boundary

Three compounding problems arise when shedding happens inside the system rather than at the edge:

  1. Wasted internal work. Every service that touched a request before it was dropped burned CPU, memory, DB connections, and network bandwidth for nothing. Under load — the exact scenario where shedding is needed — this wasted work can cascade and destabilise services that weren't themselves overloaded.
  2. State debris. Partial processing (half-written rows, enqueued child tasks, emitted side-events) has to be cleaned up or survived. Systems that drop work mid-pipeline grow long tails of "what if the drop happened right after X but before Y" cases.
  3. Priority inversion. A packed internal queue processes things FIFO by default. Low-priority work that arrived earlier blocks high-priority work that arrived later. Shedding at the edge lets admission control express priority before items enter any shared queue.

Shedding at ingestion flips each of these: work never started is work that doesn't need to be cleaned up, wasted, or reordered.
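The three properties above can be made concrete with a small sketch: an admission gate at the boundary that admits or refuses work per priority class before it enters any shared queue. All names and thresholds here are illustrative assumptions, not drawn from any system described in this note.

```python
import threading

class AdmissionGate:
    """Admit or refuse work at the ingest boundary, by priority class.

    Refused work never starts, so there is no wasted internal work and
    no state debris; per-class thresholds express priority before any
    FIFO queue can invert it. Thresholds are illustrative.
    """

    # Fraction of capacity at which each class starts being shed.
    SHED_THRESHOLDS = {"high": 0.95, "normal": 0.80, "low": 0.50}

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self, priority: str) -> bool:
        with self._lock:
            load = self.in_flight / self.capacity
            if load >= self.SHED_THRESHOLDS[priority]:
                return False  # refused at the edge: nothing to clean up
            self.in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.in_flight -= 1


gate = AdmissionGate(capacity=100)
gate.in_flight = 60                 # simulate 60% load
assert gate.try_admit("high")       # high-priority still admitted
gate.release()
assert not gate.try_admit("low")    # low-priority shed before any queue
```

Because low-priority classes hit their threshold first, shedding under rising load automatically preserves headroom for high-priority traffic.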

The problem it solves that internal shedding can't

The canonical motivating observation, from Zalando's 2024 communications-platform post (Source: sources/2024-04-22-zalando-enhancing-distributed-system-load-shedding-with-tcp-congestion-control-algorithm):

"In general, it's much better to control how much traffic is ingested into your system from the source, rather than letting it flood the system and then trying to deal with it."

Zalando's platform consumed 1,000+ Nakadi event types and pushed them into RabbitMQ regardless of downstream capacity. Under load, RabbitMQ queue depths ballooned, and deep queues degrade broker performance per RabbitMQ's own operational guidance. Any shedding done after that point — inside individual microservices — couldn't recover the broker, and couldn't preserve high-priority traffic that was already stuck behind low-priority messages in the same queues.

The fix was to move admission control to the Stream Consumer (the Nakadi → RabbitMQ bridge) and make that consumer's rate responsive to downstream saturation signals via AIMD.
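A minimal sketch of that consumer-side AIMD loop: the publish-latency signal matches the post's description, but the class name, constants, and thresholds here are illustrative assumptions.

```python
class AimdThrottle:
    """Additive-increase / multiplicative-decrease rate control for a
    stream consumer. After each downstream publish, the observed latency
    either cuts the consumption rate multiplicatively (congestion) or
    raises it additively (healthy). Constants are illustrative.
    """

    def __init__(self, rate=100.0, min_rate=1.0, max_rate=1000.0,
                 additive_step=10.0, backoff_factor=0.5,
                 latency_threshold_ms=50.0):
        self.rate = rate                  # events/sec the consumer pulls
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.additive_step = additive_step
        self.backoff_factor = backoff_factor
        self.latency_threshold_ms = latency_threshold_ms

    def observe_publish_latency(self, latency_ms: float) -> float:
        """Adjust the consumption rate from one publish's latency."""
        if latency_ms > self.latency_threshold_ms:
            # Congestion signal: back off multiplicatively.
            self.rate = max(self.min_rate, self.rate * self.backoff_factor)
        else:
            # Healthy publish: probe for more capacity additively.
            self.rate = min(self.max_rate, self.rate + self.additive_step)
        return self.rate


t = AimdThrottle()
t.observe_publish_latency(10.0)    # healthy: rate rises 100 -> 110
t.observe_publish_latency(200.0)   # congested: rate halves 110 -> 55
```

The asymmetry is the point: slow additive probing finds available downstream capacity, while sharp multiplicative backoff vacates it quickly when the broker shows strain.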

Requirements for ingestion-level shedding to work

  • A congestion signal is available at the ingest point. Error rate, downstream latency, queue depth — something the throttle can read. The publish-latency signal is one canonical choice.
  • The upstream can hold what isn't consumed. If the ingest point refuses work but the upstream can't buffer, the shedding just moves the problem one hop back. Durable event buses (Kafka, Nakadi, Kinesis) satisfy this naturally via their retention; HTTP doesn't, so HTTP ingest shedding returns 429 and relies on the client to retry (patterns/source-queue-as-overflow-buffer is the pattern that formalises this requirement).
  • Admission decisions can express priority. One admit/reject rule for all classes defeats the purpose; per-class rules (patterns/priority-differentiated-load-shedding) are what let ingestion-level shedding preserve SLO-protected traffic while shedding the rest.
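The three requirements above compose at a single decision point. A sketch for the HTTP case, where the "buffer" is the client's retry behaviour: the queue-depth signal, per-class limits, status codes, and `Retry-After` value here are all illustrative assumptions.

```python
# Per-class admit rules: SLO-protected traffic tolerates far more
# downstream queue depth before being shed than bulk traffic does.
# The class names and limits are hypothetical.
QUEUE_DEPTH_LIMITS = {"slo": 10_000, "bulk": 1_000}

def admit(queue_depth: int, traffic_class: str) -> tuple[int, dict]:
    """Decide an HTTP ingest request's fate at the boundary.

    queue_depth is the congestion signal read at the ingest point.
    Since HTTP has no durable upstream buffer, rejection is a 429 with
    Retry-After, relying on the client to hold and retry the work.
    """
    if queue_depth >= QUEUE_DEPTH_LIMITS[traffic_class]:
        return 429, {"Retry-After": "5"}
    return 202, {}  # accepted for asynchronous processing


assert admit(5_000, "bulk") == (429, {"Retry-After": "5"})  # bulk shed first
assert admit(5_000, "slo") == (202, {})                     # SLO traffic admitted
```

Note how a single shared congestion signal plus per-class thresholds yields priority-differentiated shedding without any per-class queues.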
