
Load shedding at ingestion

Definition

Load shedding at ingestion is the architectural principle of refusing (or slowing) work at the boundary where it enters the system, rather than admitting it and then dropping or delaying it deep inside. The boundary is whichever component translates an external trigger (an HTTP request, an event from a bus, a scheduled job) into internal work — gateway, broker consumer, scheduler.

Why at the boundary

Three compounding problems arise when shedding happens inside the system rather than at the edge:

  1. Wasted internal work. Every service that touched a request before it was dropped burned CPU, memory, DB connections, and network bandwidth for nothing. Under load — the exact scenario where shedding is needed — this wasted work can cascade and destabilise services that weren't themselves overloaded.
  2. State debris. Partial processing (half-written rows, enqueued child tasks, emitted side-events) has to be cleaned up or survived. Systems that drop work mid-pipeline grow long tails of "what if the drop happened right after X but before Y" cases.
  3. Priority inversion. A packed internal queue processes things FIFO by default. Low-priority work that arrived earlier blocks high-priority work that arrived later. Shedding at the edge lets admission control express priority before items enter any shared queue.

Shedding at ingestion flips each of these: work never started is work that doesn't need to be cleaned up, wasted, or reordered.
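The three properties above can be made concrete with a small sketch: an admission gate at the boundary that admits or refuses work per priority class before it enters any shared queue. All names and thresholds here are illustrative assumptions, not drawn from any system described in this note.

```python
import threading

class AdmissionGate:
    """Admit or refuse work at the ingest boundary, by priority class.

    Refused work never starts, so there is no wasted internal work and
    no state debris; per-class thresholds express priority before any
    FIFO queue can invert it. Thresholds are illustrative.
    """

    # Fraction of capacity at which each class starts being shed.
    SHED_THRESHOLDS = {"high": 0.95, "normal": 0.80, "low": 0.50}

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self, priority: str) -> bool:
        with self._lock:
            load = self.in_flight / self.capacity
            if load >= self.SHED_THRESHOLDS[priority]:
                return False  # refused at the edge: nothing to clean up
            self.in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.in_flight -= 1


gate = AdmissionGate(capacity=100)
gate.in_flight = 60                 # simulate 60% load
assert gate.try_admit("high")       # high-priority still admitted
gate.release()
assert not gate.try_admit("low")    # low-priority shed before any queue
```

Because low-priority classes hit their threshold first, shedding under rising load automatically preserves headroom for high-priority traffic.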

The problem it solves that internal shedding can't

The canonical motivating observation, from Zalando's 2024 communications-platform post (Source: sources/2024-04-22-zalando-enhancing-distributed-system-load-shedding-with-tcp-congestion-control-algorithm):

"In general, it's much better to control how much traffic is ingested into your system from the source, rather than letting it flood the system and then trying to deal with it."

Zalando's platform consumed 1,000+ Nakadi event types and pushed them into RabbitMQ regardless of downstream capacity. Under load, RabbitMQ queue depths ballooned, and deep queues degrade broker performance per RabbitMQ's own operational guidance. Any shedding done after that point — inside individual microservices — couldn't recover the broker, and couldn't preserve high-priority traffic that was already stuck behind low-priority messages in the same queues.

The fix was to move admission control to the Stream Consumer (the Nakadi → RabbitMQ bridge) and make that consumer's rate responsive to downstream saturation signals via AIMD.
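A minimal sketch of that consumer-side AIMD loop: the publish-latency signal matches the post's description, but the class name, constants, and thresholds here are illustrative assumptions.

```python
class AimdThrottle:
    """Additive-increase / multiplicative-decrease rate control for a
    stream consumer. After each downstream publish, the observed latency
    either cuts the consumption rate multiplicatively (congestion) or
    raises it additively (healthy). Constants are illustrative.
    """

    def __init__(self, rate=100.0, min_rate=1.0, max_rate=1000.0,
                 additive_step=10.0, backoff_factor=0.5,
                 latency_threshold_ms=50.0):
        self.rate = rate                  # events/sec the consumer pulls
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.additive_step = additive_step
        self.backoff_factor = backoff_factor
        self.latency_threshold_ms = latency_threshold_ms

    def observe_publish_latency(self, latency_ms: float) -> float:
        """Adjust the consumption rate from one publish's latency."""
        if latency_ms > self.latency_threshold_ms:
            # Congestion signal: back off multiplicatively.
            self.rate = max(self.min_rate, self.rate * self.backoff_factor)
        else:
            # Healthy publish: probe for more capacity additively.
            self.rate = min(self.max_rate, self.rate + self.additive_step)
        return self.rate


t = AimdThrottle()
t.observe_publish_latency(10.0)    # healthy: rate rises 100 -> 110
t.observe_publish_latency(200.0)   # congested: rate halves 110 -> 55
```

The asymmetry is the point: slow additive probing finds available downstream capacity, while sharp multiplicative backoff vacates it quickly when the broker shows strain.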

Requirements for ingestion-level shedding to work

  • A congestion signal is available at the ingest point. Error rate, downstream latency, queue depth — something the throttle can read. The publish-latency signal is one canonical choice.
  • The upstream can hold what isn't consumed. If the ingest point refuses work but the upstream can't buffer, the shedding just moves the problem one hop back. Durable event buses (Kafka, Nakadi, Kinesis) satisfy this naturally via their retention; HTTP doesn't, so HTTP ingest shedding returns 429 and relies on the client to retry (patterns/source-queue-as-overflow-buffer is the pattern that formalises this requirement).
  • Admission decisions can express priority. One admit/reject rule for all classes defeats the purpose; per-class rules (patterns/priority-differentiated-load-shedding) are what let ingestion-level shedding preserve SLO-protected traffic while shedding the rest.
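The three requirements above compose at a single decision point. A sketch for the HTTP case, where the "buffer" is the client's retry behaviour: the queue-depth signal, per-class limits, status codes, and `Retry-After` value here are all illustrative assumptions.

```python
# Per-class admit rules: SLO-protected traffic tolerates far more
# downstream queue depth before being shed than bulk traffic does.
# The class names and limits are hypothetical.
QUEUE_DEPTH_LIMITS = {"slo": 10_000, "bulk": 1_000}

def admit(queue_depth: int, traffic_class: str) -> tuple[int, dict]:
    """Decide an HTTP ingest request's fate at the boundary.

    queue_depth is the congestion signal read at the ingest point.
    Since HTTP has no durable upstream buffer, rejection is a 429 with
    Retry-After, relying on the client to hold and retry the work.
    """
    if queue_depth >= QUEUE_DEPTH_LIMITS[traffic_class]:
        return 429, {"Retry-After": "5"}
    return 202, {}  # accepted for asynchronous processing


assert admit(5_000, "bulk") == (429, {"Retry-After": "5"})  # bulk shed first
assert admit(5_000, "slo") == (202, {})                     # SLO traffic admitted
```

Note how a single shared congestion signal plus per-class thresholds yields priority-differentiated shedding without any per-class queues.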
