
PATTERN

Priority-differentiated load shedding

Problem

Uniform load shedding — "drop N% of incoming work under pressure" — treats all work as equally valuable. Production systems rarely have this property. Some traffic is SLO-protected (order confirmations, auth, payments, incident communications) and some is bulk / deferrable (marketing pushes, analytics ingestion, recommendation refreshes).

Under a shared capacity constraint, uniform shedding:

  • Breaks SLOs gratuitously. A 20% shed cuts critical traffic by 20%, even though there would be ample capacity for it if the bulk traffic yielded first.
  • Discards business value disproportionately. Order-confirmation latency affects customer trust and revenue directly; marketing-push latency rarely does.
  • Misses the actual design question. The business question isn't "what's our shed policy?" — it's "whose work gets done first when we're squeezed?".

Solution

Run shedding with asymmetric per-class parameters so that, under load, lower-priority classes yield capacity first and most, and released capacity is automatically re-allocated to still-active higher-priority classes.

The canonical realization pairs this pattern with AIMD (patterns/aimd-ingestion-rate-control), but the principle applies to any shedding mechanism:

  • Larger budget increases for high-priority classes. P1 recovers capacity faster than P3.
  • Smaller budget decreases for high-priority classes. P1 loses less than P3 on each congestion tick.
  • Shared congestion signal. All classes see the same "system is overloaded" decision; the asymmetry is in how they react.
  • No inter-class coordination. The emergent re-allocation comes from the arithmetic, not from explicit scheduling.
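The per-class reaction rule fits in a few lines. This is a minimal sketch, not the Zalando implementation; the `RateClass` structure and field names are hypothetical, while the coefficient values are the ones from the table in this pattern:

```python
from dataclasses import dataclass

@dataclass
class RateClass:
    rate: float                     # current admission budget (events/tick)
    additive_increase: float        # added on a not-congested tick
    multiplicative_decrease: float  # multiplied in on a congested tick

    def on_tick(self, congested: bool) -> None:
        # Every class sees the same congestion signal; only the
        # reaction coefficients differ between classes.
        if congested:
            self.rate *= self.multiplicative_decrease
        else:
            self.rate += self.additive_increase

classes = {
    "P1": RateClass(rate=100, additive_increase=15, multiplicative_decrease=0.80),
    "P2": RateClass(rate=100, additive_increase=10, multiplicative_decrease=0.60),
    "P3": RateClass(rate=100, additive_increase=5,  multiplicative_decrease=0.40),
}
```

There is no scheduler and no cross-class state: each class mutates only its own budget, which is what makes the re-allocation emergent rather than coordinated.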

Coefficient table (canonical Zalando instance)

From the 2024 Zalando communications-platform post (Source: sources/2024-04-22-zalando-enhancing-distributed-system-load-shedding-with-tcp-congestion-control-algorithm):

Priority                             Additive increase   Multiplicative decrease
P1 (critical: order confirmations)   +15                 ×0.80 (−20%)
P2 (normal)                          +10                 ×0.60 (−40%)
P3 (bulk: commercial messages)       +5                  ×0.40 (−60%)

Reading the rows:

  • On a not-congested tick: P1 climbs 3× faster than P3.
  • On a congested tick: P3 loses 60% of its rate while P1 loses 20%.

Over a load episode with alternating ticks, the gap widens monotonically: P1 hovers near pre-episode throughput while P3 collapses to a small fraction of it. When the load episode clears, P3 recovers slowly while P1 is already running at near-max.
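The alternating-tick claim can be checked numerically. A sketch assuming a 10-tick episode that strictly alternates congested/calm (the episode shape is illustrative; the coefficients are the table's):

```python
# Alternating congested / not-congested ticks, all classes starting at 100.
COEFFS = {"P1": (15, 0.80), "P2": (10, 0.60), "P3": (5, 0.40)}

def run_episode(ticks: int) -> dict:
    rates = {name: 100.0 for name in COEFFS}
    for i in range(ticks):
        congested = (i % 2 == 0)  # strict alternation: congested, calm, ...
        for name, (inc, dec) in COEFFS.items():
            rates[name] = rates[name] * dec if congested else rates[name] + inc
    return rates

rates = run_episode(10)
# P1 ends close to its starting rate; P3 ends at a small fraction of it.
```

After ten such ticks P1 sits in the low 80s while P3 is under 10, so the ordering P1 > P2 > P3 holds and the gap between the extremes is roughly an order of magnitude.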

Why the arithmetic re-allocates capacity

The system has a shared physical capacity (downstream throughput). When P3's rate shrinks by 60%, the capacity it released doesn't sit idle — the additive-increase probes of the still-large higher-priority classes claim it on the next not-congested tick. Over a few tick cycles:

Initial (fair) :   P1 = 100, P2 = 100, P3 = 100
After some congestion ticks:
                   P1 ≈ 80,  P2 ≈ 40,  P3 ≈ 20
Total admitted rate is now ≈140 (down from ≈300); the split is
~57/29/14 instead of 33/33/33.

The system has converted the fairness property of AIMD into a priority-weighted allocation, deliberately breaking fairness because the business doesn't want fairness here.
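The ~57/29/14 figure is just the post-episode rates normalised. A one-line check using the numbers from the example above:

```python
# Post-episode admitted rates from the worked example.
rates = {"P1": 80, "P2": 40, "P3": 20}
total = sum(rates.values())                               # 140, down from 300
share = {k: round(100 * v / total) for k, v in rates.items()}
# share == {"P1": 57, "P2": 29, "P3": 14}
```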

Prerequisites

  • A defined priority taxonomy. Discrete classes (typically 3-5), not continuous priorities — the coefficient table becomes unwieldy beyond that. Per-event-type priority assignment maintained by the domain team that owns the event type.
  • A shared congestion signal at the shedding point.
  • A per-class rate/budget state variable the shedder can mutate independently.
  • Per-class floors and ceilings. Otherwise:
      • P3 can collapse to 0 and never recover (starvation).
      • P1 can grow unboundedly during calm periods and starve P2/P3 once a load episode clears.
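A floor/ceiling guard is simply a clamp applied after every AIMD update. The bounds below are illustrative assumptions, not values from the source:

```python
# Hypothetical per-class bounds: (floor, ceiling) in events/tick.
BOUNDS = {"P1": (10.0, 200.0), "P2": (5.0, 150.0), "P3": (1.0, 100.0)}

def clamp(name: str, rate: float) -> float:
    floor, ceiling = BOUNDS[name]
    # Floor prevents starvation (the rate can never decay to 0);
    # ceiling caps growth during calm periods.
    return max(floor, min(ceiling, rate))
```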

Coefficient-table design guidance

  • Ratio, not absolute magnitude. Doubling all additive constants is roughly equivalent to halving the tick frequency. What matters is the relative magnitudes across classes.
  • Ratios that don't collapse quickly. If P1's decrease is 0.95 and P3's is 0.10, the gap between them after one congestion tick is already an order of magnitude — probably too sharp. Zalando's 0.8 / 0.6 / 0.4 is a gentler factor-of-2-ish progression.
  • Asymmetric between increase and decrease sides. The increase side is probing (small steps); the decrease side is reacting (large steps). Zalando's table has P1-decrease at 0.8 while P1-increase at +15 — the decrease is more aggressive per tick than the increase. Standard AIMD property.
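The increase/decrease asymmetry is easy to see at a concrete operating point (rate 100 is an assumed operating point; the coefficients are P1's from the table):

```python
# P1's coefficients: +15 additive increase, x0.80 multiplicative decrease.
rate = 100.0
loss_per_congested_tick = rate * (1 - 0.80)  # ~20 events/tick removed
gain_per_calm_tick = 15.0                    # the additive increase
# The decrease outpaces the increase at this operating point (20 > 15).
```

Because the decrease scales with the current rate while the increase is constant, the asymmetry grows as the rate grows, which is the standard AIMD stability property.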

Anti-patterns

  • Strict priority scheduling. Always admit P1, then P2, then P3 until capacity is exhausted, never mixing. Starves lower classes completely and produces sharp cliffs at the capacity boundary. Priority-differentiated shedding is the soft alternative: classes share capacity, weighted by priority.
  • Per-request priority override. "Tag this marketing push as P1 just this once." Defeats the priority system; priority should be per-event-type, not per-event.
  • Priority inflation. If everything becomes P1, the table does nothing. The canonical defence is operational review of priority assignments and floor/ceiling enforcement per class.
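For contrast, the strict-priority anti-pattern can be sketched in a few lines: once the higher classes exhaust capacity, P3 gets exactly zero, with no weighted sharing. The demand and capacity numbers here are illustrative:

```python
def strict_priority_admit(capacity: float, demand: dict) -> dict:
    # Admit classes in priority order until capacity runs out: the
    # anti-pattern. No mixing, hard starvation below the cutoff line.
    admitted = {}
    for name in ("P1", "P2", "P3"):
        take = min(demand[name], capacity)
        admitted[name] = take
        capacity -= take
    return admitted

print(strict_priority_admit(150, {"P1": 100, "P2": 100, "P3": 100}))
# {'P1': 100, 'P2': 50, 'P3': 0}
```

Under the same 150-of-300 squeeze, the coefficient-table approach converges to something like 57/29/14 of the admitted total, whereas strict priority hands P3 a flat zero.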
