PATTERN Cited by 1 source
AIMD ingestion-rate control¶
Problem¶
An event-driven or message-broker-fed system has downstream processing capacity that is bounded (by external providers, DB IOPS, CPU, thread pools) and ingress that is unbounded (events arrive whenever publishers choose). When ingress outpaces processing:
- Internal queues grow, broker performance degrades, and all traffic degrades uniformly.
- High-priority work queues behind low-priority work (priority inversion).
- Emergency shedding of low-priority work requires expensive distributed deletes if the backlog has already spread across internal queues.
Static rate limits don't work: the downstream's true capacity is opaque and changes (e.g. autoscaling, partial failures), so any fixed ingress cap is either too low (wasting capacity) or too high (overrunning it).
Solution¶
Apply TCP's Additive Increase Multiplicative Decrease (AIMD) algorithm to the ingestion-rate variable at the point of ingress — the consumer bridging a durable source (event bus, message queue) to the internal processing pipeline. Each ingestion-class throttle holds a rate state variable; an observed congestion signal (error rate, downstream latency, publish-time growth) drives updates:
- Not congested → `rate += a` — probe gently for spare capacity.
- Congested → `rate *= b` (with `0 < b < 1`) — relinquish sharply so the system recovers.
The algorithm inherits TCP's attractive properties: converges to fairness/efficiency without a central coordinator, adapts to downstream capacity changes, uses local state only, and degrades gracefully (concepts/additive-increase-multiplicative-decrease-aimd).
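The fairness claim can be seen in a toy simulation (the numbers and the shared-signal setup here are illustrative, not from the source): two AIMD flows started at very unequal rates drift toward equal shares with no coordinator, because additive increase preserves the gap between them while each multiplicative decrease halves it.

```python
# Toy simulation: two AIMD flows sharing a fixed capacity converge
# toward near-equal shares with no coordination between them.
CAPACITY = 100.0
A, B = 5.0, 0.5          # additive increase, multiplicative decrease

rates = [80.0, 10.0]     # deliberately unfair starting point
for _ in range(200):
    congested = sum(rates) > CAPACITY   # shared congestion signal
    for i, r in enumerate(rates):
        rates[i] = r * B if congested else r + A

# The gap between the flows shrinks by half on every decrease and is
# untouched by increases, so after many cycles the shares are equal.
```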
Zalando's instantiation¶
The 2024 Zalando communications-platform post (Source: sources/2024-04-22-zalando-enhancing-distributed-system-load-shedding-with-tcp-congestion-control-algorithm) is the canonical wiki instance:
- Ingress boundary: Nakadi event bus → RabbitMQ internal broker.
- Throttle location: Stream Consumer microservice; one Throttle instance per Nakadi event type (1,000+ event types in production).
- Rate variable: per-event-type consumption batch size from Nakadi.
- Congestion signal: RabbitMQ P50 publish latency + publish-exception count, combined (concepts/publish-latency-as-congestion-signal).
- Coefficients: per-priority (concepts/per-priority-aimd-coefficients) — P1 gets `+15 / × 0.8`, P3 gets `+5 / × 0.4`.
- Overflow buffer: un-consumed events sit on Nakadi (patterns/source-queue-as-overflow-buffer).
- Coordination: none — "there is no coordination between different throttles!".
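A minimal sketch of how the per-priority coefficients play out — the `Throttle` class here is a hypothetical reconstruction; only the coefficient values (`+15 / ×0.8` for P1, `+5 / ×0.4` for P3) come from the post:

```python
# Hypothetical per-priority Throttle; coefficient values from the
# Zalando post, everything else is an illustrative reconstruction.
class Throttle:
    def __init__(self, increase, decrease, rate=100.0):
        self.increase = increase   # additive coefficient a
        self.decrease = decrease   # multiplicative coefficient b
        self.rate = rate           # e.g. per-event-type batch size

    def on_tick(self, congested):
        if congested:
            self.rate *= self.decrease   # relinquish sharply
        else:
            self.rate += self.increase   # probe gently

p1 = Throttle(increase=15, decrease=0.8)  # high priority
p3 = Throttle(increase=5,  decrease=0.4)  # low priority

# Same congestion episode for both: two congested ticks, three healthy.
for congested in [True, True, False, False, False]:
    p1.on_tick(congested)
    p3.on_tick(congested)

print(round(p1.rate, 1), round(p3.rate, 1))  # → 109.0 31.0
```

P1 both sheds less during congestion and re-probes faster afterwards, which is what makes the convergence priority-differentiated.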
Structural requirements¶
For this pattern to apply, the system must provide:
- A durable source that holds un-consumed work — Kafka, Nakadi, Kinesis, SQS. HTTP ingress does not qualify (the response must be returned synchronously; 429 + client retry is a different pattern).
- A congestion signal visible at ingress. Publish latency, publish errors, 5xx rate, internal queue depth, or a measured downstream saturation metric (patterns/multi-metric-throttling for combining multiple).
- A controllable ingestion variable. Batch size, concurrency, request-rate — whatever the ingress layer can adjust without a deploy.
- Observed-to-actuation latency shorter than saturation-damage accumulation. If the signal updates every 60s and the broker fills in 30s, the loop is too slow.
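A hypothetical consumer loop illustrating the third requirement: the ingestion variable (here a batch size) is re-read on every poll, so the throttle can adjust it at runtime with no deploy. `source` and `pipeline` are stand-ins for the durable source and the internal broker:

```python
# Hypothetical sketch: the actuated variable is a batch size read
# fresh on every poll. Work not fetched stays on the durable source,
# which is what makes it double as the overflow buffer.
def consume_once(source, pipeline, throttle):
    batch_size = max(1, int(throttle.rate))   # controllable variable
    events = source.fetch(limit=batch_size)   # rest stays durable
    for event in events:
        pipeline.publish(event)
    return len(events)
```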
Pseudo-implementation¶
# Collected once per tick by a Statistics Collector.
signal = measure_congestion()  # e.g. P50_latency, err_count

# Broadcast via observer to all Throttles.
congested = (signal.p50 > latency_threshold or
             signal.errors > error_threshold)

# Per-class Throttle applies its own coefficients.
for throttle in throttles:
    if congested:
        throttle.rate *= throttle.decrease_coefficient
    else:
        throttle.rate += throttle.increase_coefficient

# Optional: clamp to [min_floor, max_ceiling] to prevent
# starvation / runaway growth.
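Fleshed out into runnable form — the thresholds, names, and dict-based throttle state are illustrative assumptions, not taken from the source — the loop with the optional clamp might look like:

```python
# Runnable sketch of one control-loop tick, with the optional clamp.
# Thresholds and coefficient values are illustrative assumptions.
from types import SimpleNamespace

LATENCY_THRESHOLD = 50.0          # ms, assumed
ERROR_THRESHOLD = 10              # errors per tick, assumed
MIN_FLOOR, MAX_CEILING = 1.0, 1000.0

def tick(throttles, signal):
    congested = (signal.p50 > LATENCY_THRESHOLD or
                 signal.errors > ERROR_THRESHOLD)
    for t in throttles:
        if congested:
            t["rate"] *= t["decrease"]
        else:
            t["rate"] += t["increase"]
        # Clamp to prevent starvation / runaway growth.
        t["rate"] = min(MAX_CEILING, max(MIN_FLOOR, t["rate"]))

throttles = [{"rate": 100.0, "increase": 15, "decrease": 0.8},
             {"rate": 100.0, "increase": 5,  "decrease": 0.4}]

tick(throttles, SimpleNamespace(p50=80.0, errors=0))  # congested tick
tick(throttles, SimpleNamespace(p50=10.0, errors=0))  # healthy tick
```

The floor matters in practice: without it, a long congestion episode can multiplicatively decay a class's rate to effectively zero, from which additive increase takes a long time to recover.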
Variants¶
- Single-class. Uniform increase/decrease coefficients → converges to fair shares across competing flows (TCP's original application; internet-scale).
- Priority-differentiated. Per-class coefficients; converges preferentially toward high-priority classes. Zalando's instantiation.
- Multi-metric. Signal is a combination (AND / OR / weighted) of multiple observables; see patterns/multi-metric-throttling.
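Two hypothetical signal combinators for the multi-metric variant — a hard OR that trips on any saturated observable, and a weighted blend for softer combinations. Thresholds and weights are illustrative, not prescribed by the pattern:

```python
# Illustrative multi-metric congestion predicates (assumed shapes).
def congested_or(p50, errors, p50_limit=50.0, err_limit=10):
    # Trip if ANY observable crosses its threshold.
    return p50 > p50_limit or errors > err_limit

def congested_weighted(p50, errors, p50_limit=50.0, err_limit=10,
                       weights=(0.7, 0.3), trip=1.0):
    # Normalise each metric against its threshold, then blend;
    # a mild excursion on one metric won't trip the throttle alone.
    score = (weights[0] * (p50 / p50_limit) +
             weights[1] * (errors / err_limit))
    return score >= trip
```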
Consequences¶
- Capacity is probed, not assumed. When downstream scales out, AIMD discovers it via the additive-increase phase.
- Emergency shedding = source-side configuration. Un-consumed work stays on the durable source; discarding is a cursor-advance or retention-truncate, not a distributed delete.
- Threshold tuning is delicate. Too-tight → unnecessary throttling in healthy state; too-loose → broker floods before throttling engages. This is the pattern's standing operational cost.
- Simplified vs full cwnd. The pattern typically uses only additive-increase + multiplicative-decrease, not the full TCP cwnd machinery (no slow-start, no timeout-based reset). This is intentional for application-layer use: simpler to reason about, sufficient for the regime where the controller runs slower than packet-level cwnd.
Contrast with related patterns¶
- patterns/shed-load-during-capacity-shortage — generic "drop work when overloaded" at any layer; AIMD ingestion-rate control is the specific ingestion-boundary continuous-adaptation version.
- patterns/explicit-backpressure-policy — stream-API-level; a producer chooses block/drop/error. AIMD is the control-loop that drives the signal on which backpressure decisions are made.
- Static rate limiting (per-client rpm / qps cap) — doesn't adapt. AIMD replaces it when the downstream's capacity is opaque or variable.
Seen in¶
- Zalando — Enhancing Distributed System Load Shedding with TCP Congestion Control Algorithm (2024-04-22) — canonical pattern instance. Stream Consumer at the Nakadi → RabbitMQ boundary; per-event-type Throttle instances; publish-latency + publish-errors congestion signal; per-priority coefficients; ~6 months in production with flat order-confirmation processing time through load episodes.
Related¶
- concepts/additive-increase-multiplicative-decrease-aimd
- concepts/load-shedding-at-ingestion
- concepts/publish-latency-as-congestion-signal
- concepts/per-priority-aimd-coefficients
- concepts/backpressure
- concepts/congestion-window
- patterns/priority-differentiated-load-shedding
- patterns/source-queue-as-overflow-buffer
- patterns/multi-metric-throttling
- systems/zalando-stream-consumer