Publish latency as congestion signal¶
Definition¶
Publish latency as congestion signal is the use of a
producer-observed message-broker publish time — typically
the P50 (median) of the round-trip from producer publish()
to broker ack — as the input signal to a rate-control
loop at the producer. When publish latency rises above a
threshold, the producer infers "the downstream is saturated"
and slows down; when it falls, the producer infers "capacity
available" and speeds up.
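A minimal sketch of the producer side, under assumptions: `broker.publish` stands in for whatever blocking publish-and-wait-for-ack call the client library provides, and the 50 ms threshold and window size are illustrative, not from any source:

```python
import statistics
import time

LATENCY_THRESHOLD_S = 0.050   # assumed threshold: 50 ms median publish round-trip
WINDOW = 200                  # number of recent samples to keep

samples = []

def timed_publish(broker, message):
    """Publish and record the round-trip from publish() to broker ack."""
    start = time.monotonic()
    broker.publish(message)           # assumed to block until the broker acks
    samples.append(time.monotonic() - start)
    del samples[:-WINDOW]             # keep only the most recent WINDOW samples

def congested():
    """True when the median (P50) publish round-trip exceeds the threshold."""
    if not samples:
        return False
    return statistics.median(samples) > LATENCY_THRESHOLD_S
```

The rate loop never inspects the broker directly; it only asks `congested()` and adjusts the send rate accordingly.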
Why it works: what publish latency actually measures¶
A producer's publish-round-trip isn't one mechanism — it's the composition of several saturation-sensitive ones into a single number:
- Broker ingress queue depth. The broker accepts publishes into an in-memory structure before flushing to disk / replicating. Under load this queue grows; publishes wait longer to be acked.
- Broker-internal backpressure. Some brokers, RabbitMQ among them, actively slow publishers when consumers are lagging. RabbitMQ's flow-control mechanism does this by pausing or pacing the publisher's channel; the publisher experiences it as "my publish call took longer."
- Downstream pipeline saturation (via the broker). Because the broker slows the publisher when consumers lag, publish latency transitively reflects consumer-side saturation, not just broker-internal saturation. This is the property that makes publish latency such a useful fused signal.
Publish latency therefore collapses three distinct saturation regimes into one measurement. An AIMD loop keyed on it reacts to whichever is active.
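The AIMD update keyed on that fused signal can be sketched in a few lines; the step size, backoff factor, and rate bounds below are illustrative assumptions:

```python
RATE_STEP = 10          # assumed additive increment (messages/sec)
BACKOFF_FACTOR = 0.5    # assumed multiplicative decrease
MIN_RATE, MAX_RATE = 1, 1000

def next_rate(current_rate, congested):
    """One AIMD step: halve on congestion, otherwise add a fixed increment."""
    if congested:
        return max(MIN_RATE, int(current_rate * BACKOFF_FACTOR))
    return min(MAX_RATE, current_rate + RATE_STEP)
```

Note that the loop doesn't need to know which of the three regimes is active; it backs off the same way for all of them.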
Why P50 and not P99¶
The signal is used to drive rate adaptation, not alerting. What matters is "is the typical publish path saturating?" not "are there any slow publishes?" Tail metrics (P99/P99.9) are dominated by transient outliers — GC pauses, single-broker hiccups, lock contention on one queue — and a rate loop keyed on P99 would chatter. P50 is a more policy-actionable number: when the median rises, something systemic is happening.
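The distinction shows up even in a toy sample: two outlier publishes (a GC pause, say) leave the median untouched while sending a nearest-rank P99 through the roof. The numbers below are illustrative:

```python
import math
import statistics

def percentile(values, p):
    """Nearest-rank percentile of a non-empty sample."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(0, rank - 1)]

latencies = [5.0] * 98 + [500.0, 500.0]   # 98 typical publishes, 2 outliers
print(statistics.median(latencies))       # 5.0   -- P50: unmoved
print(percentile(latencies, 99))          # 500.0 -- P99: a 100x jump
```

A rate loop keyed on the first number stays quiet; one keyed on the second would back off for a non-systemic blip.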
Complementary signals¶
Publish latency alone isn't a complete congestion picture:
- Publish errors / exceptions. When the broker hits a hard limit — disk full, memory alarm — publishes fail outright rather than merely slowing down. The error count is a separate signal that should also feed the rate loop (patterns/multi-metric-throttling).
- Queue depth on the broker side. If the producer can read broker queue depth directly, it's a more direct saturation measure than publish latency. Most brokers don't expose this on the publish path, though, so latency is the practical proxy.
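A fused decision over both signals might look like the following hypothetical sketch, where each signal is compared to its own configured threshold and breaching either one flags congestion (threshold values are made up for illustration):

```python
P50_LATENCY_THRESHOLD_S = 0.050   # assumed median-latency threshold
ERROR_COUNT_THRESHOLD = 5         # assumed per-window publish-error budget

def congestion_decision(p50_latency_s, error_count):
    """Binary congestion decision: either signal breaching its threshold trips it."""
    return (p50_latency_s > P50_LATENCY_THRESHOLD_S
            or error_count > ERROR_COUNT_THRESHOLD)
```

Using OR rather than AND means the slow-saturation path (latency) and the hard-limit path (errors) can each trigger a backoff independently.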
Seen in¶
- Zalando — Enhancing Distributed System Load Shedding with TCP Congestion Control Algorithm (2024-04-22) — Zalando's Statistics Collector cron job samples P50 RabbitMQ publish latency plus publish exception count. The Congestion Detector compares each to configured thresholds and emits a binary congestion decision, which drives per-event-type AIMD throttles. Publish latency is explicitly called out as capturing RabbitMQ's own flow-control–induced slowdown: "RabbitMQ is able to apply back-pressure when slow consumers are detected... In this case RabbitMQ will slow down the publish rate which the publisher will experience in the increase in the publish time."
Related¶
- concepts/additive-increase-multiplicative-decrease-aimd — the canonical consumer of this signal.
- concepts/backpressure — what the signal is a proxy for.
- concepts/congestion-window — TCP's analogous signal (ACK delay / packet loss) driving cwnd updates.
- patterns/multi-metric-throttling — the pattern for combining latency + error-rate + queue-depth.
- patterns/aimd-ingestion-rate-control
- systems/rabbitmq