
SYSTEM Cited by 1 source

RabbitMQ

RabbitMQ is an open-source message broker implementing AMQP 0-9-1, with additional protocol support for MQTT, STOMP, and Streams. It originated at Rabbit Technologies, which VMware acquired in 2010. It is widely deployed as a general-purpose durable work-queue and pub-sub fabric, with rich exchange-and-binding routing, per-message consumer acknowledgement, dead-letter queues, and native clustering.

Properties relevant to system design

  • Exchange + binding routing — direct / topic / fanout / headers exchange types; flexible many-to-many producer-consumer wiring.
  • Push-based delivery — the broker pushes messages to consumers holding unfilled prefetch windows; consumers don't peek, they receive.
  • Prefetch = request-count window — a consumer's prefetch_count is the classic knob for how many messages may be in flight and unacknowledged at once. AMQP 0-9-1 also defines a byte-based prefetch_size, but RabbitMQ does not implement it. Neither is a token-count or payload-attribute budget.
  • Per-message ack / nack with redelivery — at-least-once delivery; redelivery on nack or consumer disconnect.
  • Quorum queues / mirrored queues / Streams — durability and high-availability modes. Classic mirrored queues are deprecated in favor of quorum queues and were removed in RabbitMQ 4.0.
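
The prefetch and acknowledgement properties above can be sketched with the Python pika client. This is a minimal sketch rather than a production consumer: the function name `start_consumer`, the `handler` callable, and the default window of 8 are hypothetical, and `channel` is assumed to be a pika blocking channel (any object exposing `basic_qos`, `basic_consume`, `basic_ack`, and `basic_nack` with the same keyword arguments would do).

```python
def start_consumer(channel, queue, handler, prefetch_count=8):
    """Register a manually acknowledging consumer with a bounded prefetch window.

    `channel` is assumed to be a pika channel, e.g. obtained from
    pika.BlockingConnection(...).channel(). `handler` is a hypothetical
    application callable that raises an exception on failure.
    """
    # Cap the number of delivered-but-unacknowledged messages for this consumer.
    # The unit is messages, not bytes and not tokens.
    channel.basic_qos(prefetch_count=prefetch_count)

    def on_message(ch, method, properties, body):
        try:
            handler(body)
        except Exception:
            # Negative acknowledgement with requeue: the broker may redeliver.
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        else:
            # Positive acknowledgement frees one slot in the prefetch window.
            ch.basic_ack(delivery_tag=method.delivery_tag)

    # auto_ack=False selects manual acknowledgements (at-least-once delivery).
    channel.basic_consume(queue=queue, on_message_callback=on_message, auto_ack=False)
    return on_message
```

Against a real broker, the caller would then invoke `channel.start_consuming()`; because redelivery is possible, `handler` should be idempotent.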

Role on the wiki

RabbitMQ is a canonical example of a message broker whose native batching knobs are not sufficient for application-specific compute-batching disciplines such as token-count batching for GPU inference. The 2025-12-18 Voyage AI post names three specific gaps:

  1. Request-count prefetch — doesn't compose with a per-payload token count.
  2. Push delivery — consumers can't peek and selectively claim by a caller-computed budget.
  3. No atomic peek + conditional claim primitive — the token-count-batching scheduler's required single-step operation doesn't exist.
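
To make the first gap concrete, the sketch below compares a request-count window with a token-budget claim over the same backlog. It is a toy in-memory model, not RabbitMQ code: `take_by_count`, `take_by_token_budget`, the token counts, and the 1000-token budget are all hypothetical, and the budget-based function assumes the consumer can inspect pending requests before claiming them, which is exactly what a push-based broker does not offer.

```python
from collections import deque

def take_by_count(pending, prefetch_count):
    """Model request-count prefetch: hand over the next N requests, whatever their size."""
    batch = []
    while pending and len(batch) < prefetch_count:
        batch.append(pending.popleft())
    return batch

def take_by_token_budget(pending, max_tokens):
    """Model peek + conditional claim: claim from the head while the token sum fits."""
    batch, total = [], 0
    while pending and total + pending[0] <= max_tokens:
        total += pending[0]
        batch.append(pending.popleft())
    if not batch and pending:
        # A single request larger than the budget is sent alone so the loop progresses.
        batch.append(pending.popleft())
    return batch

# Hypothetical per-request token counts, in arrival order.
token_counts = [900, 50, 40, 700, 30, 20, 10, 800]

by_count = deque(token_counts)
count_batches = []
while by_count:
    count_batches.append(sum(take_by_count(by_count, 2)))

by_budget = deque(token_counts)
budget_batches = []
while by_budget:
    budget_batches.append(sum(take_by_token_budget(by_budget, 1000)))

print(count_batches)   # [950, 740, 50, 810]
print(budget_batches)  # [990, 760, 800]
```

With a fixed window of two requests, the tokens per batch swing between 50 and 950; with a token budget, every batch lands near the target and the backlog drains in fewer batches.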

Both practical workarounds are documented on the wiki as separate patterns.

Seen in

  • 2025-12-18 Voyage AI / MongoDB — Token-count-based batching: "RabbitMQ's prefetch is request-count-based, and messages are pushed to consumers, so there's no efficient way to peek and batch requests by Σ token_count_i." Named alongside Kafka as the two general-purpose brokers whose batching semantics don't fit token-count batching natively. (sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference)
  • 2024-04-22 Zalando — Enhancing Distributed System Load Shedding with TCP Congestion Control Algorithm: RabbitMQ is the internal backbone of the Zalando Communication Platform and the substrate for its publisher-saturation signal. The platform routes customer-communication work (order confirmations, marketing pushes, brand alerts) between its microservices through RabbitMQ. Under load, the broker's own flow control slows publishers; Zalando observes this through P50 publish latency and publish-exception count, and feeds both into per-event-type AIMD throttles at the Stream Consumer. The post explicitly cites the operational rationale, "with a smaller queue size in RabbitMQ we follow best practices", making it a production instance of queue depth treated as a performance risk that drives architectural choices one layer upstream: shed at ingestion (concepts/load-shedding-at-ingestion) so RabbitMQ queues stay light. It also names RabbitMQ's back-pressure mechanism directly: "RabbitMQ is able to apply back-pressure when slow consumers are detected... RabbitMQ will slow down the publish rate which the publisher will experience in the increase in the publish time." See concepts/publish-latency-as-congestion-signal, patterns/aimd-ingestion-rate-control.
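
The AIMD throttle described in the Zalando entry can be sketched as follows. This is an illustrative sketch, not Zalando's implementation: the class name `AimdThrottle`, the latency threshold, the step sizes, and the rate bounds are hypothetical values chosen for the example. Only the inputs (P50 publish latency and publish-exception count) and the additive-increase / multiplicative-decrease shape come from the post as summarized above.

```python
from dataclasses import dataclass

@dataclass
class AimdThrottle:
    """Per-event-type ingestion rate limit driven by publish-side congestion signals."""
    rate: float = 100.0                  # current allowed events per second (hypothetical)
    min_rate: float = 1.0
    max_rate: float = 1000.0
    increase_step: float = 10.0          # additive increase when healthy
    decrease_factor: float = 0.5         # multiplicative decrease when congested
    latency_threshold_ms: float = 50.0   # hypothetical P50 publish-latency limit

    def update(self, p50_publish_latency_ms, publish_exceptions):
        # Treat slow publishes or any publish exception as a congestion signal.
        congested = (
            p50_publish_latency_ms > self.latency_threshold_ms
            or publish_exceptions > 0
        )
        if congested:
            self.rate = max(self.min_rate, self.rate * self.decrease_factor)
        else:
            self.rate = min(self.max_rate, self.rate + self.increase_step)
        return self.rate

throttle = AimdThrottle()
# (P50 publish latency in ms, publish-exception count) per measurement window.
observations = [(12.0, 0), (15.0, 0), (80.0, 0), (20.0, 3), (10.0, 0)]
rates = [throttle.update(latency, errors) for latency, errors in observations]
print(rates)  # [110.0, 120.0, 60.0, 30.0, 40.0]
```

One throttle instance per event type would let low-priority traffic back off independently of high-priority traffic, while the slow additive recovery keeps the broker's queues short after a congestion episode.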

Stub — no deeper RabbitMQ-internals source yet ingested.
