Lightweight aggregator in front of broker¶
Pattern¶
When the application's required batching semantics are not expressible in a general-purpose message broker's native batching knobs, place a small, stateless (or near-stateless) aggregator process between the broker and the downstream workers. The aggregator consumes from the broker using its native semantics, runs application-specific batching logic internally, and dispatches correctly-shaped batches to workers (model servers, stream processors, storage backends). The broker keeps its durability / fan-out / delivery guarantees; the aggregator provides the batching discipline the broker lacks (Source: sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference).
Motivation¶
Brokers specialise in a small, useful set of batching primitives:
- Kafka — byte-count + message-count + time-window batching at partition granularity on the producer side; consumer-side batching by messages / bytes.
- RabbitMQ — request-count prefetch window, push delivery.
These primitives are great for message-transport economics (saturate TCP, amortise broker bookkeeping) but not for compute-bound application batching. Specifically, they can't express:
- "Batch by a summed attribute of the payload" (token count, model size, priority weight, memory footprint).
- "Atomically claim a prefix up to budget X", where X is a caller-supplied number derived from downstream hardware.
- "Peek + conditional pop" — consumer-driven cost evaluation across several pending items.
The cleanest general-purpose solution: don't change the broker; don't change the workers; insert an aggregator.
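To make the first gap concrete: a broker can cut a batch at N messages or B bytes, but not at a summed payload attribute like token count. A minimal greedy packer shows the semantics the aggregator has to supply (a sketch; the `token_count` field and function name are illustrative, not from the source):

```python
def pack_by_token_budget(messages, token_budget):
    """Greedily split messages into batches whose summed token counts
    stay within token_budget -- the summed-attribute cut that broker
    count/byte/time knobs cannot express."""
    batches, current, used = [], [], 0
    for msg in messages:
        tokens = msg["token_count"]  # hypothetical payload field
        if current and used + tokens > token_budget:
            batches.append(current)  # budget would overflow: close batch
            current, used = [], 0
        current.append(msg)
        used += tokens
    if current:
        batches.append(current)      # flush the trailing partial batch
    return batches
```

The same shape generalises to any summed attribute (model size, priority weight, memory footprint) by swapping the field read.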
Voyage AI names the two paths explicitly:
"There are two practical paths to make token-count-based batching work. One is to place a lightweight aggregator in front of Kafka/RabbitMQ that consumes batches by token counts and then dispatches batches to model servers. The other is to use a store that naturally supports fast peek + conditional batching — for example, Redis with Lua script."
Voyage AI took the second path (Redis + Lua — see patterns/atomic-conditional-batch-claim); this pattern documents the first.
Architecture shape¶
Producer ──▶ Broker (Kafka / RabbitMQ / SQS) ──▶ Aggregator ──▶ Workers
             durable, fan-out,                   app-specific   (model server,
             delivery semantics                  batching logic  storage, …)
- Aggregator consumes from the broker at the broker's native pace (poll-based or push).
- Aggregator maintains a short-lived in-memory batch buffer keyed to the application budget (token count, byte size, priority weight).
- Aggregator dispatches batches to workers when budget is reached, time window expires, or worker pulls (depending on worker-side semantics).
- Aggregator acknowledges broker messages only after downstream acknowledges the dispatched batch — preserves at-least-once semantics.
Implementation is typically a few-hundred-line service in the application's language: a consumer loop, a buffer, a dispatch trigger, a retry / DLQ path.
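A sketch of that loop, under assumptions not in the source: a hypothetical broker client exposing `poll()`/`ack()`, a worker client exposing `dispatch(batch)`, and a `token_count` field on each message. Note the ordering that preserves at-least-once semantics: dispatch downstream first, ack the broker second.

```python
import time


class Aggregator:
    """Buffer broker messages until a token budget or time window is hit,
    dispatch the batch, then ack the broker -- so an aggregator crash only
    re-delivers messages, never loses them."""

    def __init__(self, broker, worker, token_budget, max_wait_s):
        self.broker = broker            # hypothetical client: poll(), ack()
        self.worker = worker            # hypothetical client: dispatch(batch)
        self.token_budget = token_budget
        self.max_wait_s = max_wait_s
        self.buffer, self.tokens, self.oldest = [], 0, None

    def run_once(self):
        msg = self.broker.poll(timeout_s=0.05)   # broker-native pace
        if msg is not None:
            self.buffer.append(msg)
            self.tokens += msg["token_count"]
            self.oldest = self.oldest or time.monotonic()
        if self.buffer and (
            self.tokens >= self.token_budget
            or time.monotonic() - self.oldest >= self.max_wait_s
        ):
            self.worker.dispatch(self.buffer)    # downstream ack first...
            for m in self.buffer:
                self.broker.ack(m)               # ...broker ack second
            self.buffer, self.tokens, self.oldest = [], 0, None
```

The two dispatch triggers here (budget reached, window expired) match the list above; a worker-pull trigger would replace `worker.dispatch` with a reply to a worker request.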
When this is the right shape (vs a native-primitive store)¶
Pick the aggregator variant when:
- The broker is already in the stack for other reasons (existing Kafka cluster, existing RabbitMQ queue topology, cross-service event fabric) — don't add a second queueing substrate.
- Broker-native durability, fan-out, consumer-group, DLQ semantics are needed — losing them is a correctness loss.
- The batching logic is application-specific and evolving — writing it in the aggregator keeps iteration fast without broker-side config changes.
- Delivery semantics must cross multiple downstream systems — broker's subscriber model + aggregator per downstream gives isolation.
Pick atomic conditional batch claim in a native store (Redis + Lua, or a purpose-built queue) when:
- You're starting from scratch and the workload doesn't need broker-grade durability / fan-out.
- Latency budget is tight — aggregator adds a hop and a potential queueing stage.
- Batching logic is stable and expressible in a short atomic script.
Trade-offs¶
- Extra tier. One more service to deploy, monitor, secure, and scale. Usually small but not free.
- Latency budget. Adds one network hop + aggregator-internal queueing delay — typically single-digit ms.
- State creeps in. Even "lightweight" aggregators buffer in memory; losing an aggregator instance loses any un-acked batch state. The broker's delivery guarantee re-delivers after the ack timeout, so duplicate dispatch is possible at the edge → workers must be idempotent.
- Scaling discipline. Aggregator pool must scale with broker partition count or subscriber group to avoid a single aggregator becoming the bottleneck. Common approach: one aggregator per consumer group, partitioned over the broker's natural shard key.
- Observability doubles. Now two tiers to instrument — broker backlog and aggregator batch composition.
Seen in¶
- 2025-12-18 Voyage AI / MongoDB — Token-count-based batching — post explicitly articulates this pattern as one of two practical paths for making token-count batching work on top of Kafka / RabbitMQ, alongside the Redis + Lua native store alternative Voyage AI ultimately picked (sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference).