Lightweight aggregator in front of broker¶
Pattern¶
When the application's required batching semantics are not expressible in a general-purpose message broker's native batching knobs, place a small, stateless (or near-stateless) aggregator process between the broker and the downstream workers. The aggregator consumes from the broker using its native semantics, runs application-specific batching logic internally, and dispatches correctly-shaped batches to workers (model servers, stream processors, storage backends). The broker keeps its durability / fan-out / delivery guarantees; the aggregator provides the batching discipline the broker lacks (Source: sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference).
Motivation¶
Brokers specialise in a small, useful set of batching primitives:
- Kafka — byte-count + message-count + time-window batching at partition granularity on the producer side; consumer-side batching by messages / bytes.
- RabbitMQ — request-count prefetch window, push delivery.
These primitives are great for message-transport economics (saturate TCP, amortise broker bookkeeping) but not for compute-bound application batching. Specifically, they can't express:
- "Batch by a summed attribute of the payload" (token count, model size, priority weight, memory footprint).
- "Atomically claim a prefix up to budget X", where X is a caller-supplied number derived from downstream hardware.
- "Peek + conditional pop" — consumer-driven cost evaluation across several pending items.
The cleanest general-purpose solution: don't change the broker; don't change the workers; insert an aggregator.
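To make the first gap concrete: a broker can cut a batch at N messages or B bytes, but not at a summed payload attribute like token count. A minimal greedy packer shows the semantics the aggregator has to supply (a sketch; the `token_count` field and function name are illustrative, not from the source):

```python
def pack_by_token_budget(messages, token_budget):
    """Greedily split messages into batches whose summed token counts
    stay within token_budget -- the summed-attribute cut that broker
    count/byte/time knobs cannot express."""
    batches, current, used = [], [], 0
    for msg in messages:
        tokens = msg["token_count"]  # hypothetical payload field
        if current and used + tokens > token_budget:
            batches.append(current)  # budget would overflow: close batch
            current, used = [], 0
        current.append(msg)
        used += tokens
    if current:
        batches.append(current)      # flush the trailing partial batch
    return batches
```

The same shape generalises to any summed attribute (model size, priority weight, memory footprint) by swapping the field read.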
Voyage AI names the two paths explicitly:
"There are two practical paths to make token-count-based batching work. One is to place a lightweight aggregator in front of Kafka/RabbitMQ that consumes batches by token counts and then dispatches batches to model servers. The other is to use a store that naturally supports fast peek + conditional batching — for example, Redis with Lua script."
Voyage AI took the second path (Redis + Lua — see patterns/atomic-conditional-batch-claim); this pattern documents the first.
Architecture shape¶
Producer ──▶ Broker (Kafka / RabbitMQ / SQS) ──▶ Aggregator ──▶ Workers
             durable, fan-out,                   app-specific   (model server,
             delivery semantics                  batching logic  storage, …)
- Aggregator consumes from the broker at the broker's native pace (poll-based or push).
- Aggregator maintains a short-lived in-memory batch buffer keyed to the application budget (token count, byte size, priority weight).
- Aggregator dispatches batches to workers when budget is reached, time window expires, or worker pulls (depending on worker-side semantics).
- Aggregator acknowledges broker messages only after downstream acknowledges the dispatched batch — preserves at-least-once semantics.
Implementation is typically a few-hundred-line service in the application's language: a consumer loop, a buffer, a dispatch trigger, a retry / DLQ path.
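A sketch of that loop, under assumptions not in the source: a hypothetical broker client exposing `poll()`/`ack()`, a worker client exposing `dispatch(batch)`, and a `token_count` field on each message. Note the ordering that preserves at-least-once semantics: dispatch downstream first, ack the broker second.

```python
import time


class Aggregator:
    """Buffer broker messages until a token budget or time window is hit,
    dispatch the batch, then ack the broker -- so an aggregator crash only
    re-delivers messages, never loses them."""

    def __init__(self, broker, worker, token_budget, max_wait_s):
        self.broker = broker            # hypothetical client: poll(), ack()
        self.worker = worker            # hypothetical client: dispatch(batch)
        self.token_budget = token_budget
        self.max_wait_s = max_wait_s
        self.buffer, self.tokens, self.oldest = [], 0, None

    def run_once(self):
        msg = self.broker.poll(timeout_s=0.05)   # broker-native pace
        if msg is not None:
            self.buffer.append(msg)
            self.tokens += msg["token_count"]
            self.oldest = self.oldest or time.monotonic()
        if self.buffer and (
            self.tokens >= self.token_budget
            or time.monotonic() - self.oldest >= self.max_wait_s
        ):
            self.worker.dispatch(self.buffer)    # downstream ack first...
            for m in self.buffer:
                self.broker.ack(m)               # ...broker ack second
            self.buffer, self.tokens, self.oldest = [], 0, None
```

The two dispatch triggers here (budget reached, window expired) match the list above; a worker-pull trigger would replace `worker.dispatch` with a reply to a worker request.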
When this is the right shape (vs a native-primitive store)¶
Pick the aggregator variant when:
- The broker is already in the stack for other reasons (existing Kafka cluster, existing RabbitMQ queue topology, cross-service event fabric) — don't add a second queueing substrate.
- Broker-native durability, fan-out, consumer-group, DLQ semantics are needed — losing them is a correctness loss.
- The batching logic is application-specific and evolving — writing it in the aggregator keeps iteration fast without broker-side config changes.
- Delivery semantics must cross multiple downstream systems — broker's subscriber model + aggregator per downstream gives isolation.
Pick atomic conditional batch claim in a native store (Redis + Lua, or a purpose-built queue) when:
- You're starting from scratch and the workload doesn't need broker-grade durability / fan-out.
- Latency budget is tight — aggregator adds a hop and a potential queueing stage.
- Batching logic is stable and expressible in a short atomic script.
Trade-offs¶
- Extra tier. One more service to deploy, monitor, secure, and scale. Usually small but not free.
- Latency budget. Adds one network hop + aggregator-internal queueing delay — typically single-digit ms.
- State creeps in. Even "lightweight" aggregators buffer in memory; losing an aggregator instance loses any un-acked batch state. The broker's delivery guarantee re-delivers after the ack timeout, so duplicate dispatch is possible at the edge → workers must be idempotent.
- Scaling discipline. Aggregator pool must scale with broker partition count or subscriber group to avoid a single aggregator becoming the bottleneck. Common approach: one aggregator per consumer group, partitioned over the broker's natural shard key.
- Observability doubles. Now two tiers to instrument — broker backlog and aggregator batch composition.
Seen in¶
- 2025-12-18 Voyage AI / MongoDB — Token-count-based batching — post explicitly articulates this pattern as one of two practical paths for making token-count batching work on top of Kafka / RabbitMQ, alongside the Redis + Lua native store alternative Voyage AI ultimately picked (sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference).