Lightweight aggregator in front of broker

Pattern

When the application's required batching semantics are not expressible in a general-purpose message broker's native batching knobs, place a small, stateless (or near-stateless) aggregator process between the broker and the downstream workers. The aggregator consumes from the broker using its native semantics, runs application-specific batching logic internally, and dispatches correctly-shaped batches to workers (model servers, stream processors, storage backends). The broker keeps its durability / fan-out / delivery guarantees; the aggregator provides the batching discipline the broker lacks (Source: sources/2025-12-18-mongodb-token-count-based-batching-faster-cheaper-embedding-inference).

Motivation

Brokers specialise in a small, useful set of batching primitives:

  • Kafka — byte-count + message-count + time-window batching at partition granularity on the producer side; consumer-side batching by messages / bytes.
  • RabbitMQ — request-count prefetch window, push delivery.

These primitives are great for message-transport economics (saturate TCP, amortise broker bookkeeping) but not for compute-bound application batching. Specifically, they can't express:

  • "Batch by a summed attribute of the payload" (token count, model size, priority weight, memory footprint).
  • "Atomically claim a prefix up to budget X", where X is a caller-supplied number derived from downstream hardware.
  • "Peek + conditional pop" — consumer-driven cost evaluation across several pending items.
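The "claim a prefix up to budget X" operation the bullets describe can be sketched in a few lines — this is an illustrative helper, not an API of any broker; `cost` and `budget` stand in for whatever payload attribute and hardware-derived limit the application uses:

```python
def claim_prefix(pending, cost, budget):
    """Greedily take the longest prefix of `pending` whose summed cost
    stays within `budget`. Always take at least one item, so a single
    oversized head-of-line item cannot stall the queue forever."""
    batch, total = [], 0
    for item in pending:
        c = cost(item)
        if batch and total + c > budget:
            break  # next item would blow the budget; stop the prefix here
        batch.append(item)
        total += c
    return batch, pending[len(batch):]

docs = ["a" * 120, "b" * 500, "c" * 300, "d" * 900]
batch, rest = claim_prefix(docs, len, budget=1000)
# batch == docs[:3]  (120 + 500 + 300 = 920 ≤ 1000); docs[3:] wait for the next claim
```

No broker exposes this as a consume primitive — the summed-attribute condition is evaluated across several pending items, which is exactly what the aggregator tier exists to do.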

The cleanest general-purpose solution: don't change the broker; don't change the workers; insert an aggregator.

Voyage AI names the two paths explicitly:

"There are two practical paths to make token-count-based batching work. One is to place a lightweight aggregator in front of Kafka/RabbitMQ that consumes batches by token counts and then dispatches batches to model servers. The other is to use a store that naturally supports fast peek + conditional batching — for example, Redis with Lua script."

Voyage AI took the second path (Redis + Lua — see patterns/atomic-conditional-batch-claim); this pattern documents the first.

Architecture shape

Producer ──▶ Broker (Kafka / RabbitMQ / SQS) ──▶ Aggregator ──▶ Workers
             durable, fanout, delivery            app-specific     (model server,
             semantics                            batching         storage, …)
                                                  logic
  • Aggregator consumes from the broker at the broker's native pace (poll-based or push).
  • Aggregator maintains a short-lived in-memory batch buffer keyed to the application budget (token count, byte size, priority weight).
  • Aggregator dispatches batches to workers when budget is reached, time window expires, or worker pulls (depending on worker-side semantics).
  • Aggregator acknowledges broker messages only after downstream acknowledges the dispatched batch — preserves at-least-once semantics.

Implementation is typically a few-hundred-line service in the application's language: a consumer loop, a buffer, a dispatch trigger, a retry / DLQ path.
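A minimal sketch of that loop, assuming an in-memory stand-in for the broker: `consume`, `ack`, and `dispatch` are hypothetical callbacks (a real service would wire them to e.g. a Kafka consumer poll or a RabbitMQ channel), and the budget/window constants are illustrative. The retry/DLQ path is omitted for brevity:

```python
import time

TOKEN_BUDGET = 1000  # illustrative app-specific budget (e.g. model server token limit)
MAX_WAIT_S = 0.05    # time-window trigger so trickling traffic still flushes

def run_aggregator(consume, ack, dispatch, clock=time.monotonic):
    """Consumer loop + in-memory buffer + dispatch triggers (budget, time).
    `consume` returns a message dict or None when idle; `ack` and
    `dispatch` are broker-side and worker-side callbacks respectively."""
    buffer, total, started = [], 0, None

    def flush():
        nonlocal buffer, total, started
        dispatch([m["body"] for m in buffer])
        for m in buffer:
            ack(m)  # ack only after downstream accepted the batch
        buffer, total, started = [], 0, None

    while True:
        msg = consume()
        if msg is None:
            if buffer and clock() - started >= MAX_WAIT_S:
                flush()       # time-window trigger
            elif not buffer:
                break         # demo only: stop when the queue is drained
            continue
        if buffer and total + msg["tokens"] > TOKEN_BUDGET:
            flush()           # budget trigger: the next message won't fit
        buffer.append(msg)
        total += msg["tokens"]
        if started is None:
            started = clock()
```

Acking inside `flush`, after `dispatch` returns, is what preserves the broker's at-least-once guarantee across the extra tier: if the aggregator dies mid-batch, un-acked messages are redelivered.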

When this is the right shape (vs a native-primitive store)

Pick the aggregator variant when:

  • The broker is already in the stack for other reasons (existing Kafka cluster, existing RabbitMQ queue topology, cross-service event fabric) — don't add a second queueing substrate.
  • Broker-native durability, fan-out, consumer-group, DLQ semantics are needed — losing them is a correctness loss.
  • The batching logic is application-specific and evolving — writing it in the aggregator keeps iteration fast without broker-side config changes.
  • Delivery semantics must cross multiple downstream systems — broker's subscriber model + aggregator per downstream gives isolation.

Pick atomic conditional batch claim in a native store (Redis + Lua, or a purpose-built queue) when:

  • You're starting from scratch and the workload doesn't need broker-grade durability / fan-out.
  • Latency budget is tight — aggregator adds a hop and a potential queueing stage.
  • Batching logic is stable and expressible in a short atomic script.

Trade-offs

  • Extra tier. One more service to deploy, monitor, secure, and scale. Usually small but not free.
  • Latency budget. Adds one network hop + aggregator-internal queueing delay — typically single-digit ms.
  • State carries. Even "lightweight" aggregators buffer in memory; loss of an aggregator instance loses any un-acked batch state. Broker's delivery guarantee re-delivers after ack timeout; duplicate dispatch possible at the edge → workers must be idempotent.
  • Scaling discipline. Aggregator pool must scale with broker partition count or subscriber group to avoid a single aggregator becoming the bottleneck. Common approach: one aggregator per consumer group, partitioned over the broker's natural shard key.
  • Observability doubles. Now two tiers to instrument — broker backlog and aggregator batch composition.
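The duplicate-dispatch edge case above is why workers must be idempotent. A common discipline (sketched here with an in-process set; production systems would use a shared, expiring store — names are illustrative) is to deduplicate on a message id before applying side effects:

```python
class IdempotentWorker:
    """Guards a side-effecting handler against redelivered messages.
    The broker may redeliver anything whose ack was lost, so each
    message id is applied at most once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # stand-in for a shared store with a TTL

    def process_batch(self, batch):
        fresh = [m for m in batch if m["id"] not in self.seen]
        for m in fresh:
            self.seen.add(m["id"])
        if fresh:
            self.handler(fresh)  # side effects happen at most once per id
```

Deduplicating at the worker, rather than trying to make the aggregator exactly-once, keeps the aggregator stateless enough to restart freely.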

Seen in
