
Multi-agent coordination over streaming

Pattern

In multi-agent systems, agents need to communicate, hand off work, and synchronise. The streaming-broker pattern is to treat multi-agent coordination as a microservices-over-Kafka problem: each agent is a producer and/or consumer of events on streaming topics; coordination is via topic publication + subscription, not point-to-point RPC or shared state.

agent A ─publish─► topic(plan-ready) ◄─subscribe─ agent B
                                     ◄─subscribe─ agent C

agent B ─publish─► topic(result-b)   ◄─subscribe─ agent D (aggregator)
agent C ─publish─► topic(result-c)   ◄─subscribe─ agent D

This gives multi-agent systems the same benefits microservices got from Kafka a decade ago: decoupled services, durability, fan-in, and fan-out.
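The diagram above can be sketched in a few lines. This is a toy in-memory stand-in for the broker (not a real Kafka client); the topic name and agent callbacks are illustrative. The point it demonstrates is the decoupling: the publisher never names its subscribers.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory stand-in for a streaming broker: each topic is an
    append-only list of events, and subscribers are callbacks fired on
    publish. Real brokers add partitions, offsets, and consumer groups."""
    def __init__(self):
        self.topics = defaultdict(list)       # topic name -> retained events
        self.subscribers = defaultdict(list)  # topic name -> callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        self.topics[topic].append(event)      # durable record of the event
        for callback in self.subscribers[topic]:
            callback(event)

broker = Broker()
received = []

# Agents B and C subscribe to "plan-ready"; agent A never learns who they are.
broker.subscribe("plan-ready", lambda e: received.append(("agent-b", e)))
broker.subscribe("plan-ready", lambda e: received.append(("agent-c", e)))

# Agent A publishes once; both subscribers are triggered (fan-out).
broker.publish("plan-ready", {"task": "t1", "plan": "split work"})
```

Adding agent D is one more `subscribe` call; agent A's code is untouched.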

Canonical statement

From Tyler Akidau's 2026-02-10 Redpanda post (Source: sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise):

"Multi-agent coordination seems like another classic streaming use case. If you think about the microservices architecture, you get benefits like decoupled services, durability, and fan-in and fan-out inputs. Multi-agent scenarios also require scalable, decoupled communication. With streaming, you get easier maintenance and better durability for your multi-agent system."

Four properties the streaming broker delivers

  1. Decoupled services. Agent A publishes to a topic; it doesn't know (and doesn't need to know) which agents subscribe. Agent B subscribes to the topic; it doesn't know who the producers are. Adding a new agent is a new subscriber, not a producer-side rewrite.
  2. Durability. If agent B is offline when A publishes, the event is retained on the topic. B catches up when it restarts. This is load-bearing for long-running multi-agent workflows where individual agent sessions are shorter than the workflow.
  3. Fan-out. A single producer can feed N consumers. One "plan-ready" event from a coordinator agent can trigger N worker agents in parallel.
  4. Fan-in. N producers can feed one aggregator. M worker agents can each emit their partial result; one aggregator consumer subscribes to the common topic and assembles the final answer.
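Properties 2 and 4 hinge on the topic being an append-only log that consumers pull from at their own offset. A minimal sketch (the `Topic` class and offset handling are illustrative, not a real consumer API):

```python
class Topic:
    """Toy append-only log. Consumers track their own read offset, so a
    consumer that was offline simply resumes from where it left off."""
    def __init__(self):
        self.log = []

    def publish(self, event):
        self.log.append(event)

    def read_from(self, offset):
        return self.log[offset:]

# Durability: agent B is offline while the coordinator publishes two events.
plan_ready = Topic()
plan_ready.publish({"task": "t1"})
plan_ready.publish({"task": "t2"})

# B restarts and catches up from its last committed offset (0 here),
# then commits the new offset.
b_offset = 0
caught_up = plan_ready.read_from(b_offset)
b_offset += len(caught_up)

# Fan-in: workers B and C emit partial results to a common topic;
# aggregator D reads the whole topic and assembles the final answer.
results = Topic()
results.publish({"worker": "B", "part": 1})
results.publish({"worker": "C", "part": 2})
total = sum(e["part"] for e in results.read_from(0))
```

In Kafka terms, `b_offset` plays the role of a committed consumer-group offset: the workflow outlives any single agent session because the log, not the agent, holds the state.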

Why decoupled > RPC for multi-agent

Synchronous RPC between agents is the naïve shape: agent A calls agent B directly; A blocks until B responds. Problems:

  • Tight coupling of availability. A can't make progress if B is down, even if A's work is otherwise ready to proceed.
  • No durability on crash. If A crashes mid-call to B, the work is lost; no shared substrate to resume from.
  • Scatter-gather complexity. Coordinating M-way fan-out + fan-in requires A to track every outstanding call; any partial failure needs bespoke retry logic.
  • Back-pressure awkwardness. A has to implement rate-limiting against B; every new A needs its own rate-limiter.
  • Observability scattered. Each A→B RPC is its own trace; cross-agent causality lives in a separate distributed-tracing system rather than in one place.

Topic-mediated coordination retires all five problems. The broker is the durability boundary; consumer-group semantics handle fan-out back-pressure; the topic log is the unified observability surface.

Composition with the audit envelope

The multi-agent coordination topics naturally compose with the audit envelope:

  • Every agent's published event is captured for audit by construction — it's already on a durable, queryable log.
  • Cross-agent workflows can be replayed end-to-end against the topic sequence.
  • Lineage is explicit: each event's headers / schema carries the producing agent ID, the task ID, the upstream trigger.

This is the composition advantage of one substrate, many views — the same topic that carries coordination traffic also carries the audit trail.
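A sketch of that composition: the audit view is just a filtered replay of the same coordination log. The header field names (`agent_id`, `task_id`, `caused_by`) are illustrative, not a fixed schema.

```python
# Coordination events already on the topic, each carrying lineage headers
# (field names are illustrative assumptions, not a standard schema).
coordination_log = [
    {"agent_id": "coordinator", "task_id": "t1", "caused_by": None,
     "type": "plan-ready"},
    {"agent_id": "worker-b", "task_id": "t1", "caused_by": "plan-ready",
     "type": "result-b"},
    {"agent_id": "worker-c", "task_id": "t1", "caused_by": "plan-ready",
     "type": "result-c"},
]

def audit_trail(log, task_id):
    """Audit as a view: replay the coordination topic filtered by task.
    No separate audit pipeline exists; one substrate serves both reads."""
    return [(e["agent_id"], e["type"]) for e in log if e["task_id"] == task_id]

trail = audit_trail(coordination_log, "t1")
```

The same replay machinery gives end-to-end workflow reconstruction: order within the log is the order of the workflow.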

Structural sibling: microservices over Kafka

The pattern is structurally identical to the microservices-over-Kafka shape that the streaming community already canonicalised:

| Microservices | Multi-agent |
| --- | --- |
| Service A | Agent A |
| Service B | Agent B |
| Event-driven communication | Event-driven communication |
| Kafka topic | Kafka topic |
| Consumer groups | Consumer groups (each agent type as a group) |
| Schema registry | Schema registry (for agent event shapes) |
| Back-pressure via consumer lag | Back-pressure via consumer lag |

The implication: multi-agent systems don't need a new coordination substrate. The streaming-broker primitives that power thousands of microservices deployments today also power multi-agent deployments, at the cost of a framing shift from "services" to "agents".

Trade-offs

Wins

  • Proven substrate (Kafka, Redpanda, etc.) with known operational shape.
  • Durable, queryable coordination record (audit + replay for free).
  • Fan-out + fan-in primitives native.
  • Agents can crash and restart without losing workflow state.
  • Decoupled scaling: coordinator agents, worker agents, and aggregator agents all scale independently.

Costs

  • Async-first mental model. Teams used to synchronous RPC have to rewire to event-driven thinking; error handling is different (retry + dead-letter vs. exception).
  • Event schema management. Agents need a shared schema for coordination events; schema drift across agents is the new cross-service API break.
  • Ordering semantics. Multi-agent workflows with strict ordering requirements (e.g. approval chains) need per-key partitioning + single-consumer-per-partition discipline to preserve order.
  • Exactly-once across agents is hard. If an agent is non-idempotent (e.g. makes an external API call with side effects), the broker's exactly-once delivery semantics don't help; each agent needs its own idempotency key / deduplication.
  • Latency floor. Async coordination adds broker-hop latency (milliseconds). For use cases requiring sub-millisecond coordination (rare at the agent altitude), direct RPC may still be preferable.
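The exactly-once cost above is usually handled on the consumer side. A minimal sketch of per-agent deduplication, assuming the broker may redeliver events after a crash (the key format and field names are illustrative):

```python
class DedupingAgent:
    """Consumer-side deduplication: the broker guarantees at-least-once
    delivery, so the agent tracks seen idempotency keys and applies each
    side-effecting action at most once."""
    def __init__(self):
        self.seen = set()        # in production: a persistent store
        self.side_effects = []   # stands in for external API calls

    def handle(self, event):
        key = event["idempotency_key"]
        if key in self.seen:
            return               # duplicate delivery: skip the side effect
        self.seen.add(key)
        self.side_effects.append(event["payload"])

agent = DedupingAgent()
event = {"idempotency_key": "t1-step3", "payload": "call external API"}
agent.handle(event)
agent.handle(event)  # redelivered after a restart; deduplicated
```

The `seen` set would need to be persisted (and ideally committed atomically with the consumer offset) for the guarantee to survive agent restarts.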

Seen in

  • sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise — canonical wiki introduction. Akidau names multi-agent coordination as axis 8 of his eight-axis enterprise-agent-infrastructure checklist, framing it as "another classic streaming use case" that inherits the decoupled-services + durability + fan-in + fan-out benefits from the microservices-over-Kafka lineage.