PATTERN
Multi-agent coordination over streaming¶
Pattern¶
In multi-agent systems, agents need to communicate, hand off work, and synchronise. The streaming-broker pattern is to treat multi-agent coordination as a microservices-over-Kafka problem: each agent is a producer and/or consumer of events on streaming topics; coordination is via topic publication + subscription, not point-to-point RPC or shared state.
agent A ─publish─► topic(plan-ready) ◄─subscribe─ agent B
                                     ◄─subscribe─ agent C
agent B ─publish─► topic(result-b)   ◄─subscribe─ agent D (aggregator)
agent C ─publish─► topic(result-c)   ◄─subscribe─ agent D
This gives multi-agent systems the same benefits microservices got from Kafka a decade ago: decoupled services, durability, fan-in, and fan-out.
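A minimal in-memory sketch of the shape (illustrative only — the `Topic` class stands in for a real streaming topic on Kafka/Redpanda; all agent and topic names are hypothetical):

```python
class Topic:
    """In-memory stand-in for a streaming topic: a retained log plus subscribers."""
    def __init__(self):
        self.log = []            # append-only event log (the durability)
        self.subscribers = []    # delivery callbacks registered by consuming agents

    def publish(self, event):
        self.log.append(event)   # retained regardless of who is listening
        for deliver in self.subscribers:
            deliver(event)

    def subscribe(self, deliver):
        for event in self.log:   # late subscribers catch up from the log
            deliver(event)
        self.subscribers.append(deliver)

# Agent A publishes a plan without knowing (or caring) who consumes it.
plan_ready = Topic()
received_b, received_c = [], []
plan_ready.subscribe(received_b.append)                  # agent B subscribes first
plan_ready.publish({"task_id": "t1", "plan": "step-1"})  # agent A publishes
plan_ready.subscribe(received_c.append)                  # agent C joins late, catches up
```

Note that agent C receives the event despite subscribing after publication — the decoupling and durability properties in one move.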
Canonical statement¶
From Tyler Akidau's 2026-02-10 Redpanda post (Source: sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise):
"Multi-agent coordination seems like another classic streaming use case. If you think about the microservices architecture, you get benefits like decoupled services, durability, and fan-in and fan-out inputs. Multi-agent scenarios also require scalable, decoupled communication. With streaming, you get easier maintenance and better durability for your multi-agent system."
Four properties the streaming broker delivers¶
- Decoupled services. Agent A publishes to a topic; it doesn't know (and doesn't need to know) which agents subscribe. Agent B subscribes to the topic; it doesn't know who the producers are. Adding a new agent is a new subscriber, not a producer-side rewrite.
- Durability. If agent B is offline when A publishes, the event is retained on the topic. B catches up when it restarts. This is load-bearing for long-running multi-agent workflows where individual agent sessions are shorter than the workflow.
- Fan-out. A single producer can feed N consumers. One "plan-ready" event from a coordinator agent can trigger N worker agents in parallel.
- Fan-in. N producers can feed one aggregator. M worker agents can each emit their partial result; one aggregator consumer subscribes to the common topic and assembles the final answer.
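The fan-in property can be sketched the same way: M workers publish partials to a common topic, and a single aggregator assembles the result. Agent names and payloads below are illustrative, not from the source:

```python
# Common result topic, modeled as a retained event log.
result_topic = []

def publish_partial(agent_id, payload):
    """Each worker agent emits its partial result as an event."""
    result_topic.append({"agent_id": agent_id, "payload": payload})

for agent_id, payload in [("worker-1", "draft"),
                          ("worker-2", "citations"),
                          ("worker-3", "summary")]:
    publish_partial(agent_id, payload)

# Aggregator: one consumer subscribes to the common topic and merges all partials.
final_answer = {e["agent_id"]: e["payload"] for e in result_topic}
```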
Why decoupled > RPC for multi-agent¶
Synchronous RPC between agents is the naïve shape: agent A calls agent B directly; A blocks until B responds. Problems:
- Tight coupling of availability. A can't make progress if B is down, even if A's work is otherwise ready to proceed.
- No durability on crash. If A crashes mid-call to B, the work is lost; no shared substrate to resume from.
- Scatter-gather complexity. Coordinating M-way fan-out + fan-in requires A to track every outstanding call; any partial failure needs bespoke retry logic.
- Back-pressure awkwardness. A has to implement rate-limiting against B; every new A needs its own rate-limiter.
- Observability scattered. Each A→B RPC is its own trace; cross-agent causality lives in a separate distributed-tracing system rather than in the coordination substrate itself.
Topic-mediated coordination retires all five problems. The broker is the durability boundary; consumer-group semantics handle fan-out back-pressure; the topic log is the unified observability surface.
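To make the scatter-gather contrast concrete, a hedged sketch: the coordinator publishes once and keeps no outstanding-call table; completion is detected purely by counting fan-in results on the result topic. All names are illustrative:

```python
# Topics modeled as retained logs; no coordinator-side call tracking.
plan_topic, result_topic = [], []
EXPECTED_WORKERS = 3

def coordinator():
    plan_topic.append({"task_id": "t1"})  # one publish; no per-worker RPC state

def worker(worker_id):
    task = plan_topic[-1]                 # every worker consumes the same plan event
    result_topic.append({"worker": worker_id, "task_id": task["task_id"]})

coordinator()
for worker_id in ("b", "c", "d"):
    worker(worker_id)

# Aggregator's completion check: count events, not outstanding calls.
done = len(result_topic) == EXPECTED_WORKERS
```

If a worker crashes mid-task, its plan event is still on the topic to be re-consumed — the retry logic lives in the substrate, not in the coordinator.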
Composition with the audit envelope¶
The multi-agent coordination topics naturally compose with the audit envelope:
- Every agent's published event is captured for audit by construction — it's already on a durable, queryable log.
- Cross-agent workflows can be replayed end-to-end against the topic sequence.
- Lineage is explicit: each event's headers/schema carry the producing agent ID, the task ID, and the upstream trigger.
This is the composition advantage of one substrate, many views — the same topic that carries coordination traffic also carries the audit trail.
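A small sketch of the lineage view: the same retained log serves coordination and audit, and a causal chain can be replayed by walking header pointers. The header fields (`agent_id`, `task_id`, `caused_by`) are assumed shapes, not a standard schema:

```python
# The coordination log, with lineage headers on each event.
log = [
    {"offset": 0, "agent_id": "planner",    "task_id": "t1", "caused_by": None},
    {"offset": 1, "agent_id": "worker-b",   "task_id": "t1", "caused_by": 0},
    {"offset": 2, "agent_id": "aggregator", "task_id": "t1", "caused_by": 1},
]

def lineage(log, offset):
    """Walk caused_by pointers to replay the causal chain ending at one event."""
    chain = []
    while offset is not None:
        event = log[offset]
        chain.append(event["agent_id"])
        offset = event["caused_by"]
    return list(reversed(chain))
```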
Structural sibling: microservices over Kafka¶
The pattern is structurally identical to the microservices-over-Kafka shape that the streaming community already canonicalised:
| Microservices | Multi-agent |
|---|---|
| Service A | Agent A |
| Service B | Agent B |
| Event-driven communication | Event-driven communication |
| Kafka topic | Kafka topic |
| Consumer groups | Consumer groups (with each agent type as a group) |
| Schema registry | Schema registry (for agent event shapes) |
| Back-pressure via consumer lag | Back-pressure via consumer lag |
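The consumer-group row of the table maps onto broker configuration directly: each agent *type* runs as one consumer group, so instances of that type share partitions (scale-out) while distinct types each see the full stream (fan-out). A sketch using the librdkafka/confluent-kafka config-key convention; the broker address and group naming scheme are placeholders:

```python
AGENT_TYPES = ["planner", "worker", "aggregator"]

def consumer_config(agent_type):
    """One consumer group per agent type (hypothetical naming scheme)."""
    return {
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        "group.id": f"agents.{agent_type}",     # group = agent type
        "enable.auto.commit": False,            # commit only after side-effects land
        "auto.offset.reset": "earliest",        # late-joining agents catch up
    }

configs = {agent_type: consumer_config(agent_type) for agent_type in AGENT_TYPES}
```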
The implication: multi-agent systems don't need a new coordination substrate. The streaming-broker primitives that power thousands of microservices deployments today also power multi-agent deployments, at the cost of a framing shift from "services" to "agents".
Trade-offs¶
Wins
- Proven substrate (Kafka, Redpanda, etc.) with a known operational shape.
- Durable, queryable coordination record (audit + replay for free).
- Native fan-out + fan-in primitives.
- Agents can crash and restart without losing workflow state.
- Decoupled scaling: coordinator, worker, and aggregator agents all scale independently.
Costs
- Async-first mental model. Teams used to synchronous RPC have to rewire to event-driven thinking. Error handling is different (retry + dead-letter vs. exception).
- Event schema management. Agents need a shared schema for coordination events; schema drift across agents is the new cross-service API break.
- Ordering semantics. Multi-agent workflows with strict ordering requirements (e.g. approval chains) need per-key partitioning + single-consumer-per-partition discipline to preserve order.
- Exactly-once across agents is hard. If an agent is non-idempotent (e.g. it makes an external API call with side effects), the broker's exactly-once-delivery semantics don't help; each agent needs its own idempotency key / deduplication.
- Latency floor. Async coordination adds broker-hop latency (milliseconds). For use cases requiring sub-millisecond coordination (rare at the agent altitude), direct RPC may still be preferable.
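The idempotency-key cost can be sketched in a few lines: the broker may redeliver an event, so a non-idempotent agent must remember which keys it has already processed and turn replays into no-ops. The key shape `(task_id, step)` and the handler are illustrative assumptions:

```python
# External side-effects issued by the agent (stands in for real API calls).
side_effects = []
processed = set()  # idempotency keys already handled by this agent

def handle(event):
    key = (event["task_id"], event["step"])  # assumed idempotency-key shape
    if key in processed:
        return                               # duplicate delivery: skip the side-effect
    processed.add(key)
    side_effects.append(f"call-api:{event['task_id']}/{event['step']}")

for event in [{"task_id": "t1", "step": 1},
              {"task_id": "t1", "step": 1},  # redelivered by the broker
              {"task_id": "t1", "step": 2}]:
    handle(event)
```

In production the `processed` set would itself need durable storage (or a key in the downstream system), since an agent restart must not forget what it has already done.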
Related shapes on the wiki¶
- concepts/agentic-ai-infrastructure-challenges — axis 8 (multi-agent coordination) of Akidau's eight-axis checklist.
- concepts/streaming-as-agile-data-platform-backbone — the structural framing this pattern instantiates at the agent altitude.
- concepts/durable-execution — sibling coordination substrate (Temporal-style) for single-workflow durability; multi-agent streaming coordination is the distributed-workflow across agents dual.
- patterns/durable-event-log-as-agent-audit-envelope — the audit / replay / lineage shape this pattern composes with by construction.
- patterns/cdc-fanout-single-stream-to-many-consumers — data-fan-out analogue; multi-agent coordination is control-fan-out on the same substrate.
- patterns/mcp-as-centralized-integration-proxy — the tool-side peer of multi-agent coordination; MCP proxies connect agents to external systems, multi-agent coordination connects agents to each other.
Seen in¶
- sources/2026-02-10-redpanda-how-to-safely-deploy-agentic-ai-in-the-enterprise — canonical wiki introduction. Akidau names multi-agent coordination as axis 8 of his eight-axis enterprise-agent-infrastructure checklist, framing it as "another classic streaming use case" that inherits the decoupled-services + durability + fan-in + fan-out benefits from the microservices-over-Kafka lineage.
Related¶
- concepts/agentic-ai-infrastructure-challenges
- concepts/streaming-as-agile-data-platform-backbone
- concepts/autonomy-enterprise-agents
- concepts/log-as-truth-database-as-cache
- concepts/durable-execution
- patterns/cdc-fanout-single-stream-to-many-consumers
- patterns/durable-event-log-as-agent-audit-envelope
- patterns/mcp-as-centralized-integration-proxy
- systems/redpanda
- systems/redpanda-agentic-data-plane