PATTERN Cited by 1 source

Parallel sub-agent execution for latency¶

Pattern¶

When a multi-agent orchestration breaks a task into N sub-agent calls that are data-independent, invoke them in parallel rather than sequentially. Case-level latency becomes max(sub-agent-latency) instead of sum(sub-agent-latency) — the single biggest latency win available in multi-sub-agent architectures.

Canonical framing, IBM + AWS KYC architecture (2026-04-23): "The parallel agent execution model is designed to reduce KYC validation time from the typical 3-5 days to near-real time for standard cases. This approach enables exponentially faster processing through simultaneous operation of Document Analysis, Identity Verification, and Fraud Detection agents rather than sequential workflows." (Source: sources/2026-04-23-aws-modernizing-kyc-with-aws-serverless-solutions-and-agentic-ai.)

Sequential vs parallel in concrete shape¶

SEQUENTIAL (wall-clock time = A + B + C):
[ DocAnalysis ────][ IdentityVerify ──────][ FraudDetect ────────]

PARALLEL (wall-clock time = max(A, B, C)):
[ DocAnalysis ────]
[ IdentityVerify ──────]
[ FraudDetect ────────]
                          ▼
                    Supervisor composes

With the three KYC sub-agents averaging, say, 2s, 3s, and 4s respectively:

- Sequential: 2 + 3 + 4 = 9s
- Parallel: max(2, 3, 4) = 4s

The advantage grows with sub-agent count. Five parallel sub-agents at ~5s each = 5s wall-clock instead of 25s.
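The sum-versus-max difference can be reproduced with a minimal asyncio sketch. The sub-agents here are hypothetical stand-ins that only sleep, and the latencies are scaled down from 2s / 3s / 4s to 20ms / 30ms / 40ms so the demo finishes quickly; none of this comes from the KYC source.

```python
import asyncio
import time

# Hypothetical stand-ins for the three sub-agents: (name, simulated latency in seconds).
AGENTS = [("DocAnalysis", 0.02), ("IdentityVerify", 0.03), ("FraudDetect", 0.04)]


async def sub_agent(name: str, latency_s: float) -> str:
    # Simulate the sub-agent's work with a non-blocking sleep.
    await asyncio.sleep(latency_s)
    return f"{name}: done"


async def run_sequential() -> float:
    # Await each sub-agent before starting the next: wall-clock is roughly the sum.
    start = time.perf_counter()
    for name, latency_s in AGENTS:
        await sub_agent(name, latency_s)
    return time.perf_counter() - start


async def run_parallel() -> float:
    # Start all sub-agents together: wall-clock is roughly the slowest one.
    start = time.perf_counter()
    await asyncio.gather(*(sub_agent(n, l) for n, l in AGENTS))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    seq = await run_sequential()
    par = await run_parallel()
    return seq, par


if __name__ == "__main__":
    seq, par = asyncio.run(main())
    print(f"sequential: {seq:.3f}s, parallel: {par:.3f}s")
```

Exact timings vary with scheduler overhead, but the sequential run should land near the sum of the three latencies and the parallel run near the largest.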

Preconditions¶

Data independence is the hard requirement:

Sub-agents must not need each other's outputs as input.
Tool calls must not depend on ordering (e.g. account activation must wait for identity verification, so those two steps cannot run concurrently).
Each sub-agent's confidence score must be independently meaningful — the Supervisor composes them at the end.

Where preconditions fail:

- Document Analysis → Identity Verification is not parallelisable if Identity needs the extracted passport number from Document Analysis.
- Compliance & Risk may depend on all three earlier sub-agents completing, so sits sequentially after the parallel fan-out.
- Customer Experience is observational and parallel-safe.

The KYC architecture picks Doc Analysis + Identity Verify + Fraud Detection as the parallelisable set specifically because they share inputs (customer profile + documents) but not outputs — each produces a confidence signal the Supervisor can combine.
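One way to check the data-independence precondition mechanically is to write the dependencies down as a graph and group sub-agents into stages, where everything in a stage can run in parallel. The sketch below uses Python's standard `graphlib`. The dependency maps and the `parallel_stages` helper are illustrative assumptions, not part of the source architecture.

```python
from graphlib import TopologicalSorter


def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into stages; each stage depends only on earlier stages."""
    sorter = TopologicalSorter(deps)  # maps node -> set of prerequisites
    sorter.prepare()                  # raises CycleError if dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Assumed dependency map for the fan-out described above.
INDEPENDENT = {
    "DocAnalysis": set(),
    "IdentityVerify": set(),
    "FraudDetect": set(),
    "ComplianceRisk": {"DocAnalysis", "IdentityVerify", "FraudDetect"},
}

# Variant where Identity needs the passport number extracted by Document Analysis.
DEPENDENT = {**INDEPENDENT, "IdentityVerify": {"DocAnalysis"}}

if __name__ == "__main__":
    print(parallel_stages(INDEPENDENT))
    print(parallel_stages(DEPENDENT))
```

With the independent map the first stage holds all three sub-agents. Adding the passport-number dependency pushes Identity Verification into its own later stage.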

Supervisor as the parallelisation planner¶

The Supervisor owns the parallelisation decision:

"The supervisor analyses case characteristics (document types, customer geography, risk indicators, and historical patterns) to construct context-aware execution plans that invoke sub-agents in parallel or sequentially based on dependencies."

Not every case parallelises the same way. A high-risk case might run Fraud Detection first sequentially and short-circuit on detection, then only spawn other sub-agents if Fraud Detection clears. A low-risk case runs everything in parallel. The Supervisor's dynamic planning (patterns/supervisor-subagent-kyc-orchestration) is what makes this choice per-case.
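The per-case choice can be sketched as a small planner. This is a hypothetical illustration: the `risk` and `fraud_signal` fields, the stub sub-agents, and the `supervise` function are invented for the example and do not reflect the actual Supervisor implementation.

```python
import asyncio


# Stub sub-agents (assumptions): each returns a small result dict.
async def doc_analysis(case: dict) -> dict:
    await asyncio.sleep(0)
    return {"agent": "DocAnalysis", "ok": True}


async def identity_verify(case: dict) -> dict:
    await asyncio.sleep(0)
    return {"agent": "IdentityVerify", "ok": True}


async def fraud_detect(case: dict) -> dict:
    await asyncio.sleep(0)
    return {"agent": "FraudDetect", "ok": not case.get("fraud_signal", False)}


async def supervise(case: dict) -> list[dict]:
    if case.get("risk") == "high":
        # High-risk plan: run Fraud Detection alone first.
        fraud = await fraud_detect(case)
        if not fraud["ok"]:
            return [fraud]  # short-circuit: the other sub-agents never start
        # Fraud cleared: fan out the remaining sub-agents in parallel.
        rest = await asyncio.gather(doc_analysis(case), identity_verify(case))
        return [fraud, *rest]
    # Low-risk plan: run everything in parallel.
    results = await asyncio.gather(
        doc_analysis(case), identity_verify(case), fraud_detect(case)
    )
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(supervise({"risk": "high", "fraud_signal": True})))
    print(asyncio.run(supervise({"risk": "low"})))
```

The high-risk path trades latency for the chance to skip work entirely. The low-risk path pays for all three sub-agents but finishes in the time of the slowest.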

Latency composition hazards¶

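Tail-latency amplification. max(A, B, C) is sensitive to the tail of each sub-agent: a single slow Fraud Detection call sets the latency of the whole case. Assuming independent sub-agent latencies, the chance that at least one of N sub-agents lands in its own p99 tail is 1 - 0.99^N, roughly 3% for N = 3, so the case-level tail is worse than any single sub-agent's. Mitigations: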

Per-sub-agent timeout budgets (if a sub-agent exceeds its budget, the Supervisor proceeds with partial confidence and a lower confidence tier).
Hedged tool calls inside individual sub-agents (redundant invoke after short delay). See concepts/tail-latency-at-scale.
Sub-agent-level caching where idempotent (e.g. pre-cached watchlist results for the same identity).
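The first mitigation, per-sub-agent timeout budgets, might look like the following sketch built on `asyncio.wait_for`. The budgets, scores, `fake_agent` stub, and the "full" / "partial" tier labels are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Optional


async def fake_agent(score: float, latency_s: float) -> float:
    # Hypothetical sub-agent: sleeps, then returns a confidence score.
    await asyncio.sleep(latency_s)
    return score


async def with_budget(
    name: str, call: Awaitable[float], budget_s: float
) -> tuple[str, Optional[float]]:
    # Enforce the budget; a sub-agent that overruns is cancelled and yields None.
    try:
        return name, await asyncio.wait_for(call, timeout=budget_s)
    except asyncio.TimeoutError:
        return name, None


async def fan_out() -> tuple[dict[str, float], str]:
    results = await asyncio.gather(
        with_budget("DocAnalysis", fake_agent(0.9, 0.01), 0.2),
        with_budget("IdentityVerify", fake_agent(0.8, 0.01), 0.2),
        with_budget("FraudDetect", fake_agent(0.7, 1.0), 0.05),  # will overrun
    )
    scores = {name: s for name, s in results if s is not None}
    # Downgrade the confidence tier when any sub-agent missed its budget.
    tier = "full" if len(scores) == len(results) else "partial"
    return scores, tier


if __name__ == "__main__":
    print(asyncio.run(fan_out()))
```

The case completes in about the longest budget rather than the slowest sub-agent's actual latency, at the price of composing from fewer signals.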

Resource contention. Parallel sub-agents sharing a common rate-limited downstream (same vendor API, same model endpoint) serialise behind the rate limit. Either fan out to distinct endpoints or let the rate limiter + Supervisor downgrade gracefully.
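The serialisation effect can be demonstrated with a semaphore standing in for the shared downstream limit. This is a simplification: a semaphore caps concurrency rather than enforcing a true requests-per-second rate, and the `call_vendor` stub and latencies are invented for the example.

```python
import asyncio
import time


async def call_vendor(limiter: asyncio.Semaphore, latency_s: float) -> None:
    # Each sub-agent must acquire the shared limiter before calling downstream.
    async with limiter:
        await asyncio.sleep(latency_s)


async def timed_fan_out(
    concurrency: int, n_agents: int = 3, latency_s: float = 0.03
) -> float:
    limiter = asyncio.Semaphore(concurrency)
    start = time.perf_counter()
    await asyncio.gather(*(call_vendor(limiter, latency_s) for _ in range(n_agents)))
    return time.perf_counter() - start


if __name__ == "__main__":
    shared = asyncio.run(timed_fan_out(concurrency=1))    # one slot: calls queue up
    distinct = asyncio.run(timed_fan_out(concurrency=3))  # one slot per sub-agent
    print(f"shared limit: {shared:.3f}s, distinct endpoints: {distinct:.3f}s")
```

With a single slot the nominally parallel fan-out degrades to roughly sequential latency, which is why the limit has to be part of the execution plan.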

Composition rule dominance. If the Supervisor uses min-confidence across sub-agents, a single low-confidence sub-agent dominates the outcome. A weighted average is more forgiving. The composition rule has to be aligned with regulatory risk appetite: documented, not just coded.
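The difference between the two composition rules is easy to see on a small example. The scores and equal weights below are made-up values, and both helper functions are illustrative rather than the Supervisor's actual logic.

```python
def min_confidence(scores: dict[str, float]) -> float:
    # Conservative rule: the weakest sub-agent decides the case.
    return min(scores.values())


def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    # Forgiving rule: strong signals can offset a weak one.
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical confidence scores: two strong signals and one weak one.
SCORES = {"DocAnalysis": 0.95, "IdentityVerify": 0.90, "FraudDetect": 0.40}
EQUAL_WEIGHTS = {"DocAnalysis": 1.0, "IdentityVerify": 1.0, "FraudDetect": 1.0}

if __name__ == "__main__":
    print("min-confidence:", min_confidence(SCORES))
    print("weighted average:", round(weighted_average(SCORES, EQUAL_WEIGHTS), 2))
```

Under min-confidence the case scores 0.40 because of Fraud Detection alone. The equal-weight average comes out at 0.75, which could clear a threshold that the minimum would not.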

When not to parallelise¶

Sub-agents have genuine data dependencies: parallelising creates wrong-order reads.
Regulatory sequence requirements: some jurisdictions require sequential gating (e.g. sanctions screening must complete before any other KYC step). Parallel → non-compliant.
Side-effect risk: sub-agents that write to the same state should not run concurrently without explicit conflict handling.
Cost vs latency: parallel fan-out runs all N sub-agents on every case, giving up the compute a sequential plan saves by short-circuiting early, for potentially small latency wins when sub-agents are already fast.

Related¶

patterns/two-loop-parallel-async-build: Yelp CHAOS's client-side version of the same idea, in which loop 1 starts all async calls and loop 2 awaits them behind a sync-looking API.
patterns/multi-agent-streaming-coordination: pairs well because Kafka fan-out naturally supports parallel subscription-based dispatch.
patterns/supervisor-subagent-kyc-orchestration: this pattern is the latency mechanism inside that orchestration.

Seen in¶

sources/2026-04-23-aws-modernizing-kyc-with-aws-serverless-solutions-and-agentic-ai — three parallel sub-agents (Doc Analysis + Identity Verify + Fraud Detect) as the core mechanism behind 3-5-day → sub-5-minute KYC latency target.