
PATTERN Cited by 6 sources

Specialized agent decomposition

Build per-domain agents (storage, databases, client-side traffic, network, …) that each carry a small, well-scoped toolset, and let them collaborate on an end-to-end analysis — rather than building one mega-agent that carries every tool and context for every domain.

Intent

A single general-purpose agent suffers from two failure modes as it grows:

  • Tool-selection noise. Large tool inventories make the LLM more likely to pick the wrong tool, or a suboptimal one.
  • Context crowding. Packing domain-specific system prompts into one context dilutes each domain's instructions and hits context-window limits.

Decomposition puts each domain's tools and prompts in a dedicated agent whose reasoning space is small, then composes their outputs for cross-domain investigations.

When to reach for it

  • You already have a patterns/tool-decoupled-agent-framework: adding an agent ≈ adding a configuration, not a new codebase.
  • Debugging / investigation spans multiple subsystems (e.g., DB + client traffic + storage).
  • You observe tool-selection errors correlated with tool-inventory growth.

Mechanism

  1. Carve along coherent domains. Each agent owns a specific scope: one system-and-database agent, one client-traffic agent, etc. Tools within an agent are cohesive.
  2. Shared infrastructure. Framework (LLM client, conversation state, tool-call parser, snapshot/replay harness) lives once; each agent instantiates it.
  3. Collaboration protocol. Either an orchestrator agent routes questions to specialists and merges their outputs, or specialists hand off to each other via well-defined events. Databricks' post describes the collaboration but doesn't specify the protocol.
  4. Per-agent evaluation. Each agent has its own snapshot-replay corpus (see patterns/snapshot-replay-agent-evaluation); specialization makes eval more tractable, not less.
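A minimal sketch of steps 1–3, with hypothetical agent and tool names; the real LLM loop, shared framework, and hand-off protocol are elided behind placeholders:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SpecialistAgent:
    """One domain, one focused prompt, one small cohesive toolset."""
    domain: str
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def investigate(self, question: str) -> str:
        # Placeholder for the real LLM loop: the agent reasons only over
        # its own prompt and its own small tool inventory.
        return f"[{self.domain}] analysis of: {question}"

class Orchestrator:
    """Routes a question to the relevant specialists and merges outputs."""
    def __init__(self, specialists: list[SpecialistAgent]):
        self.by_domain = {a.domain: a for a in specialists}

    def route(self, question: str, domains: list[str]) -> str:
        findings = [self.by_domain[d].investigate(question) for d in domains]
        return "\n".join(findings)

db = SpecialistAgent("database", "You diagnose database issues.",
                     {"explain_query": lambda q: "plan: seq scan"})
traffic = SpecialistAgent("client-traffic", "You analyze client traffic patterns.")
orch = Orchestrator([db, traffic])
report = orch.route("latency spike at 09:00", ["database", "client-traffic"])
```

Note that each specialist's reasoning space stays small regardless of how many domains the orchestrator knows about.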

Why it helps

  • Deep expertise per agent. Smaller tool inventory + focused prompt + focused eval corpus = better domain accuracy.
  • Parallel team development. Different teams can own different specialist agents.
  • Incremental rollout. New domains get their own agent without destabilizing existing ones.
  • Extensibility beyond original scope. Once a few agents exist, adding one for a new system (say, caching or Kubernetes) follows a well-defined template.

Tradeoffs

  • Orchestration overhead. Cross-domain questions now require coordination — "this is a DB issue triggered by a client-side surge" requires both agents. Poorly designed coordination layers regress latency and UX.
  • Consistency. Multiple specialists can return overlapping or contradictory diagnoses, so a reconciliation step or primary-agent mechanic is needed.
  • Boundary drift. A signal that looks like a DB issue may actually live in client traffic; agents must know when to hand off.
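One way to handle the consistency tradeoff is a primary-agent mechanic. A toy sketch, where the selection policy and output format are assumptions, not anything the sources describe:

```python
def reconcile(findings: dict[str, str], primary: str) -> str:
    # Hypothetical primary-agent mechanic: the designated primary's
    # diagnosis leads; other specialists' findings are appended as
    # supporting or dissenting context instead of being silently dropped.
    lead = findings[primary]
    others = [f"{domain}: {diag}" for domain, diag in findings.items()
              if domain != primary]
    if not others:
        return lead
    return f"{lead} (also considered: {'; '.join(others)})"

merged = reconcile(
    {"database": "lock contention on orders table",
     "client-traffic": "retry storm from mobile clients"},
    primary="database",
)
```

A real system would likely pick the primary per-incident (e.g. by confidence score) rather than statically.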

Seen in

  • sources/2025-12-03-databricks-ai-agent-debug-databases — Databricks' systems/storex enables "specialized agents for different domains: one focused on system and database issues, another on client-side traffic patterns, and so on. This decomposition enables each agent to build deep expertise in its area while collaborating with others to deliver a more complete root cause analysis. It also paves the way for integrating AI agents into other parts of our infrastructure, extending beyond databases."

  • sources/2026-04-20-cloudflare-orchestrating-ai-code-review-at-scale — Cloudflare's AI Code Review system is the canonical wiki instance of the pattern applied to code review. Seven specialized sub-reviewers (security, performance, code quality, documentation, release, AGENTS.md, engineering-codex) run in parallel, each with a tightly scoped prompt and an explicit "What NOT to flag" section, coordinated by a judge-pass coordinator on the top model tier. See the dedicated specialization pattern patterns/specialized-reviewer-agents and the orchestration shape patterns/coordinator-sub-reviewer-orchestration. Production scale (first 30 days): 131,246 runs across 5,169 repos; 85.7% prompt-cache hit rate; ~1.2 findings per review.

  • sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai — Dropbox Dash extracts query construction for its universal search tool into a dedicated search sub-agent. The main planning agent decides when to search; the sub-agent owns the how (user-intent → index-field mapping, query rewriting for semantic matching, typos / synonyms / implicit context). Named rationale: "When a tool demands too much explanation or context to be used effectively, it's often better to turn it into a dedicated agent with a focused prompt." This is the pattern applied to sub-tool complexity, not just domain separation — same shape, different motivation from Storex.

  • sources/2026-01-28-dropbox-knowledge-graphs-mcp-dspy-dash — Josh Clemm's companion talk extends the Dash decomposition with an additional mechanism: a classifier picks the sub-agent for complex agentic queries, each sub-agent having a much narrower tool set. "We use a lot of sub-agents for very complex agentic queries, and have a classifier effectively pick the sub-agent with a much more narrow set of tools." This adds a named routing mechanism to the pattern (the classifier) that the 2025-11-17 post didn't explicitly describe; it also positions specialized-agent-decomposition as one of four named fixes Dash applied to make MCP work at scale (alongside patterns/unified-retrieval-tool, knowledge-graph-bundle token compression, and tool-result-local-storage).
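Dash's classifier-routed sub-agent selection might look like the following sketch. The keyword classifier and tool names here are illustrative stand-ins; Dash presumably uses an LLM or trained model as the classifier:

```python
def classify(query: str) -> str:
    # Keyword stand-in for Dash's classifier, which picks the sub-agent
    # for complex agentic queries.
    if any(kw in query.lower() for kw in ("search", "find", "look up")):
        return "search"
    return "general"

# Each sub-agent carries a much narrower tool set than the parent would.
SUB_AGENT_TOOLS = {
    "search": ["build_query", "rewrite_for_semantic_match", "expand_synonyms"],
    "general": ["answer_directly"],
}

def route(query: str) -> tuple[str, list[str]]:
    sub_agent = classify(query)
    return sub_agent, SUB_AGENT_TOOLS[sub_agent]
```

The key property is that the tool set handed to the model is a function of the classification, not the union of all tools.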

Two framings of the same pattern

  • Domain-based decomposition (Storex). One agent per domain (DB, client traffic, storage, network); composition layer routes cross-domain questions. Intent: scale tool inventory + prompt specialization across many areas of expertise.
  • Sub-tool decomposition (Dash). Extract one specific tool's own internal complexity into a sub-agent, because the tool's explanation otherwise starves the parent's context budget. Intent: protect context budget when a single tool's instruction weight grows.

Both converge on the same mechanism (dedicated prompt + dedicated tool surface + orchestration hand-off) for different reasons. A mature production system often does both — per-domain agents plus, within each, sub-agents for the most complex sub-tasks.

AWS reference-architecture shape (2025-12-11)

AWS's conversational-observability Strands deployment adopts this pattern with a three-agent split over Kubernetes troubleshooting:

  • Agent Orchestrator — coordinates the troubleshooting workflow across the other two agents. "Coordinates troubleshooting workflows."
  • Memory Agent — owns conversation context and historical insights across turns / sessions. "Manages conversation context and historical insights."
  • K8s Specialist — narrow-surface diagnostic agent calling EKS MCP Server tools. "Handles Kubernetes diagnostics."
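A registry-style sketch of the three-agent split. The role strings quote the AWS post; the registry shape and tool names are illustrative assumptions, not the Strands API:

```python
# Hypothetical agent registry for the three-agent split.
AGENTS = {
    "orchestrator": {
        "role": "coordinates troubleshooting workflows",
        "tools": ["delegate_to_agent"],
    },
    "memory": {
        "role": "manages conversation context and historical insights",
        "tools": ["store_turn", "recall_insights"],
    },
    "k8s_specialist": {
        "role": "handles Kubernetes diagnostics",
        "tools": ["eks_mcp.describe_pod", "eks_mcp.list_events"],
    },
}

def tool_inventory(agent: str) -> list[str]:
    # Each agent sees only its own small tool list at selection time.
    return AGENTS[agent]["tools"]
```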

The decomposition mirrors Storex (per-storage-layer specialists), Dash (classifier-routed sub-agents), and Cloudflare Agent Lee (domain-per-team agents): same pattern, same rationale — keeping each agent's tool inventory small enough for reliable selection and small enough to fit in context. An ops/observability instance of the pattern, same shape as Storex's storage-incident instance. (Source: sources/2025-12-11-aws-architecting-conversational-observability-for-cloud-applications)

Verification-gated inner-loop variant (DS-STAR, 2025-11-06)

Google Research's DS-STAR data-science agent is the canonical wiki instance of specialized-agent decomposition organized by role in a refinement loop, not by subject-matter domain:

  • Data File Analyzer — writes and runs a file-summarization script.
  • Planner — emits the high-level plan.
  • Coder — turns the plan into executable code, runs it.
  • Verifier — LLM judge scoring plan sufficiency against intermediate results.
  • Router — on reject, decides add-step vs fix-step.

The agents specialise not because they own different domains but because they own different roles in the plan → implement → verify → refine loop; the Verifier gates each cycle, and the Router's add-or-fix decision is the refinement primitive. Full pattern is patterns/planner-coder-verifier-router-loop; loop-level concept is concepts/iterative-plan-refinement.
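The loop's control flow can be sketched as follows. The callables, the `add`/`fix` plan edits, and the stopping condition are simplified stand-ins for DS-STAR's actual Planner, Coder, Verifier, and Router:

```python
from typing import Callable

def refine_loop(plan: list[str],
                implement: Callable[[list[str]], str],
                verify: Callable[[list[str], str], bool],
                route: Callable[[list[str], str], str],
                max_iters: int = 10) -> list[str]:
    # Coder runs the plan; Verifier gates each cycle; on reject the
    # Router decides between adding a step and fixing the last one.
    for _ in range(max_iters):
        result = implement(plan)
        if verify(plan, result):
            return plan
        if route(plan, result) == "add":
            plan = plan + ["<new step>"]
        else:
            plan = plan[:-1] + ["<fixed step>"]
    return plan

# Toy stand-ins: accept once the plan has three steps; always choose "add".
final = refine_loop(
    ["load data"],
    implement=lambda p: f"ran {len(p)} steps",
    verify=lambda p, r: len(p) >= 3,
    route=lambda p, r: "add",
)
```

Because each role is an injectable callable, each primitive can be ablated independently — which is exactly what the paper's ablations exploit.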

Ablations quantify the decomposition's value: removing the Data File Analyzer collapses DABStep hard-task accuracy 45.2% → 26.98%; removing the Router (forcing extend-only) degrades both easy and hard tasks — "it is more effective to correct mistakes in a plan than to keep adding potentially flawed steps" (Source: sources/2025-11-06-google-ds-star-versatile-data-science-agent).

Adds a third framing to this pattern's taxonomy alongside the Storex (domain-based) and Dash (sub-tool) framings:

  • Role-in-the-refinement-loop decomposition (DS-STAR). One agent per loop role (context, plan, implement, judge, route); coordination is the loop itself. Intent: make verification and revision first-class, isolate ablation-testable primitives.