PATTERN
Context-segregated sub-agents¶
Pattern¶
When an agent needs to do work that (a) would consume large amounts of the main context window, (b) needs a different tool surface than the parent, or (c) is exposed to untrusted input that should not enter the planner's context, spawn a sub-agent with its own fresh context array and its own narrow tool allowlist. The parent invokes the sub-agent as a single call, receives a summary (or structured result), and appends only that summary to the parent's context.
Fly.io's canonical formulation — also the simplest possible demystification of "Claude Code sub-agents":
"Take 'sub-agents'. People make a huge deal out of Claude Code's sub-agents, but you can see now how trivial they are to implement: just a new context array, another call to the model. Give each call different tools. Make sub-agents talk to each other, summarize each other, collate and aggregate." (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)
The pattern rests on the fact that a "conversation" is just a Python list of strings (concepts/agent-loop-stateless-llm) — a sub-agent is a second list with a second LLM call against it.
What segregation buys¶
- Token-budget isolation. Exploratory tool calls, dead-ends, and large intermediate outputs stay inside the sub-agent's context. The parent's token budget only spends on the sub-agent's returned summary.
- Tool-surface narrowing. Each sub-agent sees only the tools relevant to its concern — a security-review sub-agent gets read-only code-analysis tools; a deployment sub-agent gets write-capable deploy tools. Per-turn schema overhead shrinks and tool-selection accuracy improves.
- Security segregation. Fly.io: "You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting." Hostile or untrusted input (web-search results, user-uploaded files, email bodies, third-party API output) goes into the sub-agent's context where a successful prompt-injection cannot rewrite the parent's plan. Complementary to patterns/untrusted-input-via-file-not-prompt.
- Parallelism. Multiple sub-agents can run concurrently over disjoint inputs; results feed a collator sub-agent or the parent.
- Specialisation. Each sub-agent gets its own system prompt, its own model choice (cheap model for classification, expensive model for synthesis), its own temperature.
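The specialisation levers above (own prompt, own model, own temperature, own tool allowlist) can be captured as a per-sub-agent config object. A minimal sketch — all names, model strings, and tool names here are hypothetical, not from the sources:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgentSpec:
    """Configuration for one context-segregated sub-agent (hypothetical shape)."""
    system_prompt: str
    model: str                                  # e.g. cheap model for classification
    tools: list = field(default_factory=list)   # narrow allowlist, not the parent's full surface
    temperature: float = 0.0

# Read-only analysis tools only: this sub-agent cannot deploy anything.
security_reviewer = SubAgentSpec(
    system_prompt="You review code for security issues. Report findings only.",
    model="cheap-model",
    tools=["read_file", "grep"],
)

# Write-capable tools, unavailable to the reviewer above.
deployer = SubAgentSpec(
    system_prompt="You deploy the approved build.",
    model="capable-model",
    tools=["deploy", "rollback"],
)
```

The point of the dataclass is only that tool-surface narrowing becomes a data decision per sub-agent rather than a global one.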
Concrete shapes¶
- Fan-out / collate. Parent spawns N sub-agents over partitioned input, collects their summaries, passes them to a collator sub-agent that aggregates.
- Planner + executor. Parent plans and dispatches; one or more executor sub-agents do the narrow task with write-capable tools. See patterns/specialized-agent-decomposition.
- Coordinator + specialists. The canonical instance is Cloudflare AI Code Review (patterns/coordinator-sub-reviewer-orchestration): a coordinator sub-agent spawns one specialised-reviewer sub-agent per review category, each with its own prompt + tool allowlist, and the coordinator aggregates their findings.
- Summarise-then-forget. Sub-agent reads a large corpus (long doc, log file, big search result), returns a summary; the raw corpus never enters any context.
- Dual personalities. Two sub-agents with orthogonal prompts on the same input, compare outputs (Fly.io's Alph truth-teller / Ralph liar demo).
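The fan-out / collate shape reduces to a few lines once `sub_agent` exists. A sketch with a stubbed sub-agent standing in for the real tool-call loop (the stub and function names are illustrative, not from the sources):

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task: str) -> str:
    # Stub: in a real agent this would run a fresh context array through
    # the tool-call loop and return only the final summary text.
    return f"summary({task})"

def fan_out_collate(partitions: list[str]) -> str:
    # Fan out: one sub-agent per partition, each with its own context.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(sub_agent, partitions))
    # Collate: a final sub-agent call sees only the N summaries,
    # never the raw partitioned input.
    return sub_agent("collate: " + "; ".join(summaries))

result = fan_out_collate(["logs-a", "logs-b"])
```

Only the summaries cross context boundaries; the raw partitions stay inside the workers, which is the token-budget-isolation property from the previous section.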
Implementation reminder (cheap)¶
From the minimal agent loop:
    def sub_agent(task: str, tools_subset: list) -> str:
        ctx = [{"role": "system", "content": SUB_AGENT_PROMPT}]
        ctx.append({"role": "user", "content": task})
        # ... run the same tool-call loop against ctx + tools_subset ...
        return final_assistant_text  # the loop's last assistant message
As Fly.io puts it, "your wackiest idea will probably (1) work and (2) take 30 minutes to code." The cost of experimentation is low; the correct move is to try the structure, not to over-architect it.
Caveats¶
- The parent must still not trust sub-agent output blindly. A prompt-injected sub-agent can return a hostile summary. Treat sub-agent output as untrusted input (patterns/llm-output-as-untrusted-input).
- Budget can be spent N× across sub-agents. Each sub-agent has its own context and each pays tokens for its system prompt and tool schemas. The segregation buys the parent budget discipline; fleet-level token cost is not automatically lower.
- Choose the interchange format deliberately. Fly.io lists it as an open design problem: "what the most reliable intermediate forms are (JSON blobs? SQL databases? Markdown summaries) for interchange between them." Over-structured forms (strict JSON schemas) cost both ends tokens; under-structured forms (prose summaries) lose fidelity.
- Early-exit-by-lying-to-self. A sub-agent can report "done" when it isn't. Ground-truth verifiers (tests green, compiler clean, independent judge agent) reduce this (concepts/agentic-development-loop).
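One way to act on the first caveat is to validate a sub-agent's output structurally before appending it to the parent's context. A hedged sketch — the field names and status values are assumptions for illustration, not from the sources:

```python
import json

ALLOWED_KEYS = {"status", "summary"}  # the only fields the parent will accept

def ingest_sub_agent_result(raw: str) -> dict:
    """Treat a sub-agent's output as untrusted before it touches the parent.

    The output must be well-formed JSON carrying only the expected keys,
    and 'status' must come from a closed set. A prompt-injected sub-agent
    can still lie in the summary, but it cannot smuggle arbitrary extra
    fields or instructions past this check.
    """
    result = json.loads(raw)
    if set(result) - ALLOWED_KEYS:
        raise ValueError("unexpected fields in sub-agent result")
    if result.get("status") not in {"done", "failed", "needs_review"}:
        raise ValueError("invalid status")
    return result
```

This does not replace a ground-truth verifier (tests green, compiler clean); it only narrows what a hostile summary can carry into the parent's context.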
Seen in¶
- Fly.io, You Should Write An Agent (2025-11-06) — canonical demystification of sub-agents as "just a new context array, another call to the model" + the four named levers (new context, new tools, summarise, collate). (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)
- Cloudflare AI Code Review (2026-04-20) — coordinator + N specialised-reviewer sub-agents (patterns/coordinator-sub-reviewer-orchestration + patterns/specialized-reviewer-agents).
- Claude Code / Goose sub-agents — the IDE-side productised form Fly.io points at as the anchor for the pattern name (systems/claude-code).
Related¶
- concepts/sub-agent
- concepts/agent-loop-stateless-llm
- concepts/context-window-as-token-budget
- concepts/context-engineering
- concepts/prompt-injection
- patterns/tool-call-loop-minimal-agent
- patterns/specialized-agent-decomposition
- patterns/coordinator-sub-reviewer-orchestration
- patterns/specialized-reviewer-agents
- patterns/tool-surface-minimization
- patterns/untrusted-input-via-file-not-prompt
- systems/claude-code