
Context-segregated sub-agents

Pattern

When an agent needs to do work that (a) would consume large amounts of the main context window, (b) needs a different tool surface than the parent, or (c) is exposed to untrusted input that should not enter the planner's context, spawn a sub-agent with its own fresh context array and its own narrow tool allowlist. The parent invokes the sub-agent as a single call, receives a summary (or structured result), and appends only that summary to the parent's context.

Fly.io's canonical formulation — also the simplest possible demystification of "Claude Code sub-agents":

"Take 'sub-agents'. People make a huge deal out of Claude Code's sub-agents, but you can see now how trivial they are to implement: just a new context array, another call to the model. Give each call different tools. Make sub-agents talk to each other, summarize each other, collate and aggregate." (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)

The pattern rests on the fact that a "conversation" is just a Python list of messages (concepts/agent-loop-stateless-llm) — a sub-agent is simply a second list with a second LLM call run against it.
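A minimal sketch of that fact. Here call_model is a hypothetical stub standing in for any chat-completion API call; in real code it would send the context to an LLM endpoint:

```python
# call_model is a hypothetical stub for a chat-completion API call.
def call_model(context: list[dict]) -> str:
    # A real version sends `context` to an LLM; we echo for illustration.
    return f"summary of: {context[-1]['content']}"

parent_ctx = [{"role": "system", "content": "You are the planner."}]

# Spawning a sub-agent is just a fresh list and a second model call.
sub_ctx = [
    {"role": "system", "content": "You summarise documents."},
    {"role": "user", "content": "big untrusted document ..."},
]
summary = call_model(sub_ctx)

# Only the summary enters the parent's context; sub_ctx is discarded.
parent_ctx.append({"role": "user", "content": summary})
```

The sub-agent's exploratory turns, tool results, and the raw document never appear in parent_ctx — only the one appended message does.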

What segregation buys

  • Token-budget isolation. Exploratory tool calls, dead-ends, and large intermediate outputs stay inside the sub-agent's context. The parent's token budget only spends on the sub-agent's returned summary.
  • Tool-surface narrowing. Each sub-agent sees only the tools relevant to its concern — a security-review sub-agent gets read-only code-analysis tools; a deployment sub-agent gets write-capable deploy tools. Per-turn schema overhead shrinks and tool-selection accuracy improves.
  • Security segregation. Fly.io: "You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting." Hostile or untrusted input (web-search results, user-uploaded files, email bodies, third-party API output) goes into the sub-agent's context where a successful prompt-injection cannot rewrite the parent's plan. Complementary to patterns/untrusted-input-via-file-not-prompt.
  • Parallelism. Multiple sub-agents can run concurrently over disjoint inputs; results feed a collator sub-agent or the parent.
  • Specialisation. Each sub-agent gets its own system prompt, its own model choice (cheap model for classification, expensive model for synthesis), its own temperature.
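Tool-surface narrowing and specialisation together amount to a small per-agent registry. A sketch, with every name (agent keys, model names, tool names) purely illustrative rather than a real API:

```python
# Hypothetical registry: each sub-agent gets its own prompt, model,
# and narrow tool allowlist. All names here are illustrative.
SUB_AGENTS = {
    "security_review": {
        "system_prompt": "Review code for vulnerabilities. Read only.",
        "model": "big-expensive-model",              # synthesis-grade
        "tools": ["read_file", "grep", "list_dir"],  # read-only surface
    },
    "deploy": {
        "system_prompt": "Deploy the built artifact.",
        "model": "small-cheap-model",
        "tools": ["run_deploy", "read_logs"],        # write-capable
    },
}

def tools_for(agent_name: str) -> list[str]:
    # The parent never hands its full tool surface to a sub-agent.
    return SUB_AGENTS[agent_name]["tools"]
```

The point of the lookup is the negative space: the security reviewer cannot call run_deploy because the schema is never offered to it.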

Concrete shapes

  • Fan-out / collate. Parent spawns N sub-agents over partitioned input, collects their summaries, passes them to a collator sub-agent that aggregates.
  • Planner + executor. Parent plans and dispatches; one or more executor sub-agents do the narrow task with write-capable tools. See patterns/specialized-agent-decomposition.
  • Coordinator + specialists. Cloudflare AI Code Review canonical instance (patterns/coordinator-sub-reviewer-orchestration): coordinator sub-agent spawns one specialised-reviewer sub-agent per review category, each with its own prompt + tool allowlist, coordinator aggregates findings.
  • Summarise-then-forget. Sub-agent reads a large corpus (long doc, log file, big search result), returns a summary; the raw corpus never enters any context.
  • Dual personalities. Two sub-agents with orthogonal prompts on the same input, compare outputs (Fly.io's Alph truth-teller / Ralph liar demo).
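The fan-out / collate shape can be sketched in a few lines, assuming summarize stands in for one full sub-agent run (a hypothetical stub here) and the collator is reduced to a join:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(chunk: str) -> str:
    # Stub for one sub-agent run: fresh context, own tool loop.
    return f"summary({chunk})"

def collate(summaries: list[str]) -> str:
    # A collator sub-agent would aggregate; joining keeps the sketch small.
    return " | ".join(summaries)

def fan_out(chunks: list[str]) -> str:
    # Sub-agents have disjoint contexts, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarize, chunks))
    return collate(summaries)
```

pool.map preserves input order, so the collator sees summaries in the same order as the partitioned input.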

Implementation reminder (cheap)

From the minimal agent loop:

def sub_agent(task: str, tools_subset: list) -> str:
    # Fresh context: the sub-agent never sees the parent's history.
    ctx = [
        {"role": "system", "content": SUB_AGENT_PROMPT},
        {"role": "user", "content": task},
    ]
    # Run the same tool-call loop against ctx, offering only
    # tools_subset; the loop's last assistant message is the result.
    return final_assistant_text

Fly.io's encouragement applies here: "your wackiest idea will probably (1) work and (2) take 30 minutes to code." The cost of experimentation is low; the correct move is to try the structure, not to architect it up front.

Caveats

  • The parent must still not trust sub-agent output blindly. A prompt-injected sub-agent can return a hostile summary. Treat sub-agent output as untrusted input (patterns/llm-output-as-untrusted-input).
  • Budget can be spent N× across sub-agents. Each sub-agent has its own context and each pays tokens for its system prompt and tool schemas. The segregation buys the parent budget discipline; fleet-level token cost is not automatically lower.
  • Choose the interchange format deliberately. Fly.io lists it as an open design problem: "what the most reliable intermediate forms are (JSON blobs? SQL databases? Markdown summaries) for interchange between them." Over-structured forms (strict JSON schemas) cost both ends tokens; under-structured forms (prose summaries) lose fidelity.
  • Early-exit-by-lying-to-self. A sub-agent can report "done" when it isn't. Ground-truth verifiers (tests green, compiler clean, independent judge agent) reduce this (concepts/agentic-development-loop).
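The first caveat has a cheap partial mitigation worth sketching: append sub-agent output only as plain user data inside explicit delimiters, never as a system message. The delimiter tag below is an illustrative convention, not a standard:

```python
def wrap_untrusted(summary: str) -> dict:
    # Sub-agent output enters the parent as delimited user data,
    # never as a system message that could redefine the parent's role.
    return {
        "role": "user",
        "content": f"<sub_agent_report>\n{summary}\n</sub_agent_report>",
    }
```

This does not make injected text safe — the parent model can still be persuaded by it — but it removes the cheapest escalation path of smuggling instructions into a privileged role.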

Seen in
