CONCEPT

Shared-context fan-out

Shared-context fan-out is the pattern of writing large common context (merge-request metadata, previous review findings, diff summaries) to a shared file on disk and pointing multiple concurrent sub-agents at it via their file-read tool, rather than embedding the context in every sub-agent's prompt. The shared file is read once per agent via a tool call; the prompt stays small.

The token-cost problem it solves

A naïve N-sub-agent fan-out embeds the full MR context in each sub-agent's prompt:

agent 1 prompt: [base] + [agent 1 specialism] + [full MR context]
agent 2 prompt: [base] + [agent 2 specialism] + [full MR context]
...
agent N prompt: [base] + [agent N specialism] + [full MR context]

If each sub-agent runs independently, the same MR context is billed N times. Cloudflare's framing: "Duplicating even a moderately-sized MR context across seven concurrent reviewers would multiply our token costs by 7x."
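The arithmetic can be sketched with a toy cost model. All token counts and the cache-read discount below are illustrative assumptions, not Cloudflare's numbers:

```python
# Hypothetical cost model for an N-agent fan-out. Every constant here is an
# assumption for illustration only.
N_AGENTS = 7
BASE = 1_000          # shared base prompt tokens
SPECIALISM = 200      # per-agent instruction tokens
MR_CONTEXT = 30_000   # full MR context tokens
PATH_REF = 20         # tokens to name the shared file path in the prompt
CACHE_READ = 0.1      # assumed cached-input price relative to full price

# Naive: every agent's prompt embeds the full context, billed N times over.
embedded = N_AGENTS * (BASE + SPECIALISM + MR_CONTEXT)

# Fan-out: each agent reads the shared file via a tool call; the first read
# pays full price (cache write), later reads hit the provider cache.
fanned_out = (
    N_AGENTS * (BASE + SPECIALISM + PATH_REF)
    + MR_CONTEXT                                 # first read: cache write
    + (N_AGENTS - 1) * MR_CONTEXT * CACHE_READ   # later reads: cache hits
)

print(f"embedded: {embedded:,} vs fan-out: {fanned_out:,.0f} tokens")
```

Under these assumed numbers the embedded variant bills roughly 218k tokens against roughly 57k for the fan-out; the exact ratio depends entirely on the provider's cache pricing.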

The fan-out shape

  1. Orchestrator assembles the shared context once — in Cloudflare's case, a file literally named shared-mr-context.txt.
  2. Orchestrator also writes per-file patch files to a diff_directory so each sub-agent can read only the patches relevant to its specialism.
  3. Each sub-agent prompt references the path ("read shared-mr-context.txt and the patches for files X, Y, Z from diff_directory/") rather than embedding content.
  4. Sub-agents use their built-in file-read tool on the shared paths.
  5. Provider-side prompt caching then kicks in: the base agent prompts are identical across all runs, and the content read from shared-mr-context.txt is identical too, so cross-request cache-hit rates are high.
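The orchestrator side of steps 1 to 3 can be sketched as follows. Function and argument names are hypothetical; the sketch assumes sub-agents have a built-in file-read tool and only shows how the files and prompts are laid out:

```python
# Minimal orchestrator sketch (hypothetical names). Writes the shared context
# once, one patch file per changed file, and builds small per-agent prompts
# that reference paths instead of embedding content.
from pathlib import Path

def prepare_fanout(workdir: str, mr_context: str, patches: dict[str, str],
                   specialisms: dict[str, list[str]]) -> dict[str, str]:
    root = Path(workdir)
    diff_dir = root / "diff_directory"
    diff_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: shared context written to disk exactly once.
    (root / "shared-mr-context.txt").write_text(mr_context)

    # Step 2: one patch file per changed file in diff_directory.
    for filename, patch in patches.items():
        (diff_dir / f"{filename.replace('/', '_')}.patch").write_text(patch)

    # Step 3: small per-agent prompts that point at paths, not content.
    prompts = {}
    for agent, files in specialisms.items():
        wanted = ", ".join(
            f"diff_directory/{f.replace('/', '_')}.patch" for f in files
        )
        prompts[agent] = (
            "Read shared-mr-context.txt for MR metadata and prior findings. "
            f"Then read the patches relevant to your specialism: {wanted}."
        )
    return prompts
```

Each prompt stays a few dozen tokens regardless of how large the MR context grows; the heavy content only enters a sub-agent's context through its file-read tool, where caching applies.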

Observed effect

Cloudflare's production numbers validate the design: 85.7% prompt-cache hit rate across ~120 B tokens processed in the first 30 days. Cache reads dominate cache writes by roughly 10:1. The savings are quantified: "this saves us an estimated five figures compared to what we would pay at full input token pricing."

Why the file-on-disk approach beats alternatives

  Alternative                                 Drawback
  Embed full context in every sub-prompt      7× token cost on the context portion
  Pass context via stdin to each sub-agent    Loses prompt caching; every sub-agent's prompt varies
  Pin context in a KV / vector store          Adds a network round trip and a cache-miss class; more infra
  Include context as a system message         Same token multiplication as embedding

File-on-disk is: cheap (one write), provider-cacheable (same path → same content → cache hit), tool-uniform (sub-agents already have file-read), and shared-memory-like (no duplication across processes).

Three supporting choices reinforce the pattern:

  • Per-file patches in diff_directory — each sub-reviewer reads only relevant files, not the whole diff. Fan-out by specialism at the file level.
  • Same base prompts across all runs — sub-agent system prompts are stable; combined with the shared context file, they maximise what the provider's cache can hit.
  • Session affinity — routes repeated requests to the same backend so the provider's local cache is reused (sibling optimisation; additive with shared-context fan-out).

Generalisation

Any orchestrator that fans out to N sub-agents over a shared input benefits:

  • Multi-agent research pipelines that share a corpus.
  • RAG systems where N downstream specialists use the same retrieved chunks.
  • Judge / verifier pipelines that score the same trajectory from multiple angles.
  • Fan-out content-moderation that runs N classifiers on the same post.

Rule: if multiple agents read the same context, put it on disk once and point everyone at the path.
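A generic form of that rule can be sketched as a small helper. The function and directory names are hypothetical; content-addressing the filename is one way to keep identical context mapping to an identical path, which keeps every agent's prompt byte-stable and cache-friendly:

```python
# Hypothetical helper: persist shared context once and return the path that
# every agent's prompt should reference.
import hashlib
from pathlib import Path

def publish_shared_context(content: str,
                           cache_dir: str = "/tmp/shared-context") -> Path:
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    # Content-addressed filename: the same context always yields the same
    # path, so repeated fan-outs over unchanged context reuse one file.
    digest = hashlib.sha256(content.encode()).hexdigest()[:16]
    path = Path(cache_dir) / f"ctx-{digest}.txt"
    if not path.exists():
        path.write_text(content)  # one write, N readers
    return path
```

An orchestrator would call this once per fan-out and interpolate the returned path into each sub-agent prompt.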
