
PATTERN

Precomputed agent context files

Intent

Extract the knowledge AI coding agents need — module purpose, modification patterns, non-obvious failure modes, cross-module dependencies, tribal knowledge — into concise per-module markdown files produced by a one-shot offline orchestration pass, then consumed at request time by downstream agents. Replace ad-hoc exploration (15-25 tool calls per task spent re-deriving the same facts) with a cheap graph lookup.

When to reach for it

  • You have a large, proprietary codebase with heavy tribal knowledge — a config-as-code pipeline or comparable cross-subsystem-coupled system.
  • Generic AI coding assistants fail silently on your code — compile-passing-but-wrong output, not obvious crashes.
  • The codebase is not in pretraining corpora — the knowledge is yours, not the model's.
  • You've measured (or can estimate) the per-task tool-call cost of exploration-based agent work and the number is painful.

Mechanism

Four stages:

1. Extract

One-session orchestration of specialised agents (patterns/specialized-agent-decomposition applied offline):

  • Explorers map the codebase.
  • Module analysts answer the five intent questions (purpose, modification patterns, failure modes, dependencies, tribal knowledge) for each module.
  • Writers synthesise answers into context files.

Scope: one file per module in compass-not-encyclopedia format.
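As an illustration of the compass-not-encyclopedia shape, a per-module file might look like the sketch below. The module name and section headings are hypothetical, not Meta's published schema:

```markdown
# module: pipeline/ingest  (hypothetical example)

## Purpose
Normalises raw event feeds into the canonical schema used by downstream transforms.

## Modification patterns
- New feed types are registered in the feed registry, not parsed inline here.
- Schema changes land in the schema module first, then here.

## Non-obvious failure modes
- Silently drops events whose timestamps predate the retention window
  (compiles and runs; the loss only shows up in row counts).

## Dependencies
- Depends on: schema. Depended on by: transform.
```

The point of the compass shape is that each file points an agent at the right places and warnings rather than duplicating the code it describes.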

2. Quality-gate

Multi-round critic discipline (patterns/multi-round-critic-quality-gate):

  • Critics score content + flag weaknesses over 3 rounds.
  • Fixers apply corrections.
  • Prompt testers validate behaviour across personas.
  • Final critics run integration tests.

Meta's result: 3.65 → 4.20 / 5.0, zero hallucinated file paths.
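The critic/fixer rounds can be sketched as a fixed-round loop. Everything here is a stand-in: `Review`, `quality_gate`, the threshold, and the stubbed critic/fixer are illustrative, not Meta's implementation (which uses LLM agents in both roles):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Review:
    score: float        # critic rating on a 1-5 scale
    issues: list[str]   # weaknesses flagged for the fixer

def quality_gate(draft: str,
                 critic: Callable[[str], Review],
                 fixer: Callable[[str, list[str]], str],
                 rounds: int = 3,
                 threshold: float = 4.0) -> tuple[str, float]:
    """Run the critic -> fixer loop for a bounded number of rounds."""
    review = critic(draft)
    for _ in range(rounds):
        if review.score >= threshold and not review.issues:
            break  # content passed the gate early
        draft = fixer(draft, review.issues)
        review = critic(draft)
    return draft, review.score
```

Bounding the rounds (Meta used 3) keeps the gate's cost predictable while still letting scores climb, as in the 3.65 → 4.20 result above.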

3. Surface

  • 59 context files (< 0.1% of model context window in aggregate)
  • Cross-repo dependency index ("what depends on X?" in ~200 tokens, 30× compression over multi-file exploration)
  • Data-flow maps
  • Natural-language orchestration layer routing engineer queries to the right tool + context
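The cross-repo dependency index is what turns "what depends on X?" from a multi-file exploration into a single lookup. A minimal sketch of the idea, with a hypothetical three-module graph (the real index covers cross-repo edges):

```python
from collections import defaultdict

def build_reverse_index(deps: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a module -> dependencies map into a 'what depends on X' index.
    Built once offline, so the request-time answer is a dict access."""
    reverse = defaultdict(list)
    for module, targets in deps.items():
        for target in targets:
            reverse[target].append(module)
    return {t: sorted(mods) for t, mods in reverse.items()}

# Hypothetical dependency graph.
deps = {
    "ingest": ["schema"],
    "transform": ["schema", "ingest"],
    "export": ["transform"],
}
index = build_reverse_index(deps)
# "what depends on schema?" is now one lookup, no file exploration
print(index["schema"])  # ['ingest', 'transform']
```

Serialising the answer for one module is how the ~200-token response (versus opening many files) is achieved.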

4. Maintain

Automated self-refresh loop: validate file paths, detect coverage gaps, re-run critics, auto-fix stale references — addressing context-file freshness as a first-order concern.
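The path-validation step of the refresh loop can be sketched as follows. The backtick convention and extension list are assumptions about how paths appear in the context files, not Meta's actual format:

```python
from pathlib import Path
import re

# Assumes file paths appear backticked in the context files.
PATH_RE = re.compile(r"`([\w./-]+\.(?:py|sql|yaml|md))`")

def stale_paths(context_file: Path, repo_root: Path) -> list[str]:
    """Return path references in a context file that no longer exist
    in the repo. A non-empty result fails the freshness check and
    flags the file for re-generation or auto-fix."""
    text = context_file.read_text()
    return [p for p in PATH_RE.findall(text) if not (repo_root / p).exists()]
```

Because the check is pure string matching plus filesystem stats, it is cheap enough to run on every refresh cycle, which is what makes freshness automatable at all.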

Why it works

  • Compass shape bounds the freshness cost — 25-35-line files are cheap to re-validate automatically.
  • Opt-in loading — agents load the 1-2 files relevant to the current task, not all 59. Avoids the academic-research failure mode where always-on context hurts.
  • Model-agnostic surface — markdown files consumable by any LLM; investment compounds across model upgrades rather than depreciates.
  • Extraction is paid once, consumption pays back many times — Meta's preliminary 6 tasks show ~40% fewer tool calls and tokens per task; at fleet scale this dwarfs the extraction cost.
  • Zero hallucinated paths — the invariant that makes the context trustworthy enough to act on. Meta enforces this via final-round critic agents that validate every path reference.

Tradeoffs

  • Extraction cost is real — 50+ agents, one large-context-window model session, and a multi-round critic gate. Not free.
  • Pretraining-overlap sensitivity — on codebases the model already knows (Django, matplotlib, well-known OSS), this pattern hurts rather than helps (2025 academic research). Reserve for proprietary, tribal-knowledge-heavy code.
  • Freshness discipline is mandatory — stale context is worse than no context. Automation is the only viable path above toy scale.
  • Schema maintenance — the 4-section compass template is fit to code-navigation. Different domains (runbooks, migration guides) benefit from different section shapes.
  • Not a cure for underspecified APIs — if the underlying code is genuinely ambiguous, a context file can document the ambiguity but can't resolve it.

Distinct from sibling patterns

| Pattern | Produces | Consumed by | Stored in |
| --- | --- | --- | --- |
| Precomputed agent context files (this) | Compass-shaped markdown | AI coding agents | Repo-local markdown |
| Centralized AOT indexing (Glean) | Structured facts (Angle predicates) | IDEs + static analysis + review tools + agents | Distributed replicated DB |
| Diff-based static analysis | Per-diff semantic summaries | Reviewers + lint pipelines | Diff-attached artifacts |
| Specialised agent decomposition (runtime) | Runtime decisions | End users (debug / review) | |

The four together cover Meta's full-spectrum approach to "making large proprietary code navigable by machines" — Glean for structured queryable facts, diff-sketches for per-change semantic artifacts, the precompute engine for prose-shaped tribal knowledge, and specialised-agent decomposition for runtime composition.

Seen in

  • Meta AI Pre-Compute Engine (2026-04-06) — canonical wiki instance. 50+-agent swarm produces 59 context files covering 100% of a 4,100-file config-as-code data pipeline. Preliminary 40% reduction in tool calls + tokens per task on 6 tasks; 0 hallucinated file paths; critic scores 3.65 → 4.20 / 5.0 across 3 rounds. "The AI isn't a consumer of this infrastructure, it's the engine that runs it." (Source: sources/2026-04-06-meta-how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines.)