PATTERN Cited by 1 source
Precomputed agent context files¶
Intent¶
Extract the knowledge AI coding agents need — module purpose, modification patterns, non-obvious failure modes, cross-module dependencies, tribal knowledge — into concise per-module markdown files produced by a one-shot offline orchestration pass, then consumed at request time by downstream agents. Replace ad-hoc exploration (15-25 tool calls per task, burned on re-deriving the same facts) with a cheap graph lookup.
When to reach for it¶
- You have a large, proprietary codebase with heavy tribal knowledge — a config-as-code pipeline or comparable cross-subsystem-coupled system.
- Generic AI coding assistants fail silently on your code — compile-passing-but-wrong output, not obvious crashes.
- The codebase is not in pretraining corpora — the knowledge is yours, not the model's.
- You've measured (or can estimate) the per-task tool-call cost of exploration-based agent work and the number is painful.
Mechanism¶
Four stages:
1. Extract¶
One-session orchestration of specialised agents (patterns/specialized-agent-decomposition applied offline):
- Explorers map the codebase.
- Module analysts answer the five questions per module.
- Writers synthesise answers into context files.
Scope: one file per module in compass-not-encyclopedia format.
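A minimal sketch of the extract stage, with hypothetical names throughout — the five questions are paraphrased from the intent statement above, and the `analyst` callable stands in for an LLM-backed module analyst:

```python
from dataclasses import dataclass
from typing import Callable

# The five questions, paraphrased from the pattern's intent statement.
FIVE_QUESTIONS = [
    "What is this module's purpose?",
    "What are its common modification patterns?",
    "What are its non-obvious failure modes?",
    "Which cross-module dependencies matter?",
    "What tribal knowledge does it assume?",
]

@dataclass
class ModuleContext:
    module: str
    answers: dict[str, str]

    def to_markdown(self) -> str:
        # Compass, not encyclopedia: one short file per module.
        parts = [f"# {self.module}"]
        for question, answer in self.answers.items():
            parts.append(f"## {question}\n{answer}")
        return "\n\n".join(parts)

def extract_contexts(modules: list[str],
                     analyst: Callable[[str, str], str]) -> list[ModuleContext]:
    """One offline pass: a module analyst answers the five questions per
    module; a writer synthesises the answers into one context file each."""
    return [ModuleContext(m, {q: analyst(m, q) for q in FIVE_QUESTIONS})
            for m in modules]
```

The writer step here is just string assembly; in the real system it is its own agent.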
2. Quality-gate¶
Multi-round critic discipline (patterns/multi-round-critic-quality-gate):
- Critics score content + flag weaknesses over 3 rounds.
- Fixers apply corrections.
- Prompt testers validate behaviour across personas.
- Final critics run integration tests.
Meta's result: 3.65 → 4.20 / 5.0, zero hallucinated file paths.
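The critic/fixer loop can be sketched as below — `critic` and `fixer` are hypothetical stand-ins for the critic and fixer agents, and the early-exit threshold is an assumption:

```python
def quality_gate(draft, critic, fixer, rounds=3, threshold=4.0):
    """Multi-round critic discipline: each round, a critic scores the draft
    and flags weaknesses; a fixer applies corrections. Stops early once the
    score clears the threshold with no open issues."""
    scores = []
    for _ in range(rounds):
        score, issues = critic(draft)
        scores.append(score)
        if score >= threshold and not issues:
            break
        draft = fixer(draft, issues)
    return draft, scores
```

Recording the per-round scores is what makes a trajectory like 3.65 → 4.20 reportable at all.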
3. Surface¶
- 59 context files (< 0.1% of model context window in aggregate)
- Cross-repo dependency index ("what depends on X?" in ~200 tokens, 30× compression over multi-file exploration)
- Data-flow maps
- Natural-language orchestration layer routing engineer queries to the right tool + context
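The dependency index is the simplest of these surfaces to picture — a precomputed reverse map, sketched here under the assumption that dependency edges were harvested during the offline pass:

```python
from collections import defaultdict

class DependencyIndex:
    """Precomputed reverse-dependency index: answers 'what depends on X?'
    with one lookup instead of a multi-file exploration."""

    def __init__(self, edges):
        # edges: (dependent, dependency) pairs harvested in the offline pass
        self._reverse = defaultdict(set)
        for dependent, dependency in edges:
            self._reverse[dependency].add(dependent)

    def dependents_of(self, module):
        return sorted(self._reverse.get(module, ()))
```

The ~200-token answer is just the sorted list serialised, versus the thousands of tokens a live multi-file exploration would spend.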
4. Maintain¶
Automated self-refresh loop: validate file paths, detect coverage gaps, re-run critics, auto-fix stale references — addressing context-file freshness as a first-order concern.
Why it works¶
- Compass shape bounds the freshness cost — 25-35-line files are cheap to re-validate automatically.
- Opt-in loading — agents load the 1-2 files relevant to the current task, not all 59. Avoids the academic-research failure mode where always-on context hurts.
- Model-agnostic surface — markdown files consumable by any LLM; investment compounds across model upgrades rather than depreciates.
- Extraction pays once, consumption pays off many times — Meta's preliminary six tasks show ~40% fewer tool calls and tokens per task; at fleet scale the savings dwarf the extraction cost.
- Zero hallucinated paths — the invariant that makes the context trustworthy enough to act on. Meta enforces this via final-round critic agents that validate every path reference.
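Opt-in loading can be as simple as keyword overlap; a hypothetical sketch (the real routing is the natural-language orchestration layer, not a bag-of-words ranker):

```python
def pick_context_files(task: str, keyword_index: dict[str, set[str]],
                       limit: int = 2) -> list[str]:
    """Opt-in loading: rank context files by keyword overlap with the task
    description and load at most `limit` of them, never the whole set."""
    words = set(task.lower().split())
    scored = [(len(words & kws), name) for name, kws in keyword_index.items()]
    return [name for score, name in sorted(scored, reverse=True)
            if score > 0][:limit]
```

The cap, not the ranking, is the load-bearing part — it is what avoids the always-on-context failure mode.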
Tradeoffs¶
- Extraction cost is real — 50+ agents, one large-context-window model session, a multi-round critic gate. Not free.
- Pretraining-overlap sensitivity — on codebases the model already knows (Django, matplotlib, well-known OSS), this pattern hurts rather than helps (2025 academic research). Reserve for proprietary, tribal-knowledge-heavy code.
- Freshness discipline is mandatory — stale context is worse than no context. Automation is the only viable path above toy scale.
- Schema maintenance — the 4-section compass template is fit to code-navigation. Different domains (runbooks, migration guides) benefit from different section shapes.
- Not a cure for underspecified APIs — if the underlying code is genuinely ambiguous, a context file can document the ambiguity but can't resolve it.
Distinct from sibling patterns¶
| Pattern | Produces | Consumed by | Stored in |
|---|---|---|---|
| Precomputed agent context files (this) | Compass-shaped markdown | AI coding agents | Repo-local markdown |
| Centralized AOT indexing (Glean) | Structured facts (Angle predicates) | IDEs + static analysis + review tools + agents | Distributed replicated DB |
| Diff-based static analysis | Per-diff semantic summaries | Reviewers + lint pipelines | Diff-attached artifacts |
| Specialised agent decomposition (runtime) | Runtime decisions | End users (debug / review) | — |
The four together cover Meta's full-spectrum approach to "making large proprietary code navigable by machines" — Glean for structured queryable facts, diff-sketches for per-change semantic artifacts, the precompute engine for prose-shaped tribal knowledge, and specialised-agent decomposition for runtime composition.
Seen in¶
- Meta AI Pre-Compute Engine (2026-04-06) — canonical wiki instance. 50+-agent swarm produces 59 context files covering 100% of a 4,100-file config-as-code data pipeline. Preliminary 40% reduction in tool calls + tokens per task on 6 tasks; 0 hallucinated file paths; critic scores 3.65 → 4.20 / 5.0 across 3 rounds. "The AI isn't a consumer of this infrastructure, it's the engine that runs it." (Source: sources/2026-04-06-meta-how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines.)
Related¶
- concepts/tribal-knowledge — what the pattern extracts
- concepts/compass-not-encyclopedia — the format rule for the output
- concepts/config-as-code-pipeline — the workload class this pattern fits
- concepts/context-file-freshness — the discipline that makes the pattern sustainable
- concepts/context-engineering — parent discipline
- patterns/multi-round-critic-quality-gate — the quality-gate stage
- patterns/five-questions-knowledge-extraction — the extraction methodology
- patterns/self-maintaining-context-layer — the refresh-loop mechanism
- patterns/specialized-agent-decomposition — offline-context-generation variant
- systems/meta-ai-precompute-engine — the canonical producing system