Meta AI Pre-Compute Engine¶
Definition¶
The Meta AI Pre-Compute Engine is Meta's internal production infrastructure that produces and maintains a navigable tribal-knowledge layer over a large config-as-code data pipeline — four repositories, three languages (Python configs + C++ services + Hack automation), 4,100+ files — so downstream AI coding agents can work across the pipeline effectively.
The system has three parts:
- Pre-compute swarm — a one-session orchestration of 50+ specialized AI agents that reads every module and emits 59 compass-not-encyclopedia context files (25-35 lines / ~1,000 tokens each) covering 100% of code modules.
- Runtime surface — the 59 context files + a cross-repo dependency index + data-flow maps + a natural-language orchestration layer that routes engineer questions to the right tool (e.g. "Is the pipeline healthy?" → dashboard scanner + 85+ incident-pattern matcher; "Add a new data field" → multi-phase validation pipeline).
- Self-maintenance loop — automated jobs that run "every few weeks" to validate file paths, detect coverage gaps, re-run critic agents, and auto-fix stale references, enforcing concepts/context-file-freshness.
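The natural-language routing described above can be sketched as a small intent-to-tool-chain dispatcher. This is a hypothetical illustration, not Meta's implementation: the tool names (`dashboard_scanner`, `incident_pattern_matcher`, `context_file_loader`, `multi_phase_validator`) and the keyword heuristic are assumptions; a production router would use a model-based classifier rather than substring matching.

```python
# Hypothetical sketch of the natural-language orchestration layer:
# map an engineer's prompt to a tool chain. All tool names and the
# keyword heuristic are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Route:
    name: str
    matches: Callable[[str], bool]
    tools: List[str]


ROUTES = [
    Route(
        name="pipeline-health",
        matches=lambda p: "health" in p.lower(),
        tools=["dashboard_scanner", "incident_pattern_matcher"],  # 85+ patterns
    ),
    Route(
        name="add-data-field",
        matches=lambda p: "add" in p.lower() and "field" in p.lower(),
        tools=["context_file_loader", "multi_phase_validator"],
    ),
]


def route(prompt: str) -> List[str]:
    """Return the tool chain for the first matching route; otherwise fall
    back to opt-in loading of only the relevant context files."""
    for r in ROUTES:
        if r.matches(prompt):
            return r.tools
    return ["context_file_loader"]
```

A prompt like "Is the pipeline healthy?" falls through to the dashboard-plus-incident-pattern chain; anything unrecognized defaults to loading context files only, consistent with the opt-in loading design described below.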
Pre-compute swarm architecture¶
A single session of a large-context-window model orchestrates 50+ specialized AI agents across nine roles:
| Role | Count | Responsibility |
|---|---|---|
| Explorer agents | 2 | Map the codebase (repos, modules, languages) |
| Module analysts | 11 | Read every file + answer the five questions per module |
| Writer agents | 2 | Generate the 59 context files |
| Critic agents | 10+ | Three rounds of independent quality review |
| Fixer agents | 4 | Apply corrections from critic findings |
| Upgrader agents | 8 | Refine the orchestration / routing layer |
| Prompt tester agents | 3 | Validate 55+ queries across 5 engineer personas |
| Gap-filler agents | 4 | Cover remaining directories missed by earlier passes |
| Final critic agents | 3 | Integration tests before release |
This is the canonical wiki instance of patterns/specialized-agent-decomposition, applied here to offline context generation rather than runtime investigation (Databricks Storex) or code review (Cloudflare AI Code Review).
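The phased fan-out in the table above can be sketched as a fixed pipeline of role-specialized agent batches run inside one session. This is an illustrative sketch, not Meta's code: `dispatch` stands in for whatever spawns an individual agent, and the sequential phase ordering is an assumption implied by the critic-then-fixer flow.

```python
# Illustrative sketch (not Meta's code) of the single-session swarm:
# a fixed sequence of phases, each fanning out to N role-specialized
# agents. Counts mirror the role table above.
from dataclasses import dataclass


@dataclass
class Phase:
    role: str
    count: int


# The nine roles from the table, in pipeline order.
SWARM = [
    Phase("explorer", 2), Phase("module_analyst", 11), Phase("writer", 2),
    Phase("critic", 10), Phase("fixer", 4), Phase("upgrader", 8),
    Phase("prompt_tester", 3), Phase("gap_filler", 4), Phase("final_critic", 3),
]


def run_swarm(dispatch):
    """dispatch(role, index) runs one agent. Phases run in order so each
    role consumes the artifacts the previous phase produced (e.g. fixers
    apply critic findings)."""
    results = {}
    for phase in SWARM:
        results[phase.role] = [dispatch(phase.role, i) for i in range(phase.count)]
    return results
```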
Runtime surface¶
The layer downstream AI coding agents actually consume:
- 59 context files: 25-35 lines · ~1,000 tokens each · four mandated sections — Quick Commands (copy-paste ops), Key Files ("the 3-5 files you actually need"), Non-Obvious Patterns, See Also (cross-references). Together < 0.1% of a modern model's context window.
- Cross-repo dependency index + data-flow maps: turns "what depends on X?" from ~6,000 tokens of multi-file exploration to a ~200-token single graph lookup (30× compression).
- Natural-language orchestration layer: takes an engineer's prompt and routes it to the right tool. Two canonical routings are disclosed:
- "Is the pipeline healthy?" → dashboard scanner + 85+ historical incident patterns (lineage: Meta RCA, 2024-08-23).
- "Add a new data field" → multi-phase validation generator using the 59 context files to respect cross-subsystem invariants.
- Opt-in loading: context files are "loaded only when relevant, not always-on" — one of Meta's explicit design responses to the academic-research pitfall that found always-on context files hurt agent success on Django / matplotlib.
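A context-file conformance check for the four mandated sections and the 25-35-line budget can be sketched in a few lines. The section names come from the bullet above; the function itself and its problem-report format are hypothetical, not part of Meta's disclosed tooling.

```python
# Minimal validator sketch for the four mandated context-file sections
# and the 25-35-line budget. Section names are from the text; the
# function is a hypothetical illustration.
from typing import List

REQUIRED_SECTIONS = [
    "Quick Commands", "Key Files", "Non-Obvious Patterns", "See Also",
]


def check_context_file(text: str) -> List[str]:
    """Return a list of problems; an empty list means the file conforms."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"missing section: {section}")
    n_lines = len(text.splitlines())
    if not 25 <= n_lines <= 35:
        problems.append(f"line count {n_lines} outside 25-35 budget")
    return problems
```

A check like this is cheap enough to run as a gate in the writer/critic loop, keeping every file inside the compass-not-encyclopedia budget.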
Self-maintenance loop¶
Every few weeks, automated jobs:
- Validate file paths against the live repos (detect renames / moves / deletions).
- Detect coverage gaps (new modules added since last refresh).
- Re-run critic agents against updated content.
- Auto-fix stale references.
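The first of these jobs, path validation, can be sketched as a scan that extracts file paths mentioned in a context file and checks each against the live checkout. The backtick-quoted-path convention, the regex, and the extension list are assumptions for illustration; Meta does not describe the job's actual mechanics.

```python
# Sketch of one self-maintenance job: validate every file path a context
# file references against the live repo and flag stale ones. The
# path-extraction regex and extension list are assumptions.
import re
from pathlib import Path
from typing import List

# Matches backtick-quoted relative paths like `pipeline/config.py`.
PATH_RE = re.compile(r"`([\w./-]+\.(?:py|cpp|h|php))`")


def stale_paths(context_text: str, repo_root: Path) -> List[str]:
    """Return referenced paths that no longer exist (renamed, moved,
    or deleted since the context file was last refreshed)."""
    return [
        p for p in PATH_RE.findall(context_text)
        if not (repo_root / p).exists()
    ]
```

Flagged paths would then feed the auto-fix step; a zero-length result on every file is what sustains the zero-hallucinated-path invariant reported in the results table.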
Meta's framing: "The AI isn't a consumer of this infrastructure, it's the engine that runs it." This is the canonical wiki instance of patterns/self-maintaining-context-layer and the operational answer to concepts/context-file-freshness.
The 50+ non-obvious patterns¶
Question 5 of the five-questions framework — "What tribal knowledge is buried in code comments?" — yielded the deepest insights. Categories Meta names explicitly:
- Hidden intermediate naming conventions — a pipeline stage outputs a temporary field name that a downstream stage renames; referencing the wrong one fails code generation silently.
- Append-only identifier rules — "deprecated" enum values must never be removed because serialization compatibility depends on the full historic enum space.
- Configuration-mode field-name mismatches — two configuration modes use different field names for the same operation; swapping them produces silent wrong output.
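The append-only identifier rule is the easiest of the three to show concretely. This is an illustrative example, not one of Meta's actual enums: the `StageStatus` name and values are invented, but the mechanic is the one the pattern describes, since records serialized years ago still carry the deprecated value on the wire.

```python
# Illustrative example (not Meta's actual enums) of the append-only
# identifier rule: a deprecated value must stay so that previously
# serialized records still decode.
from enum import IntEnum


class StageStatus(IntEnum):
    PENDING = 0
    RUNNING = 1
    LEGACY_RETRY = 2  # deprecated; never remove: old records serialize as 2
    COMPLETE = 3


def decode(wire_value: int) -> StageStatus:
    """Decode a serialized status. If LEGACY_RETRY were deleted, any
    archived record carrying value 2 would raise ValueError here."""
    return StageStatus(wire_value)
```

An agent that "cleans up" the deprecated member passes every test on fresh data and breaks only on historical records, which is exactly why the rule qualifies as tribal knowledge.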
Key results¶
| Metric | Before | After |
|---|---|---|
| AI context coverage | ~5% (5 files) | 100% (59 files) |
| Codebase files with AI navigation | ~50 | 4,100+ |
| Tribal knowledge documented | 0 | 50+ non-obvious patterns |
| Tested prompts passing | 0 | 55+ (100% pass rate) |
| Critic quality score | 3.65 / 5.0 | 4.20 / 5.0 (after 3 rounds) |
| Hallucinated file paths | — | 0 |
| Tool calls + tokens per task | baseline | ~40% fewer (measured on 6 tasks) |
| Complex workflow guidance cycle | ~2 days | ~30 min |
Why this is a system, not just a doc set¶
The 59 markdown files are the surface, but the system is the pipeline that produces and maintains them: a reproducible 50+-agent orchestration, a quality gate with measurable improvement across rounds, a zero-hallucination file-path invariant, and a self-refresh loop that closes the staleness gap. Without the pipeline, the files would be a one-shot doc export that decays — which Meta explicitly names as "worse than no context at all."
Caveats¶
- Scoped to a single pipeline at time of publication (2026-04-06). Meta names expansion to additional pipelines in Future Work.
- Preliminary numbers on six tasks — no fleet-wide production deployment metrics (adoption, QPS, engineer-session usage) are disclosed.
- Large-context-window model is the substrate but vendor / model version not named.
- Cross-repo dependency index + data-flow maps generation mechanism, storage format, and refresh cadence are not described beyond the 30× compression headline.
- Orchestration layer's NL-to-tool router accuracy is not benchmarked against the tool-selection-accuracy axis Datadog catalogued.
Related¶
- companies/meta
- concepts/tribal-knowledge
- concepts/compass-not-encyclopedia
- concepts/config-as-code-pipeline
- concepts/context-file-freshness
- concepts/context-engineering — parent concept class; this is the offline-preloaded variant
- concepts/ai-agent-guardrails — multi-round critic review is a guardrail instance
- patterns/precomputed-agent-context-files — the canonical architectural pattern this system instantiates
- patterns/multi-round-critic-quality-gate — the 3-round critic + fixer discipline
- patterns/five-questions-knowledge-extraction — the per-module methodology
- patterns/self-maintaining-context-layer — the auto-refresh loop
- patterns/specialized-agent-decomposition — offline-context-generation variant
- systems/meta-rca-system — the operational-AI lineage the NL router reuses for "is the pipeline healthy?"
- sources/2026-04-06-meta-how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines