

Meta AI Pre-Compute Engine

Definition

The Meta AI Pre-Compute Engine is Meta's internal production infrastructure that produces and maintains a navigable tribal-knowledge layer over a large config-as-code data pipeline — four repositories, three languages (Python configs + C++ services + Hack automation), 4,100+ files — so downstream AI coding agents can work across the pipeline effectively.

The system has three parts:

  1. Pre-compute swarm — a one-session orchestration of 50+ specialized AI agents that reads every module and emits 59 compass-not-encyclopedia context files (25-35 lines / ~1,000 tokens each) covering 100% of code modules.
  2. Runtime surface — the 59 context files + a cross-repo dependency index + data-flow maps + a natural-language orchestration layer that routes engineer questions to the right tool (e.g. "Is the pipeline healthy?" → dashboard scanner + 85+ incident-pattern matcher; "Add a new data field" → multi-phase validation pipeline).
  3. Self-maintenance loop — automated jobs that run "every few weeks" to validate file paths, detect coverage gaps, re-run critic agents, and auto-fix stale references, enforcing concepts/context-file-freshness.
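The routing behavior in part 2 can be pictured as a dispatcher from engineer prompts to tools. A minimal sketch, assuming a keyword-trigger table (all function and table names here are hypothetical; the real orchestration layer's mechanism is not disclosed and is almost certainly model-driven rather than keyword matching):

```python
# Sketch of a natural-language-to-tool router. Names are hypothetical;
# only the two disclosed routings are modeled.
from typing import Callable

def scan_dashboards(query: str) -> str:
    # Stand-in for the dashboard scanner + incident-pattern matcher.
    return "health-check: dashboards scanned, incident patterns matched"

def generate_field_plan(query: str) -> str:
    # Stand-in for the multi-phase validation pipeline.
    return "field-addition: multi-phase validation plan generated"

# Trigger keywords -> tool, checked in order.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("healthy", "health", "status"), scan_dashboards),
    (("add", "field", "new data"), generate_field_plan),
]

def route(query: str) -> str:
    q = query.lower()
    for keywords, tool in ROUTES:
        if any(k in q for k in keywords):
            return tool(query)
    return "no matching tool; fall back to general agent"
```

The point is only the shape: a thin layer in front of specialized tools, so the engineer never has to know which tool answers which question.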

Pre-compute swarm architecture

A single session of a large-context-window model orchestrates 50+ specialized AI agents across nine roles:

Role                  Count  Responsibility
Explorer agents       2      Map the codebase (repos, modules, languages)
Module analysts       11     Read every file + answer the five questions per module
Writer agents         2      Generate the 59 context files
Critic agents         10+    Three rounds of independent quality review
Fixer agents          4      Apply corrections from critic findings
Upgrader agents       8      Refine the orchestration / routing layer
Prompt tester agents  3      Validate 55+ queries across 5 engineer personas
Gap-filler agents     4      Cover remaining directories missed by earlier passes
Final critic agents   3      Integration tests before release

Canonical wiki instance of patterns/specialized-agent-decomposition applied to offline context generation rather than runtime investigation (Databricks Storex) or code review (Cloudflare AI Code Review).
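The nine-role decomposition can be sketched as a sequential phase plan. The role names and counts below come from the table (critic count is "10+" in the source, so the sum is a lower bound); the phase ordering as a simple list and the data-passing mechanics are assumptions:

```python
# Sketch of the swarm as ordered phases. Counts are the table's minimums;
# how agents actually share state within the session is not disclosed.
from dataclasses import dataclass

@dataclass
class Phase:
    role: str
    count: int
    task: str

SWARM = [
    Phase("explorer", 2, "map repos, modules, languages"),
    Phase("module analyst", 11, "read files, answer five questions per module"),
    Phase("writer", 2, "generate the 59 context files"),
    Phase("critic", 10, "three rounds of independent quality review"),
    Phase("fixer", 4, "apply corrections from critic findings"),
    Phase("upgrader", 8, "refine the orchestration / routing layer"),
    Phase("prompt tester", 3, "validate 55+ queries across 5 personas"),
    Phase("gap filler", 4, "cover directories missed by earlier passes"),
    Phase("final critic", 3, "integration tests before release"),
]

def total_agents(phases: list[Phase]) -> int:
    # Lower bound on swarm size (the source says "50+" overall).
    return sum(p.count for p in phases)
```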

Runtime surface

The layer downstream AI coding agents actually consume:

  • 59 context files: 25-35 lines · ~1,000 tokens each · four mandated sections — Quick Commands (copy-paste ops), Key Files ("the 3-5 files you actually need"), Non-Obvious Patterns, See Also (cross-references). Together < 0.1% of a modern model's context window.
  • Cross-repo dependency index + data-flow maps: turns "what depends on X?" from ~6,000 tokens of multi-file exploration to a ~200-token single graph lookup (30× compression).
  • Natural-language orchestration layer: takes engineer prompts, routes to the right tool. Two canonical routings disclosed:
    • "Is the pipeline healthy?" → dashboard scanner + 85+ historical incident patterns (lineage: Meta RCA, 2024-08-23).
    • "Add a new data field" → multi-phase validation generator using the 59 context files to respect cross-subsystem invariants.
  • Opt-in loading: context files are "loaded only when relevant, not always-on" — one of Meta's explicit design responses to the academic-research pitfall that found always-on context files hurt agent success on Django / matplotlib.
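The dependency-index compression is easy to picture: instead of an agent exploring many files to answer "what depends on X?", a prebuilt reverse-dependency map answers it in one lookup. A minimal sketch, with hypothetical module names (the real index's format and construction are not described):

```python
# Reverse-dependency index sketch: built offline once, queried at runtime.
# Module names are illustrative; the real storage format is undisclosed.
from collections import defaultdict

# Forward edges, as would be extracted from imports / config references.
DEPS = {
    "serving/ranker": ["configs/fields", "lib/serialization"],
    "pipelines/ingest": ["configs/fields"],
    "jobs/backfill": ["pipelines/ingest"],
}

def build_reverse_index(deps: dict[str, list[str]]) -> dict[str, list[str]]:
    rev: dict[str, list[str]] = defaultdict(list)
    for module, targets in deps.items():
        for t in targets:
            rev[t].append(module)
    return dict(rev)

REVERSE = build_reverse_index(DEPS)

def dependents_of(module: str) -> list[str]:
    # The ~200-token graph answer that replaces a multi-file exploration.
    return sorted(REVERSE.get(module, []))
```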

Self-maintenance loop

Every few weeks, automated jobs:

  1. Validate file paths against the live repos (detect renames / moves / deletions).
  2. Detect coverage gaps (new modules added since last refresh).
  3. Re-run critic agents against updated content.
  4. Auto-fix stale references.
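Steps 1 and 4 amount to checking every path a context file mentions against the live repo. A minimal sketch, assuming paths appear in backticks (the real convention and fix strategy are not disclosed):

```python
# Sketch of the self-maintenance path check: extract repo paths referenced
# in a context file and flag ones that no longer exist on disk.
# The backtick-path convention is an assumption.
import re
from pathlib import Path

PATH_RE = re.compile(r"`([\w./-]+\.(?:py|cpp|php|md))`")

def stale_paths(context_text: str, repo_root: Path) -> list[str]:
    """Return referenced paths missing from the live repo (renames/deletes)."""
    refs = PATH_RE.findall(context_text)
    return [p for p in refs if not (repo_root / p).exists()]
```

A stale hit would then feed the auto-fixer (or a critic re-run) rather than silently misleading the next agent session.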

Meta's framing: "The AI isn't a consumer of this infrastructure, it's the engine that runs it." This is the canonical wiki instance of patterns/self-maintaining-context-layer and the operational answer to concepts/context-file-freshness.

The 50+ non-obvious patterns

Question 5 of the five-questions framework — "What tribal knowledge is buried in code comments?" — produced the deepest learnings. Categories Meta names explicitly:

  • Hidden intermediate naming conventions — a pipeline stage outputs a temporary field name that a downstream stage renames; referencing the wrong one fails code generation silently.
  • Append-only identifier rules — "deprecated" enum values must never be removed because serialization compatibility depends on the full historic enum space.
  • Configuration-mode field-name mismatches — two configuration modes use different field names for the same operation; swapping them produces silent wrong output.
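The append-only identifier rule is mechanically checkable: every historical enum value must survive into the new definition, i.e. the old value set must be a subset of the new one. A minimal guard sketch (names hypothetical):

```python
# Guard for the append-only enum rule: serialization compatibility requires
# that no historical value, even a "deprecated" one, ever disappears.
def check_append_only(old_values: set[str], new_values: set[str]) -> list[str]:
    """Return values illegally removed; an empty list means the change is safe."""
    return sorted(old_values - new_values)
```

Encoding the rule as a check like this is exactly the kind of tribal knowledge the context files exist to surface before an agent regenerates a config.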

Key results

Metric                             Before           After
AI context coverage                ~5 % (5 files)   100 % (59 files)
Codebase files with AI navigation  ~50              4,100+
Tribal knowledge documented        0                50+ non-obvious patterns
Tested prompts core pass rate      0                55+ (100 %)
Critic quality score               3.65 / 5.0       4.20 / 5.0 (after 3 rounds)
Hallucinated file paths            —                0
Tool calls + tokens / task         baseline         ~40 % fewer (6 tasks)
Complex workflow guidance cycle    ~2 days          ~30 min

Why this is a system, not just a doc set

The 59 markdown files are the surface, but the system is the pipeline that produces and maintains them: a reproducible 50+-agent orchestration, a quality gate with measurable improvement across rounds, a zero-hallucination file-path invariant, and a self-refresh loop that closes the staleness gap. Without the pipeline, the files would be a one-shot doc export that decays — which Meta explicitly names as "worse than no context at all."

Caveats

  • Scoped to a single pipeline at time of publication (2026-04-06). Meta names expansion to additional pipelines in Future Work.
  • Preliminary numbers on six tasks — no fleet-wide production deployment metrics (adoption, QPS, engineer-session usage) are disclosed.
  • A large-context-window model is the substrate, but the vendor / model version is not named.
  • The generation mechanism, storage format, and refresh cadence of the cross-repo dependency index + data-flow maps are not described beyond the 30× compression headline.
  • The orchestration layer's NL-to-tool router accuracy is not benchmarked against the tool-selection-accuracy axis Datadog catalogued.