Meta Capacity Efficiency Platform
Definition
The Meta Capacity Efficiency Platform is Meta's unified AI-agent platform for hyperscale performance engineering — one substrate that serves both offense (proactively finding and shipping code-change optimizations) and defense (catching and resolving performance regressions). It is the production infrastructure underneath Meta's Capacity Efficiency program, which has recovered "hundreds of megawatts of power" (Source: sources/2026-04-16-meta-capacity-efficiency-at-meta-how-unified-ai-agents-optimize-performance-at-hyperscale).
Two-layer architecture
- MCP Tools layer — standardized Model Context Protocol interfaces that let LLMs invoke code. "Each tool does one thing: query profiling data, fetch experiment results, retrieve configuration history, search code, or extract documentation." Five named categories in the post:
- profiling-data query
- experiment-results fetch
- configuration-history retrieval
- code search
- documentation extraction
- Skills layer — modules that encode domain expertise about performance efficiency. A skill tells an LLM which tools to use and how to interpret results. "It captures reasoning patterns that experienced engineers developed over years, such as 'consult the top GraphQL endpoints for endpoint latency regressions' or 'look for recent schema changes if the affected function handles serialization'."
Together, tools + skills "promote a generalized language model into something that can apply the domain expertise typically held by senior engineers." Canonical wiki instance of patterns/mcp-tools-plus-skills-unified-platform.
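The two-layer decomposition above can be sketched as plain data structures. This is a hypothetical illustration, not Meta's actual API: the registry, the `Skill` dataclass, and every tool name and return value here are assumed for the sketch; only the five tool categories and the two example playbooks come from the post.

```python
from dataclasses import dataclass
from typing import Callable

# Tools layer: each tool does exactly one thing (hypothetical stand-ins;
# the real platform exposes these via the Model Context Protocol).
TOOLS: dict[str, Callable[..., dict]] = {
    "query_profiling_data": lambda fn: {"function": fn, "cpu_ms": 12.4},
    "fetch_experiment_results": lambda exp: {"exp": exp, "delta_pct": -3.1},
    "retrieve_config_history": lambda key: {"key": key, "changes": []},
    "search_code": lambda pattern: {"matches": [f"www/{pattern}.php"]},
    "extract_documentation": lambda topic: {"topic": topic, "text": "..."},
}

@dataclass
class Skill:
    """Skills layer: names which tools to use and how to read results."""
    name: str
    tools: list[str]   # keys into the shared tool registry
    playbook: str      # the encoded reasoning pattern

# Same tool layer, different skills per use case.
regression_mitigation = Skill(
    name="logging-sampling",
    tools=["query_profiling_data", "retrieve_config_history"],
    playbook="Regressions from logging can be mitigated by increasing sampling.",
)
memoization = Skill(
    name="memoization",
    tools=["query_profiling_data", "search_code"],
    playbook="Memoize a hot function to reduce CPU usage.",
)
```

The point of the shape: adding a sixth tool or a third skill touches one registry entry, not a new pipeline.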
The insight that made it one platform
"The breakthrough was realizing that both problems share the same structure... We didn't need two separate AI systems. We needed one platform that could serve both."
- Same tools across offense and defense: profiling data, code search, documentation, configuration history.
- Different skills per use case: regression-mitigation skills for defense (e.g. "regressions from logging can be mitigated by increasing sampling"); optimization-pattern skills for offense (e.g. "memoizing a given function to reduce CPU usage").
Agent compositions built on the platform
Defense: AI Regression Solver
Component of FBDetect (Meta's regression-detection tool). Three-phase pipeline:
- Gather context with tools — find regressed functions, look up the root-cause PR, pull exact files/lines changed.
- Apply domain expertise with skills — select the right mitigation skill for the codebase / language / regression type.
- Create resolution — produce a new PR, send to original root-cause author for review.
Canonical patterns/ai-generated-fix-forward-pr instance.
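The three-phase shape can be sketched as a single function. This is a minimal, hypothetical sketch: the PR id, file path, and mitigation strings are placeholders, and the real solver's tool calls and review routing are not disclosed in the post.

```python
def resolve_regression(regressed_function: str) -> dict:
    """Three-phase defensive pipeline: gather context with tools,
    apply a skill, create a resolution PR. All values are stand-ins."""
    # Phase 1: gather context with tools — regressed function,
    # root-cause PR, exact files/lines changed.
    context = {
        "function": regressed_function,
        "root_cause_pr": "PR-0000",              # placeholder id
        "changed_lines": ["www/logger.php:42"],  # placeholder path
    }
    # Phase 2: apply domain expertise with the skill selected for this
    # codebase / language / regression type.
    if "logger" in context["changed_lines"][0]:
        mitigation = "increase sampling rate on the new log call"
    else:
        mitigation = "apply language-specific mitigation skill"
    # Phase 3: create the resolution and route it to the root-cause author.
    return {
        "new_pr": {
            "title": f"Mitigate regression in {regressed_function}",
            "body": mitigation,
        },
        "reviewer": "author-of-" + context["root_cause_pr"],
    }
```

The review-routing choice (send the fix back to the original author) is what the post names explicitly; everything else above is scaffolding.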
Offense: Opportunity Resolver
Mirrors the defensive pipeline:
- Gather context with tools — opportunity metadata + pattern documentation + prior-resolution examples + specific files/functions + validation criteria.
- Apply domain expertise with skills — expert-encoded knowledge per opportunity type (e.g. memoization).
- Create resolution — candidate fix with guardrails (syntax / style / right-issue verification) → surfaced in engineer's editor, apply with one click.
Canonical patterns/opportunity-to-pr-ai-pipeline instance.
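The guardrail step can be pictured as a chain of cheap checks run before a candidate fix is surfaced in the editor. This is a sketch only: the post names syntax, style, and right-issue verification but does not specify how they are implemented, so each check below is an illustrative stand-in (Python parsing for syntax, a toy line-length rule for style, a substring match for right-issue).

```python
import ast

def passes_guardrails(candidate_fix: str, target_issue: str) -> bool:
    """Run the three guardrails the post names: syntax, style,
    and right-issue. Implementations are illustrative stand-ins."""
    # Syntax: the candidate must at least parse (Python stand-in).
    try:
        ast.parse(candidate_fix)
    except SyntaxError:
        return False
    # Style: toy stand-in for a real linter pass.
    if any(len(line) > 88 for line in candidate_fix.splitlines()):
        return False
    # Right issue: the fix must reference the code it claims to fix.
    return target_issue in candidate_fix

candidate = (
    "from functools import lru_cache\n\n"
    "@lru_cache(maxsize=None)\n"
    "def hot_path(x):\n"
    "    return x * x\n"
)
# Surfaced in the engineer's editor only if every check passes.
surfaced = passes_guardrails(candidate, "hot_path")
```

A real verification layer would plausibly be heavier (test execution, static analysis, or a model-based judge); the post leaves that open, as noted in the caveats below.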
Additional skills composed over the same tool layer
"Within a year, the same foundation powered additional applications: conversational assistants for efficiency questions, capacity-planning agents, personalized opportunity recommendations, guided investigation workflows, and AI-assisted validation. Each new capability requires few to no new data integrations since they can just compose existing tools with new skills."
Why this is the right abstraction
The platform is the canonical wiki example of tool-skill decomposition as an operational-AI leverage mechanism:
- Tools amortize data-integration cost. Adding profiling / experiment / config-history access is expensive; done once, it serves everyone.
- Skills amortize domain-expertise cost. A skill encodes a senior engineer's playbook once and applies it uniformly everywhere the platform runs.
- New use cases compose freely. Capacity-planning ≈ existing tools + new skill. Conversational assistant ≈ existing tools + new skill. No new pipeline, no new data backfill.
Sibling framing to Meta's AI Pre-Compute Engine (2026-04-06): both bet on markdown-level encoded knowledge as the model-agnostic substrate. Pre-Compute Engine's version is offline compass-shape context files; Capacity Efficiency Platform's version is online invocable skills over a shared tool layer.
Operational outcomes
- Hundreds of megawatts recovered program-wide; "enough to power hundreds of thousands of American homes for a year."
- ~10 hours → ~30 minutes compression on manual-investigation time ("automating diagnoses can compress ~10 hours of manual investigation into ~30 minutes" — ~20× compression).
- Thousands of regressions weekly caught by FBDetect; faster automated resolution prevents compounding fleet waste.
- "AI-assisted opportunity resolution is expanding to more product areas every half, handling a growing volume of wins that engineers would never get to manually."
Position in Meta's operational-AI lineage
Meta now has three complementary operational-AI systems on the wiki:
| System | Date | Problem domain | Substrate |
|---|---|---|---|
| Meta RCA System | 2024-08-23 | Web-monorepo incident triage | Fine-tuned Llama-2 ranker + heuristic retriever |
| AI Pre-Compute Engine | 2026-04-06 | Config-as-code data pipeline navigation | Offline multi-agent swarm → 59 compass-shape context files |
| Meta Capacity Efficiency Platform | 2026-04-16 | Performance offense + defense | MCP tools + skills, per-use-case agents |
The 2026-04-16 platform is the MCP-standardized + runtime-composed variant of the operational-AI primitive: tools + skills rather than offline files + rankers + retrievers.
Caveats
- No total skill-catalogue size disclosed. Two example skills named (logging-sampling on defense; memoization on offense).
- Model / vendor identity opaque. "In-house coding agent" is named but not specified (no LLM identity, parameter scale, or inference cost).
- Guardrail mechanism thin. Offense's "verify syntax and style, confirm it addresses the right issue" is named, but the verification layer behind it is not decomposed (unit-test execution? static analysis? ML judge?).
- Platform size not disclosed. No tool count, skill count, agent count, invocations-per-day, or platform compute footprint.
- Attribution across offense vs defense unspecified in the megawatt figure.
- Integration surface for opportunities not specified (IDE plugin extension name, commit attribution flow).
Seen in
- sources/2026-04-16-meta-capacity-efficiency-at-meta-how-unified-ai-agents-optimize-performance-at-hyperscale — canonical introduction (Meta Engineering, Developer Tools; 2026-04-16).
Related
- companies/meta
- systems/fbdetect — defense detector; the regression-finder the AI Regression Solver acts on
- systems/meta-ai-regression-solver — the defensive agent on top of FBDetect
- systems/model-context-protocol — the tool-description standard the platform's tool layer speaks
- systems/meta-rca-system — predecessor in Meta's operational-AI lineage
- systems/meta-ai-precompute-engine — sibling context-engineering system
- systems/strobelight — Meta's profiling orchestrator; the platform's profiling-query tools are backed by Strobelight-class infrastructure
- concepts/capacity-efficiency
- concepts/offense-defense-performance-engineering
- concepts/encoded-domain-expertise
- concepts/context-engineering
- patterns/mcp-tools-plus-skills-unified-platform — the canonical architectural pattern
- patterns/ai-generated-fix-forward-pr
- patterns/opportunity-to-pr-ai-pipeline
- patterns/specialized-agent-decomposition — skill-over-shared-tools framing