

Capacity efficiency

Definition

Capacity efficiency is the engineering discipline of reducing compute, memory, power, and capacity demand per unit of product value at hyperscale. At Meta's scale (serving "more than 3 billion people"), "even a 0.1% performance regression can translate to significant additional power consumption," so capacity efficiency is a first-class engineering function, not a periodic optimization pass (Source: sources/2026-04-16-meta-capacity-efficiency-at-meta-how-unified-ai-agents-optimize-performance-at-hyperscale).
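To make the 0.1% figure concrete, a back-of-envelope sketch: multiplying a fleet-wide power figure by the regression fraction gives the wasted megawatts. The fleet power number below is a hypothetical placeholder for illustration, not a figure from the source.

```python
# Back-of-envelope: what a small relative regression costs at fleet scale.
FLEET_POWER_MW = 1_000        # hypothetical total fleet compute power, in MW
REGRESSION_FRACTION = 0.001   # a 0.1% performance regression

extra_mw = FLEET_POWER_MW * REGRESSION_FRACTION
print(f"A 0.1% regression on a {FLEET_POWER_MW} MW fleet wastes ~{extra_mw:.1f} MW")
```

At this assumed scale, a single 0.1% regression costs about a megawatt, which is why regressions are tracked in power terms rather than percentage points.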

The two sides

Meta's Capacity Efficiency program is explicitly two-sided:

  • Offense: "searching for opportunities (proactive code changes) to make our existing systems more efficient, and deploying them."
  • Defense: "monitoring resource usage in production to detect regressions, root-cause them to a pull request, and deploy mitigations."

Both sides pay in the same unit — megawatts of fleet power — but neither side is the whole picture: defense without offense protects what you have, offense without defense leaks gains back through regressions.

Why it's a program, not a project

The named constraint is human engineering time. Engineers have to:

  • Query profiling data to find optimization candidates.
  • Review opportunity descriptions, documentation, and past examples.
  • Check recent code / configuration deployments for step-changes in resource usage.
  • Check recent internal discussions for launch-correlated regressions.

"Many engineers at Meta use our efficiency tools to work on these problems every day. But no matter how high-quality the tooling is, engineers have limited time to address performance issues when innovating on new products is our top priority." Capacity efficiency as a program competes for the same engineer-hours as product work, so scaling megawatt delivery without proportionally scaling headcount becomes the operating constraint.

How Meta measures it

  • Absolute: "hundreds of megawatts of power" recovered — "enough to power hundreds of thousands of American homes for a year." Program-level metric.
  • Regression-rate: "thousands of regressions weekly" caught by FBDetect; "fewer megawatts wasted compounding across the fleet" is the economic model.
  • Investigation-time compression: "~10 hours of manual investigation into ~30 minutes" (≈20×) — the direct bottleneck-lifting metric.
  • Service-level (from the 2025-03-07 Strobelight post): "up to 20% reduction in CPU cycles" for top-200-services FDO rollouts, "15,000 servers/year" for a single hot-path fix.
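The investigation-time metric above reduces to simple arithmetic, sketched here; only the 10-hour and 30-minute figures come from the source, and the 40-hour work week is an illustrative assumption.

```python
manual_hours = 10.0    # ~10 hours of manual investigation (source figure)
assisted_hours = 0.5   # ~30 minutes with AI assistance (source figure)

compression = manual_hours / assisted_hours
print(f"Compression factor: ~{compression:.0f}x")  # ~20x

# Equivalent view: investigations one engineer can run per 40-hour week.
per_week_manual = 40 / manual_hours      # 4 investigations/week
per_week_assisted = 40 / assisted_hours  # 80 investigations/week
```

The same 20× shows up either way: as time saved per investigation, or as per-engineer throughput.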

Relationship to other wiki concepts

  • Distinct from: cost attribution / chargeback (chargeback) — capacity efficiency is about reducing the absolute cost, chargeback is about assigning the current cost to its driver.
  • Complementary to: rack-level power density (concepts/rack-level-power-density) — one is software-layer efficiency, the other is infrastructure-layer density. Both bear on total-fleet-power.
  • Enabled by: the profiling + regression-detection + code-index stack — Strobelight, systems/fbdetect, Glean. Without these, neither offense nor defense has input.

Why AI matters here specifically

Capacity efficiency is the textbook long-tail problem:

  • Each individual optimization is small (0.1%, one hot function, one service).
  • The fleet has millions of such opportunities.
  • The cost of a human investigating any one of them exceeds the per-opportunity payoff.
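The long-tail economics can be sketched as a threshold model: an optimization ships only when its payoff exceeds the cost of investigating it. All dollar figures below are hypothetical; only the ~10-hour and ~20× numbers echo the source.

```python
# Threshold model: cutting investigation cost 20x moves the economic cutoff.
ENGINEER_HOUR_COST = 150.0                 # hypothetical $/engineer-hour
manual_cost = 10 * ENGINEER_HOUR_COST      # ~10 h manual investigation
assisted_cost = manual_cost / 20           # ~20x compression

def economical(payoff_dollars: float, investigation_cost: float) -> bool:
    """An opportunity is worth pursuing only if its payoff exceeds
    the cost of finding and fixing it."""
    return payoff_dollars > investigation_cost

# A small long-tail win worth $500 in recovered capacity:
small_win = 500.0
print(economical(small_win, manual_cost))    # False: uneconomical by hand
print(economical(small_win, assisted_cost))  # True: economical with AI
```

The same $500 opportunity flips from uneconomical to economical once the investigation cost drops below the payoff, which is the mechanism behind the next paragraph.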

AI that compresses investigation time by 20× (equivalently, multiplying per-engineer throughput by 20×) converts previously uneconomical optimizations into shipped fixes. As Meta puts it: "AI-assisted opportunity resolution is expanding to more product areas every half, handling a growing volume of wins that engineers would never get to manually." The self-sustaining engine is the target end-state.
