Skip to content

SYSTEM Cited by 1 source

FBDetect

Definition

FBDetect is Meta's in-house performance-regression-detection tool that can catch regressions as small as 0.005 % in noisy production environments and surfaces "thousands of regressions weekly" across Meta's fleet (Source: sources/2026-04-16-meta-capacity-efficiency-at-meta-how-unified-ai-agents-optimize-performance-at-hyperscale). It is the defense arm of Meta's Capacity Efficiency program — the system that decides "a regression happened" and attributes it to a root-cause pull request, handing off to downstream resolution (traditionally a human; since 2026 also the AI Regression Solver).

Architectural detail lives primarily in the SOSP 2024 paper: tangchq74.github.io/FBDetect-SOSP24.pdf.

What's disclosed in the 2026-04-16 Engineering post

  • Regression sensitivity: "as small as 0.005 % in noisy production environments." At Meta's 3 B+ user scale, 0.1 % regressions already translate to "significant additional power consumption," so micro-regression sensitivity is load-bearing.
  • Throughput: "thousands of regressions weekly."
  • Input domain: time-series data from production.
  • Root-cause attribution: "primarily traditional techniques such as correlating regression functions with recent pull requests." Hand-rolled correlation, not ML — the ML component sits downstream in the AI Regression Solver, not in the detector itself.
  • Output contract: a regression event + a candidate root-cause PR → engineer notification (traditionally) or the AI Regression Solver (since 2026).

Position in Meta's operational-AI stack

FBDetect is the detector; the AI Regression Solver is the responder. Meta explicitly names this separation: "After a root cause is determined, engineers are typically notified and expected to take action, such as optimizing the recent code change. We've added an additional feature to make this faster: AI Regression Solver."

Before 2026: - FBDetect → detect regression + attribute root-cause PR → notify engineer → engineer investigates + writes mitigation (hours) or PR is rolled back (velocity cost) or ignored (capacity cost).

After 2026: - FBDetect → detect + attribute → AI Regression Solver → fix-forward PR sent to original root-cause author for review → engineer reviews in minutes.

Why it matters at program scale

Meta ties FBDetect's throughput to the megawatt metric directly: "Meta's in-house regression detection tool, catches thousands of regressions weekly; faster automated resolution means fewer megawatts wasted compounding across the fleet." The compounding framing is the load-bearing economic argument — each undetected / slow-resolved regression is paid in ongoing fleet capacity, not a one-time cost.

What's NOT disclosed in the post

Most FBDetect internals live in the SOSP 2024 paper, not this post:

  • Time-series decomposition + statistical model underlying the 0.005 % sensitivity claim.
  • Change-point detection algorithms used.
  • False-positive rate on the regression stream.
  • Storage / indexing for the time-series substrate.
  • Signal ingestion path (how function-level metrics reach FBDetect — presumably Strobelight-class profiling plus service-level counters).
  • Root-cause-PR correlation algorithm.

Seen in

Last updated · 319 distilled / 1,201 read