# False-positive management

## Definition
False-positive management is the set of practices that keep a continuous-detection system's alert stream signal-dense enough for engineers to trust. Without it, a detector that surfaces genuine bugs and many legitimate-but-flagged events produces alert fatigue: the on-call rotation stops paging on the channel, real findings get missed, and the detection system effectively turns off — even while it keeps running.
Figma names this explicitly as a first-class operational concern (Source: sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure):
"Manage false positives (or they'll manage you!) A high false positive rate can overwhelm teams and reduce trust in alerts. To address this, we implemented dynamic allowlisting and rigorous triage workflows."
## Why it's structural, not tactical
Every useful detector starts over-eager. The detector's author lists every possible exposure, but real production contains many intentional, safe exposures (e.g., an OAuth client secret returned by a dedicated credential-management endpoint). The gap between possible leak and actual bug is the false-positive surface, and the team must manage it continuously as:
- Product features ship that intentionally expose new fields.
- Endpoints are added, removed, or re-scoped.
- Policy changes about what's considered sensitive.
- Engineers learn which findings actually reflect bugs.
Unmanaged, the false-positive rate grows monotonically: the alert channel drowns, engineers mute it, and the detector becomes invisible infrastructure consuming CPU with no readers.
## The three levers

1. Dynamic allowlisting — suppress specific (endpoint, field, context) tuples where exposure is intentional. Runtime-configurable, no redeploy. (patterns/dynamic-allowlist-for-safe-exposure)
2. Rigorous triage workflows — every finding has an owner, a disposition (true-positive / false-positive / allowlisted), and a deadline. New false-positive patterns turn into new allowlist entries.
3. Measurement of the FP rate itself — track precision (true positives / total findings) over time per endpoint / field / rule. A drop in precision is a signal to retune the detector or expand the allowlist.
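A minimal sketch of the first lever, assuming a hypothetical JSON allowlist file that is re-read at check time so entries take effect without a redeploy (the file name and field names here are illustrative, not Figma's implementation):

```python
import json

def load_allowlist(path):
    """Re-read the allowlist on every call so edits apply without a redeploy."""
    with open(path) as f:
        return {(e["endpoint"], e["field"]) for e in json.load(f)}

def is_allowlisted(finding, path="allowlist.json"):
    """Suppress a finding only when its exact (endpoint, field) tuple is listed."""
    return (finding["endpoint"], finding["field"]) in load_allowlist(path)
```

Re-reading a small file per check is the simplest runtime-configurable design; a real system would more likely watch a config service, but the property that matters is the same: no deploy between "this exposure is intentional" and "the channel is quiet again."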
## The feedback-loop shape

    detection → finding → triage
                            │
                            ├── true positive → file + fix
                            │
                            ├── false positive, known-safe
                            │     → add allowlist entry → re-run
                            │
                            └── noise / bug in detector
                                  → retune rule / drop rule
The triage step produces improvements to the detection configuration. Without triage, the detector doesn't learn.
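The loop above can be sketched as a dispatch on disposition; function and field names are illustrative, and the returned tags exist only so the pipeline can record what happened:

```python
def triage(finding, disposition, allowlist, tuned_out_rules):
    """Route a dispositioned finding to the action the loop implies."""
    if disposition == "true_positive":
        return "file_and_fix"                  # open a ticket, ship a fix
    if disposition == "false_positive_known_safe":
        # Triage feeds back into detection config: new FP → new allowlist entry.
        allowlist.add((finding["endpoint"], finding["field"]))
        return "allowlisted_rerun"
    if disposition == "detector_noise":
        tuned_out_rules.add(finding["rule"])   # rule needs retuning or retirement
        return "retune_or_drop_rule"
    raise ValueError(f"unknown disposition: {disposition}")
```

The key design point is that two of the three branches mutate detection configuration, which is exactly why skipping triage means the detector never learns.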
## Metrics that matter
- Alert rate: findings per unit time (total, per-rule).
- Precision: true positives / findings.
- Time-to-close: median hours from finding → disposition.
- Allowlist churn: new entries per week, stale entries per quarter, reviewed entries per quarter.
- Channel engagement: is the on-call actually reading the findings, or have they muted the channel?
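The first two metrics fall out of a log of dispositioned findings; a sketch with assumed field names:

```python
from collections import Counter

def precision(findings):
    """true positives / total findings; None when there are no findings yet."""
    if not findings:
        return None
    tp = sum(1 for f in findings if f["disposition"] == "true_positive")
    return tp / len(findings)

def alert_rate_per_rule(findings):
    """Findings per rule, for spotting the rules that dominate the channel."""
    return Counter(f["rule"] for f in findings)
```

Computing precision per rule (group findings by `rule` before calling `precision`) is what turns "the channel is noisy" into "rule r1's precision dropped this week; retune it or widen its allowlist."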
## Dynamic allowlisting ≠ blanket suppression
The anti-pattern is suppressing whole rules or whole endpoints to quiet the channel. The allowlist must be narrow enough that a new bug in the same endpoint still trips the detector. From patterns/dynamic-allowlist-for-safe-exposure: scope to (endpoint × field × value-shape × optional role), not (endpoint × anything).
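A sketch of that narrow scope, assuming the value-shape is expressed as a regex and the role is optional (all names hypothetical); note that a different field on the same endpoint, or the same field with an unexpected value shape, still trips the detector:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AllowlistEntry:
    endpoint: str
    field: str
    value_shape: str            # regex the exposed value must fully match
    role: Optional[str] = None  # if set, only this caller role is allowlisted

    def matches(self, endpoint, field, value, role=None):
        """True only for the exact (endpoint × field × shape × role) tuple."""
        return (
            endpoint == self.endpoint
            and field == self.field
            and re.fullmatch(self.value_shape, value) is not None
            and (self.role is None or role == self.role)
        )
```

Suppressing on `(endpoint, anything)` instead would be the blanket anti-pattern: a new leak on the same endpoint would be silenced along with the intentional exposure.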
## Adjacent failure modes
- Silent suppression without audit. When an allowlist suppresses a finding, it should still show up in the triage dashboard at lower severity, so over-broad suppressions are visible.
- Allowlist drift. Temporary suppressions added during incident response are never removed. Ownership + expiry dates on every entry solve this.
- Re-detecting already-known bugs. If a finding is filed as a ticket but not suppressed, it keeps firing daily until the fix ships. The pipeline should support a "known, tracked, do not re-page" state.
- FP rate as cost-of-doing-business. Treating FPs as inherent rather than a product quality problem of the detector itself keeps the detector below engineer-trust threshold forever.
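The allowlist-drift fix (ownership plus expiry on every entry) can be enforced with a periodic sweep; field names are assumed:

```python
from datetime import date

def sweep_allowlist(entries, today=None):
    """Split entries into live and stale; stale ones go back to their owner for review."""
    today = today or date.today()
    live, stale = [], []
    for e in entries:
        (stale if e["expires"] < today else live).append(e)
    return live, stale
```

Routing the `stale` list to each entry's `owner` (rather than silently deleting) is what keeps a temporary incident-response suppression from quietly becoming permanent.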
## Context sensitivity
Not all sensitive-data exposures are equally problematic, and Figma's answer is runtime tunability: "By using a dynamic configuration, we could quickly tune detection rules without redeploying services." The detector's notion of severity must be tunable out-of-band so triage can prioritize the genuinely risky.
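Out-of-band severity tuning can be as small as a per-rule mapping read from runtime configuration rather than baked into code; the rule names and config shape here are illustrative:

```python
def severity_for(rule, runtime_config, default="medium"):
    """Look up a rule's severity from config that can change without a deploy."""
    return runtime_config.get("severities", {}).get(rule, default)
```

Because the mapping lives in config, dropping a noisy rule to `low` during an incident is a config edit, not a release.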
## Seen in
- sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure — Figma Response Sampling's explicit first-class design treatment of FP management: dynamic allowlisting, rigorous triage, runtime-tunable detection rules.