
CONCEPT Cited by 1 source

False-positive / false-negative asymmetry

Definition

False-positive / false-negative asymmetry is the design property of a classification or membership system in which the cost of a false positive and the cost of a false negative are structurally different. In such a system, choosing a substrate that can only be wrong in the cheaper direction buys correctness in the expensive direction.

Concretely, for a binary oracle answering "is x in the set?":

  • False positive (oracle says yes, truth says no): the caller proceeds to a slower authoritative check that correctly returns "no". Cost = one extra check per false positive. Usually bounded, predictable, and small.
  • False negative (oracle says no, truth says yes): the caller treats x as genuinely absent. Cost = whatever the system does for "absent", applied to a genuinely present element. Often unbounded, user-visible, or structurally wrong.

When the two costs are dramatically asymmetric, the right substrate is one whose error mode matches the cheap direction. A Bloom filter has false positives but no false negatives, so it is the correct choice when false negatives are expensive and false positives are cheap.
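The following is a minimal sketch of such a fast-negative oracle. It assumes SHA-256 with a per-hash salt as the hash family, which is an illustrative choice and not any particular production implementation. The class name and the paths are hypothetical. It checks both halves of the property: every inserted element reads as present, and some absent elements may read as present too.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may return false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit: clarity over compactness

    def _positions(self, item: str) -> list[int]:
        # Derive k indices by salting SHA-256 with the hash number (illustrative).
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False => definitely absent; True => maybe present.
        return all(self.bits[pos] for pos in self._positions(item))


members = [f"/docs/page-{i}" for i in range(1000)]
bf = BloomFilter(num_bits=10_000, num_hashes=7)
for path in members:
    bf.add(path)

# Every inserted element is reported present: no false negatives.
assert all(bf.might_contain(p) for p in members)

# Some never-inserted elements may be reported present: false positives.
absent = [f"/missing/{i}" for i in range(10_000)]
fp_rate = sum(bf.might_contain(p) for p in absent) / len(absent)
print(f"observed false-positive rate: {fp_rate:.4f}")
```

Setting a bit is the only write operation, so nothing can ever clear evidence of an inserted element. That one-way update is what rules out false negatives.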

Canonical Vercel instance

Vercel's 2026-04-21 blog post surfaces the asymmetry explicitly in the build-output path lookup:

"Bloom filters can return false positives, but never false negatives. For path lookups, this property is valuable. If the Bloom filter says a path does not exist, we can safely return a 404; if it says a path might exist, we fall back to checking the build outputs."

And the cost model:

"We can't afford false negatives (returning 404 for valid pages), and Bloom filters guarantee this won't happen. False positives just trigger an extra storage request to find the file doesn't exist."

(Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters.)
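The lookup flow in the quotes can be sketched as follows. Everything here is hypothetical and not Vercel's code. `Router` is an illustrative name, `build_outputs` is a dict standing in for the build-output storage, and the compact Bloom filter uses the same illustrative SHA-256 hashing as a generic example.

```python
import hashlib
from dataclasses import dataclass


class BloomFilter:
    """Compact Bloom filter: "no" is definite, "yes" means "maybe"."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


@dataclass
class Router:
    """Hypothetical router: Bloom filter in front of an authoritative store."""

    build_outputs: dict[str, str]  # stand-in for the build-output storage
    bloom: BloomFilter
    storage_fetches: int = 0       # fallback round trips actually made

    def resolve(self, path: str) -> tuple[int, str]:
        if not self.bloom.might_contain(path):
            # Definitely absent: safe to 404 without touching storage.
            return 404, "not found"
        # "Maybe present": pay for one authoritative storage check.
        self.storage_fetches += 1
        body = self.build_outputs.get(path)
        if body is None:
            # False positive: one wasted fetch, but the answer is still correct.
            return 404, "not found"
        return 200, body


outputs = {"/": "home", "/about": "about us"}
bloom = BloomFilter(num_bits=1024, num_hashes=5)
for p in outputs:
    bloom.add(p)

router = Router(build_outputs=outputs, bloom=bloom)
print(router.resolve("/about"))         # (200, 'about us')
print(router.resolve("/no-such-page"))  # (404, 'not found')
```

Both branches for an absent path return the same 404. The only thing a false positive changes is `storage_fetches`, which is exactly the cheap direction described in the quote.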

The cost ratio:

  • False positive → one extra storage fetch that finds the file doesn't exist, followed by the correct 404. Routine operation; milliseconds of added latency for one request.
  • False negative → an indexed, linked, user-requested real page returns 404. SEO damage, broken links, user trust erosion, availability incident.

The asymmetry spans orders of magnitude. That is why a Bloom filter is unambiguously the right substrate. It is also why giving up the exact JSON tree's guarantee against false positives is worth the Bloom filter's parse-time latency win, since the guarantee against false negatives is kept.
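One way to make the cost ratio explicit is with the following symbols, which are introduced here and not taken from the Vercel post. Let q be the fraction of lookups for absent paths. Let p_FP and p_FN be the filter's error rates. Let C_FP and C_FN be the cost of each error.

```latex
\mathbb{E}[\text{error cost per lookup}]
  = q \, p_{\mathrm{FP}} \, C_{\mathrm{FP}} + (1 - q) \, p_{\mathrm{FN}} \, C_{\mathrm{FN}},
\qquad
p_{\mathrm{FP}} = \Pr[\text{``maybe''} \mid \text{absent}], \;
p_{\mathrm{FN}} = \Pr[\text{``no''} \mid \text{present}]

% A Bloom filter fixes p_FN = 0, so:
\mathbb{E}[\text{error cost per lookup}] = q \, p_{\mathrm{FP}} \, C_{\mathrm{FP}}
```

With p_FN = 0, the C_FN term vanishes no matter how large C_FN is. The remaining cost is bounded by p_FP · C_FP, which filter sizing controls.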

The design move

Once the asymmetry is named:

  1. Classify the failure modes of the current authoritative structure: where are its false positives vs false negatives? If it's exact, what's the cost model of its queries?
  2. Find the cost-dominant path — is the common case (negative lookups, the "no" answer) fast enough? Or is the slow "no" blocking a hot path?
  3. Substitute a probabilistic fast-negative whose error mode lies in the cheap direction. Compose with a fallback authoritative check on "maybe".
  4. Size the filter for its false-positive cost budget: choose the false-positive rate p so that FP-caused fallback lookups are a small fraction of total request traffic.
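Step 4 can be sketched with the textbook sizing formulas for a Bloom filter holding n items at target false-positive rate p. These formulas assume independent, uniform hash functions: m = −n ln p / (ln 2)² bits and k = (m / n) ln 2 hashes. The helper names and the 40% negative-traffic share in the usage lines are hypothetical.

```python
import math


def bloom_size(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Standard sizing: bits m = -n ln p / (ln 2)^2, hashes k = (m / n) ln 2."""
    if n_items <= 0 or not 0.0 < fp_rate < 1.0:
        raise ValueError("need n_items > 0 and 0 < fp_rate < 1")
    num_bits = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    num_hashes = max(1, round(num_bits / n_items * math.log(2)))
    return num_bits, num_hashes


def fallback_fraction(negative_share: float, fp_rate: float) -> float:
    """Fraction of all requests sent to the slow fallback by false positives.

    negative_share: fraction of requests whose key is genuinely absent
    (hypothetical input; measure it from real traffic).
    """
    return negative_share * fp_rate


m, k = bloom_size(n_items=1_000_000, fp_rate=0.01)
print(m, k)  # about 9.6 million bits and 7 hash functions
print(fallback_fraction(negative_share=0.4, fp_rate=0.01))  # about 0.004
```

The false-positive rate applies only to lookups for absent keys. The fallback load therefore scales with the negative share of traffic, not with total traffic.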

Examples in the corpus

| System | Cheap direction | Expensive direction | Substrate |
| --- | --- | --- | --- |
| Vercel routing | false positive (extra storage fetch → 404) | false negative (wrongly 404 a valid page) | concepts/bloom-filter |
| Chrome malicious-URL filter | false positive (extra server check) | false negative (visit a known-bad URL) | concepts/bloom-filter |
| Column-store pruning | false positive (scan one more block) | false negative (miss matching rows) | concepts/bloom-filter + zone maps |
| Fraud detection | false positive (review a legit tx) | false negative (approve a fraud tx) | ML classifier, tuned precision/recall |
| Spam detection | false positive (quarantine a real email) | false negative (deliver spam) | depends on cost model |
| Medical screening | false positive (extra test, stress) | false negative (missed diagnosis) | depends; often symmetric |
| Content moderation | false positive (block legit content) | false negative (miss harmful content) | depends; often symmetric |

The first three have strongly asymmetric costs and admit simple probabilistic solutions. The other four have contested cost models, because false positives aren't cheap. Their design can't be reduced to picking a substrate with the right error shape.
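For the contested rows, a common alternative is to keep a scored classifier and move its decision threshold according to an explicit cost model. The following is a minimal sketch of cost-weighted threshold selection. The function name, scores, labels, and cost values are made up for illustration.

```python
def pick_threshold(
    scores: list[float], labels: list[bool], cost_fp: float, cost_fn: float
) -> tuple[float, float]:
    """Return (threshold, total_cost) minimizing cost_fp * FP + cost_fn * FN.

    Predicts "positive" when score >= threshold; labels are ground truth.
    """
    candidates = sorted(set(scores)) + [float("inf")]  # inf = never predict positive
    best = None
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, True, False, False, True, True, True]  # True = e.g. fraud

print(pick_threshold(scores, labels, cost_fp=1, cost_fn=1))   # (0.7, 1)
print(pick_threshold(scores, labels, cost_fp=1, cost_fn=20))  # (0.35, 2)
```

With equal costs, the threshold lands at 0.7. Making a missed positive 20 times as expensive pulls it down to 0.35, which trades two false positives for zero false negatives. The "right" answer moves with the cost model, which is why these rows are marked as contested.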

Distinct from false-positive management

concepts/false-positive-management names the operational discipline of keeping false positives tolerable when they're the error mode you accepted: allowlisting, triage workflows, measuring FP rate. This concept names the design-time choice of which error mode to accept in the first place.

The two compose: you pick the substrate whose error mode matches your cost asymmetry; then you run false-positive management on the error mode you accepted.
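On the false-positive-management side, a filter-then-fallback lookup can measure its own false-positive rate with three counters, because the authoritative fallback labels every "maybe" as a hit or a miss. The following is a minimal sketch that assumes an exact fallback and a substrate with no false negatives. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FilterStats:
    """Counters for a filter-then-fallback lookup (hypothetical instrumentation)."""

    filter_negatives: int = 0  # filter answered "definitely absent"
    fallback_hits: int = 0     # filter answered "maybe"; fallback found the key
    fallback_misses: int = 0   # filter answered "maybe"; fallback did not (a false positive)

    def record_negative(self) -> None:
        self.filter_negatives += 1

    def record_fallback(self, found: bool) -> None:
        if found:
            self.fallback_hits += 1
        else:
            self.fallback_misses += 1

    def observed_fp_rate(self) -> float:
        # Among lookups for absent keys, the share the filter let through.
        absent = self.filter_negatives + self.fallback_misses
        return self.fallback_misses / absent if absent else 0.0

    def fallback_share(self) -> float:
        # Share of all lookups that reached the authoritative store.
        total = self.filter_negatives + self.fallback_hits + self.fallback_misses
        return (self.fallback_hits + self.fallback_misses) / total if total else 0.0

    def over_budget(self, target_fp_rate: float) -> bool:
        return self.observed_fp_rate() > target_fp_rate


stats = FilterStats()
for _ in range(990):
    stats.record_negative()
for _ in range(10):
    stats.record_fallback(found=False)
for _ in range(1000):
    stats.record_fallback(found=True)
print(stats.observed_fp_rate())  # 0.01
```

Comparing `observed_fp_rate()` against the p the filter was sized for is one way to notice when the member set has outgrown the filter.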

Anti-patterns

  • Treating the two errors as symmetric when they aren't. Leads to over-engineered exact structures (the Vercel JSON tree) or over-eager probabilistic filters (Chrome's early phishing-URL filter was too coarse, caused legitimate-site FPs, required allowlist expansion).
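  • Choosing a probabilistic structure whose error mode is the wrong direction. A counting Bloom filter supports deletion, but deletion can introduce false negatives. Removing an element that was never inserted (for example, one the filter falsely reported as present) decrements counters shared with real members. That makes a counting Bloom filter inappropriate for 404 filters.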
  • Not sizing p to the fallback cost budget. A 1% false-positive rate might be fine for a disk-cache filter (40 extra disk seeks per 4,000 negative lookups) but disastrous for a fraud-detection filter (400 false-fraud review flags per 40,000 legitimate transactions).
  • Conflating error mode with error magnitude. A structure with higher precision but symmetric error isn't strictly better than one with lower precision but asymmetric error aligned with cost.
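The counting-Bloom-filter anti-pattern can be reproduced in a few lines. The following sketch uses hypothetical paths and a deliberately tiny filter (8 counters, 2 hashes) so that a false positive is easy to find. The SHA-256 hashing is an illustrative choice.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: counters instead of bits, so it supports remove()."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counts = [0] * num_counters

    def _positions(self, item: str) -> set[int]:
        # Distinct indices, so add() and remove() touch each counter at most once.
        return {
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")
            % self.num_counters
            for i in range(self.num_hashes)
        }

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counts[pos] += 1

    def remove(self, item: str) -> None:
        # The filter cannot tell a real member from a false positive, so removing
        # a non-member silently decrements counters that belong to real members.
        for pos in self._positions(item):
            self.counts[pos] = max(0, self.counts[pos] - 1)  # clamp at zero

    def might_contain(self, item: str) -> bool:
        return all(self.counts[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(num_counters=8, num_hashes=2)
cbf.add("/real-page")

# Find a path that was never added but that the filter reports as present.
impostor = next(
    f"/ghost-{i}" for i in range(10_000) if cbf.might_contain(f"/ghost-{i}")
)

# A caller "cleans up" the impostor, trusting the filter's answer...
cbf.remove(impostor)

# ...and now a genuinely present path is reported absent: a false negative.
print(cbf.might_contain("/real-page"))  # False
```

Only "/real-page" was inserted, so every counter the impostor hits is one of that page's counters. Removing the impostor drives at least one of them to zero, and the filter now answers "definitely absent" for a real page.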

Seen in

  • sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters — Canonical wiki introduction. Vercel's routing-service Bloom-filter substitution states the asymmetry verbatim: false negative = "return 404 for valid pages" (unbounded damage), false positive = "extra storage request to find the file doesn't exist" (bounded, about one storage round trip).

  • concepts/bloom-filter — The canonical data structure whose error mode is false-positive-only; the Vercel case is the canonical design application.

  • concepts/false-positive-management — Operational companion: manage the FP rate of whichever error mode the substrate commits to.

  • patterns/two-stage-evaluation — Composes with this asymmetry: stage 1 is the probabilistic oracle whose cheap-direction error is tolerated; stage 2 is the authoritative fallback invoked only on "maybe".
