CONCEPT Cited by 1 source

False-positive / false-negative asymmetry

Definition

False-positive / false-negative asymmetry is the design property of a classification or membership system where the cost of a false positive and the cost of a false negative are structurally different — and choosing a substrate that can only be wrong in the cheaper direction buys correctness on the expensive direction.

Concretely, for a binary oracle answering "is x in the set?":

False positive (oracle says yes, truth says no): the caller proceeds to a slower authoritative check that correctly returns "no". Cost = one extra check per false positive. Usually bounded, predictable, and small.
False negative (oracle says no, truth says yes): the caller treats x as genuinely absent. Cost = whatever the system does for "absent", applied to a genuinely present element. Often unbounded, user-visible, or structurally wrong.

When the two costs are dramatically asymmetric, the right substrate is one whose error mode matches the cheap direction. A Bloom filter has false positives but no false negatives, which makes it the correct choice when false negatives are expensive and false positives are cheap.
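The one-sided error mode can be seen in a minimal sketch of a Bloom filter. This is an illustration only, not any particular production implementation: the class name, the bit-array size, and the SHA-256 double-hashing scheme are all assumptions chosen for brevity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions from two halves of one SHA-256 digest
        # (double hashing). Assumed scheme, picked only for simplicity.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Setting bits is the only mutation; bits are never cleared.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means only "maybe present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical data: 100 present paths, 1000 absent ones.
present = [f"/docs/page-{i}" for i in range(100)]
absent = [f"/missing/{i}" for i in range(1000)]

bf = BloomFilter(num_bits=1024, num_hashes=5)
for path in present:
    bf.add(path)

# Every inserted item is reported as "maybe": no false negatives.
assert all(bf.might_contain(path) for path in present)

# Some absent items may also be reported as "maybe": false positives.
false_positives = sum(bf.might_contain(path) for path in absent)
print(f"{false_positives} of {len(absent)} absent paths reported as 'maybe'")
```

Because `add` only ever sets bits and nothing clears them, an inserted item always finds all of its bits set. That is the structural reason the error can only fall in the false-positive direction.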

Canonical Vercel instance

Vercel's 2026-04-21 blog post surfaces the asymmetry explicitly in the build-output path lookup:

"Bloom filters can return false positives, but never false negatives. For path lookups, this property is valuable. If the Bloom filter says a path does not exist, we can safely return a 404; if it says a path might exist, we fall back to checking the build outputs."

And the cost model:

"We can't afford false negatives (returning 404 for valid pages), and Bloom filters guarantee this won't happen. False positives just trigger an extra storage request to find the file doesn't exist."

(Source: sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters.)
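The lookup flow described in the two quotes can be sketched as a two-stage check. This is a hedged illustration, not Vercel's code: `resolve_path`, `might_exist`, and `fetch_output` are hypothetical names, the build outputs are an in-memory dict, and a plain set that is a superset of the real paths stands in for the Bloom filter.

```python
from typing import Callable, List, Optional, Tuple


def resolve_path(
    path: str,
    might_exist: Callable[[str], bool],
    fetch_output: Callable[[str], Optional[bytes]],
) -> Tuple[int, Optional[bytes]]:
    """Return (status, body) using a fast filter plus an authoritative fallback."""
    if not might_exist(path):
        # Definite "no": safe to 404 because the filter has no false negatives.
        return 404, None
    body = fetch_output(path)  # authoritative check, only on "maybe"
    if body is None:
        # False positive: the cost was one extra storage request.
        return 404, None
    return 200, body


# Hypothetical build outputs and a record of storage requests.
build_outputs = {"/index.html": b"<h1>home</h1>"}
storage_calls: List[str] = []


def fetch_output(path: str) -> Optional[bytes]:
    storage_calls.append(path)
    return build_outputs.get(path)


# Stand-in for the Bloom filter: a superset of the real paths, so it can
# only err towards "maybe" ("/ghost.html" simulates a false positive).
maybe_paths = set(build_outputs) | {"/ghost.html"}


def might_exist(path: str) -> bool:
    return path in maybe_paths


hit = resolve_path("/index.html", might_exist, fetch_output)
fast_miss = resolve_path("/nope.html", might_exist, fetch_output)
slow_miss = resolve_path("/ghost.html", might_exist, fetch_output)
```

In this sketch only the real page and the simulated false positive reach storage; the definite miss is answered from the filter alone.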

The cost ratio:

False positive → one extra (successful) storage fetch that confirms the 404. Routine operation; milliseconds of added latency for one request.
False negative → an indexed, linked, user-requested real page returns 404. SEO damage, broken links, user trust erosion, availability incident.

The asymmetry spans orders of magnitude, which is why a Bloom filter is unambiguously the right substrate: the exact JSON tree's correctness guarantee is worth trading for the Bloom filter's parse-time latency win, because the only errors the filter introduces fall in the cheap direction.

The design move

Once the asymmetry is named:

Classify the failure modes of the current authoritative structure: where are its false positives vs false negatives? If it's exact, what's the cost model of its queries?
Find the cost-dominant path — is the common case (negative lookups, the "no" answer) fast enough? Or is the slow "no" blocking a hot path?
Substitute a probabilistic fast-negative whose error mode lies in the cheap direction. Compose with a fallback authoritative check on "maybe".
Size the filter for its false-positive cost budget — p chosen so that FP-caused fallback lookups are a small fraction of total request traffic.
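The sizing step can be sketched with the standard textbook Bloom-filter formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. These formulas and the example numbers below are not taken from the source; `size_bloom_filter`, `fallback_share`, the one-million-path count, and the 30% negative-lookup share are all hypothetical.

```python
import math
from typing import Tuple


def size_bloom_filter(n_items: int, target_fp_rate: float) -> Tuple[int, int]:
    """Return (num_bits, num_hashes) for n items at false-positive rate p."""
    if n_items <= 0 or not 0.0 < target_fp_rate < 1.0:
        raise ValueError("need n_items > 0 and 0 < target_fp_rate < 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to whole bits.
    num_bits = math.ceil(-n_items * math.log(target_fp_rate) / math.log(2) ** 2)
    # k = (m / n) * ln 2, rounded to the nearest whole hash function.
    num_hashes = max(1, round(num_bits / n_items * math.log(2)))
    return num_bits, num_hashes


def fallback_share(target_fp_rate: float, negative_share: float) -> float:
    """Share of all requests that hit the fallback only because of a false positive.

    Only lookups for absent items can be false positives, so the share is
    p times the share of negative lookups.
    """
    return target_fp_rate * negative_share


# Hypothetical workload: one million paths, p = 1%, 30% of requests are
# for paths that do not exist.
num_bits, num_hashes = size_bloom_filter(1_000_000, 0.01)
extra = fallback_share(0.01, 0.30)
print(f"{num_bits / 8 / 1024 / 1024:.2f} MiB, {num_hashes} hashes, "
      f"{extra:.2%} of requests pay an extra fallback")
```

Under these assumptions, p = 1% costs roughly 9.6 bits per item and about 7 hash functions. Halving p adds bits per item, so p is best chosen from the fallback budget rather than pushed as low as possible.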

Examples in the corpus

System	Cheap direction	Expensive direction	Substrate
Vercel routing	false positive (extra storage fetch → 404)	false negative (wrongly 404 a valid page)	concepts/bloom-filter
Chrome malicious-URL filter	false positive (extra server check)	false negative (visit a known-bad URL)	concepts/bloom-filter
Column-store pruning	false positive (scan one more block)	false negative (miss matching rows)	concepts/bloom-filter + zone maps
ML feature store (Zalando)	false positive (flip 0 → 1, absorbed as input noise)	false negative (n/a — Bloom filters have none)	concepts/sketching-feature-store
Fraud detection	false positive (review a legit tx)	false negative (approve a fraud tx)	ML classifier, tuned precision/recall
Spam detection	false positive (quarantine a real email)	false negative (deliver spam)	depends on cost model
Medical screening	false positive (extra test, stress)	false negative (missed diagnosis)	depends; often symmetric
Content moderation	false positive (block legit content)	false negative (miss harmful content)	depends; often symmetric

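The first four rows have strongly asymmetric costs and admit simple probabilistic solutions. Fraud detection is also asymmetric, but its false positives carry a real review cost, so it relies on a classifier tuned for precision and recall. The bottom three have contested cost models, where false positives aren't cheap, and so can't be reduced to a single-shape decision.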

Distinct from false-positive management

concepts/false-positive-management names the operational discipline of keeping false positives tolerable when they're the error mode you accepted: allowlisting, triage workflows, measuring FP rate. This concept names the design-time choice of which error mode to accept in the first place.

The two compose: you pick the substrate whose error mode matches your cost asymmetry; then you run false-positive management on the error mode you accepted.

Anti-patterns

Treating the two errors as symmetric when they aren't. This leads to over-engineered exact structures (the Vercel JSON tree) or over-eager probabilistic filters (Chrome's early phishing-URL filter was too coarse, caused legitimate-site FPs, and required allowlist expansion).
Choosing a probabilistic structure whose error mode is the wrong direction. A counting Bloom filter permits both false positives and, once deletion is in play, false negatives (for example, when a counter is decremented for an element that was never inserted). That makes it inappropriate for 404 filters.
Not sizing p to the fallback cost budget. A 1% false-positive rate might be fine for a disk-cache filter (40 extra disk seeks per 4,000 queries) but disastrous for a fraud-detection filter (400 false-fraud review flags per 40,000 transactions).
Conflating error mode with error magnitude. A structure with higher precision but symmetric error isn't strictly better than one with lower precision but asymmetric error aligned with cost.
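The last anti-pattern is easiest to see with a worked expected-cost comparison. The sketch below is illustrative only: `expected_cost` is a hypothetical helper, and the error rates, the 1:1000 cost ratio, and the 50% share of present items are made-up numbers chosen to make the effect visible.

```python
def expected_cost(
    fp_rate: float,
    fn_rate: float,
    fp_cost: float,
    fn_cost: float,
    positive_share: float,
) -> float:
    """Expected cost per query for a binary oracle.

    False positives can only occur on absent items and false negatives
    only on present items, so each rate is weighted by its own share.
    """
    negative_share = 1.0 - positive_share
    return (
        negative_share * fp_rate * fp_cost
        + positive_share * fn_rate * fn_cost
    )


# Hypothetical cost model: a false negative is 1000x worse than a false positive.
FP_COST, FN_COST, POSITIVE_SHARE = 1.0, 1000.0, 0.5

# Oracle A: lower overall error (0.5% each way), but symmetric.
symmetric = expected_cost(0.005, 0.005, FP_COST, FN_COST, POSITIVE_SHARE)

# Oracle B: higher overall error (2% false positives), but no false negatives.
one_sided = expected_cost(0.02, 0.0, FP_COST, FN_COST, POSITIVE_SHARE)

print(f"symmetric: {symmetric:.4f} per query, one-sided: {one_sided:.4f} per query")
```

With these assumed numbers the one-sided oracle makes four times as many mistakes on absent items, yet its expected cost is far lower, because none of its errors land in the expensive direction.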

Seen in

sources/2026-04-21-vercel-how-we-made-global-routing-faster-with-bloom-filters — Canonical wiki introduction. Vercel's routing-service Bloom-filter substitution states the asymmetry verbatim: false negative = "return 404 for valid pages" (unbounded damage), false positive = "extra storage request to find the file doesn't exist" (bounded, ~one storage roundtrip).
— Asymmetry in ML feature stores. Zalando canonicalises the noise-as-cost framing: a Bloom-filter feature store's false positives flip binary historical features 0 → 1, which the downstream classifier absorbs as "noise in the input" rather than misclassification. False negatives are structurally impossible. The cost of a false positive is therefore a tunable degradation in model AUC rather than a per-request latency or correctness cost: the benchmark shows AUC ≈ 0.7997 at 470 MB vs 0.80 uncompressed, a cost of about 0.0003 AUC (0.03 percentage points) for a 30× space saving. This generalises the asymmetry framework beyond request-routing: noise-tolerant consumers (ML models, aggregators, trend analytics) admit probabilistic substrates with false-positive-only error modes, with the noise rate directly tunable against storage / compute budget.
concepts/bloom-filter — The canonical data structure whose error mode is false-positive-only; the Vercel case is the canonical design application.
concepts/false-positive-management — Operational companion: manage the FP rate of whichever error mode the substrate commits to.
patterns/two-stage-evaluation — Composes with this asymmetry: stage 1 is the probabilistic oracle whose cheap-direction error is tolerated; stage 2 is the authoritative fallback invoked only on "maybe".