CONCEPT Cited by 1 source

Composite fingerprint signal

Definition

A composite fingerprint signal is a detection construct that blocks or throttles a request only when two independent classes of evidence agree — typically an industry-standard fingerprint (TLS / header-shape / IP-pool / client-class) and a platform-specific business-logic predicate (auth state, request path, action sequence, tenant context). Neither input alone is sufficient; the conjunction is.

The design goal is to keep the fleet-wide false-positive rate small by requiring orthogonal evidence, while still catching abusive traffic cleanly inside the intersection.
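The conjunction can be sketched as two independent predicates joined by `and` — all names, the fingerprint value, and the business-logic condition below are hypothetical illustrations, not GitHub's actual rules:

```python
from dataclasses import dataclass

@dataclass
class Request:
    ja3: str            # TLS fingerprint hash (one industry-standard signal)
    authenticated: bool
    path: str

# Hypothetical suspicious-fingerprint set; matching it alone never blocks.
SUSPICIOUS_JA3 = {"e7d705a3286e19ea42f587b344ee6865"}

def fingerprint_match(req: Request) -> bool:
    return req.ja3 in SUSPICIOUS_JA3

def business_logic_match(req: Request) -> bool:
    # Hypothetical platform-specific predicate: unauthenticated traffic
    # against an expensive endpoint.
    return not req.authenticated and req.path.startswith("/search")

def should_block(req: Request) -> bool:
    # Block only on the conjunction: both evidence classes must agree.
    return fingerprint_match(req) and business_logic_match(req)
```

Either predicate firing alone leaves the request untouched; only the intersection is enforced.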

Why composite, not single-signal

A single fingerprint almost always over-matches at scale: any industry-standard fingerprint catches some population of legitimate clients that happen to share implementation details with abusive ones. Blocking on fingerprint alone yields unacceptable false-positive rates for any platform with a broad, heterogeneous legitimate-client population.

A single business-logic rule almost always under-matches: abuse traffic that learns the rule can shape around it; legitimate traffic that stumbles into the pattern is blocked without recourse.

The conjunction produces a distinct empirical property:

"Among requests that matched the suspicious fingerprints, only about 0.5–0.9 % were actually blocked; specifically, those that also triggered the business-logic rules. Requests that matched both criteria were blocked 100 % of the time." (Source: sources/2026-01-15-github-when-protections-outlive-their-purpose)

  • The fingerprint side is broad and cheap — applies to a large request population.
  • The business-logic side is narrow and contextual — applies only within a specific behavioural band.
  • The intersection is the block set.
  • The complement of the intersection inside either side alone is the "observed but not blocked" set — used for detection and retuning, not enforcement.
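The resulting three-way partition — block set, "observed but not blocked" set, and everything else — is just a truth table over the two predicates; a minimal sketch:

```python
def classify(fingerprint_hit: bool, logic_hit: bool) -> str:
    """Partition a request by which evidence classes fired."""
    if fingerprint_hit and logic_hit:
        return "block"      # the intersection: enforced
    if fingerprint_hit or logic_hit:
        return "observe"    # one side only: telemetry and retuning, not enforcement
    return "pass"
```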

What the numbers mean

With a composite block shape, three FP rates coexist and mean different things:

  1. FP rate inside the both-matched set — by construction, close to 0 % if the rule is well-tuned; but when the rule ages and the threat evolves, this number rises silently because the business-logic side now overlaps with legitimate traffic patterns.
  2. FP rate inside the fingerprint-matched set — equal to (business-logic match rate within that set) × (FP rate inside the intersection). GitHub disclosed 0.5–0.9 %. This number is not the aggregate FP rate — most traffic doesn't match the fingerprint at all.
  3. FP rate across total traffic — typically 1–3 orders of magnitude smaller than (2). GitHub's disclosed 0.003–0.004 % comes from (fingerprint match rate across total traffic) × (business-logic match rate within that) × (FP rate inside the intersection).
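Toy arithmetic showing how the three levels relate — every rate below is illustrative, not GitHub's real figure, chosen only so the products land in plausible ranges:

```python
# Illustrative rates only.
fingerprint_match_rate = 0.005    # 0.5 % of total traffic matches the fingerprint
logic_match_rate = 0.007          # 0.7 % of fingerprint-matched also trips business logic
fp_inside_intersection = 1.0      # worst case: every blocked request is legitimate

# (2) FP rate inside the fingerprint-matched set
fp_in_fingerprint_set = logic_match_rate * fp_inside_intersection

# (3) FP rate across total traffic: two orders of magnitude smaller than (2)
fp_total = fingerprint_match_rate * fp_in_fingerprint_set

print(f"inside fingerprint set: {fp_in_fingerprint_set:.2%}, total: {fp_total:.4%}")
```

Note that `fp_total` shrinks mainly because the fingerprint is rare, which is exactly the point made below: the small aggregate number measures rarity, not correctness.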

The small aggregate number is not evidence that the detector is working correctly — it is evidence that the fingerprint is rare in the overall population. A 100 % block rate inside the intersection can still be catastrophic at GitHub scale if even a few requests per 100 K hit legitimate users on bookmarked URLs.

Where it breaks

  • Threat-pattern drift. The fingerprint was chosen because it correlated with abusive traffic at one point in time. Abuse tooling evolves; the fingerprint's population shifts; legitimate tools update to match the same TLS library or header shape. The composite can stop catching the original threat long before the team notices.
  • Legitimate-client drift. New browsers, new clients, new corporate network appliances introduce new fingerprint populations that overlap with historical abuse fingerprints. The signal starts firing on users who were fine yesterday.
  • Business-logic ossification. The business-logic side encodes abuse behaviour at the time the rule was written. Legitimate usage patterns evolve (new features, new flows, new clients of the public API), and eventually some legitimate flow happens to exercise the same path or sequence.
  • Silent expiration. Unlike a synthetic health probe, no alert fires when a composite rule's precision drops — the system only knows requests matched the rule, not whether the match was right. Detection lands via external feedback (social media, support tickets) long after the rule started misfiring.

Mitigation discipline

  • FP-rate measurement per composite rule, not per fingerprint. Track precision over time, alert on drift.
  • Retuning hooks — the composite must be runtime-tunable so a drifted rule can be relaxed without a full deploy (cf. concepts/false-positive-management).
  • Expiration metadata — each composite rule carries a review date; unreviewed rules flag themselves. Pairs with patterns/expiring-incident-mitigation.
  • Shadow mode for new rules — a new composite starts in "observe but don't block" mode, ships to block only after the observed intersection matches expected abuse traffic.
  • Telemetry of the block decision itself — log which fingerprint + which business-logic rule fired, not just that the request was blocked, so post-hoc audits of stale rules can identify specific retirement candidates. Cross-layer tracing (cf. patterns/cross-layer-block-tracing) is the investigation prerequisite.
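One way several of these disciplines might be encoded as runtime rule metadata — all field names, the mode values, and the review date are hypothetical, not a disclosed GitHub schema:

```python
from dataclasses import dataclass, field
from datetime import date
import json

@dataclass
class CompositeRule:
    rule_id: str
    fingerprint_id: str   # which industry-standard fingerprint fired
    logic_id: str         # which business-logic predicate fired
    mode: str = "shadow"  # shadow mode: "shadow" = observe only, "block" = enforce
    # Expiration metadata: hypothetical review deadline for this rule.
    review_by: date = field(default_factory=lambda: date(2026, 7, 1))

    def is_stale(self, today: date) -> bool:
        # Unreviewed rules flag themselves once the review date passes.
        return today > self.review_by

    def decision_log(self, request_id: str, blocked: bool) -> str:
        # Telemetry of the decision itself: record WHICH fingerprint and
        # WHICH business-logic rule fired, not just that a block happened.
        return json.dumps({
            "request_id": request_id,
            "rule_id": self.rule_id,
            "fingerprint_id": self.fingerprint_id,
            "logic_id": self.logic_id,
            "mode": self.mode,
            "blocked": blocked,
        })
```

A rule born in shadow mode logs its would-be decisions without enforcing them; the same structured log later feeds post-hoc audits that identify retirement candidates.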

Seen in

  • sources/2026-01-15-github-when-protections-outlive-their-purpose — canonical wiki instance. GitHub's disclosed Service-layer protection rules combine industry-standard fingerprinting with GitHub-specific business-logic predicates; the article's charts decompose the resulting FP rates at all three levels (both-matched, fingerprint-matched, total-traffic). The post's remediation frame centres on lifecycle maintenance of composite rules, not the composite design itself — composite-signal design is the precondition that keeps FP rate small; discipline around retiring stale rules is what keeps it that way.