CONCEPT Cited by 1 source

Key-overuse detection

Key-overuse detection is the operational practice of observing a cryptographic key's cumulative usage (bytes protected, operations performed, nonces issued) against its data-volume budget — the limit beyond which security margins degrade or nonce-reuse probability becomes non-negligible — and rotating the key proactively before the budget is exhausted.

Canonical wiki framing from Meta's 2024-12-02 cryptographic monitoring post: "Since there is a limit to how much data a symmetric cryptographic key can protect, logging allows us to detect key overuse and rotate keys proactively."

Why keys have finite budgets

Symmetric cryptographic primitives have formally analysed security bounds expressed in terms of total data protected or total operations performed under a single key:

  • Nonce-based AEAD modes (AES-GCM, AES-GCM-SIV) have nonce-space size limits; probability of nonce collision grows with operation count.
  • Block-cipher modes have birthday-bound distinguishers that degrade after ~2^(n/2) blocks (for block size n).
  • Key-derivation chains accumulate rekeying risk as the same master key derives more sub-keys.

Concrete example from the AES-GCM-SIV literature: the nonce-misuse-resistant design explicitly tolerates more nonce reuse than AES-GCM, but still has finite data-budget bounds.
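To make the nonce-collision growth concrete, here is a minimal sketch using the standard birthday approximation. It assumes nonces are drawn uniformly at random (96 bits by default); the function names are hypothetical and come from no particular library. Under that assumption, 2^32 operations keep the collision probability near 2^-33, which is consistent with the 2^32-invocation limit NIST SP 800-38D sets for randomly generated GCM IVs.

```python
import math


def nonce_collision_probability(operations: int, nonce_bits: int = 96) -> float:
    """Birthday approximation: p ~= 1 - exp(-q(q-1) / 2^(bits+1)).

    Assumes every nonce is sampled uniformly at random.
    """
    exponent = -operations * (operations - 1) / 2 ** (nonce_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


def max_operations(target_probability: float, nonce_bits: int = 96) -> int:
    """Largest q with q^2 / 2^(bits+1) <= target (small-probability regime)."""
    return math.isqrt(int(target_probability * 2 ** (nonce_bits + 1)))


if __name__ == "__main__":
    print(nonce_collision_probability(2**32))  # a little above 1e-10
    print(max_operations(2**-32))              # between 2^32 and 2^33
```

The probability grows roughly with the square of the operation count, so doubling a key's usage quadruples its collision risk. That is why the budget is tracked cumulatively rather than per time window.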

The practical implication: every symmetric key needs a rotation plan, and the rotation can be scheduled either on a timer (conservative, usage-independent) or on a usage trigger (adaptive, requires telemetry). The usage-trigger variant needs cumulative counts.
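The two rotation strategies above can be combined in a single check. The sketch below is illustrative only: `RotationPolicy` and `rotation_reason` are hypothetical names, and the limits shown in the usage lines are placeholder values rather than recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RotationPolicy:
    max_age: timedelta      # timer-based limit, independent of usage
    max_operations: int     # usage-based limit, needs cumulative telemetry


def rotation_reason(age: timedelta, operations: int,
                    policy: RotationPolicy) -> Optional[str]:
    """Return why a key should rotate, or None if it is within both limits."""
    if operations >= policy.max_operations:
        return "usage"
    if age >= policy.max_age:
        return "timer"
    return None


if __name__ == "__main__":
    policy = RotationPolicy(max_age=timedelta(days=90), max_operations=2**31)
    print(rotation_reason(timedelta(days=10), 2**31, policy))   # usage
    print(rotation_reason(timedelta(days=120), 1_000, policy))  # timer
```

Checking usage first means a hot key is flagged for the reason that matters most to its security margin, even when its timer has also expired.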

Operational shape

  1. Telemetry — monitor every operation at each call site, via an aggregating buffered logger inside the crypto library so the per-operation overhead stays negligible.
  2. Aggregation keys include the key name — so cumulative counts can be attributed to the specific key. (Derived keys aggregate against the parent keyset name via concepts/derived-key-aggregation — pessimistic for overuse detection, which is the safe direction for alarms.)
  3. Thresholds — set an operations-per-key threshold strictly below the formal data-volume bound, with operational margin.
  4. Rotation trigger — when cumulative count for a key crosses threshold, schedule rotation: mint a successor key, migrate callers, retire the old key.
  5. Long retention window — rotation is often time-horizon-sensitive, so cumulative counts need to be queryable over months. Meta's two-tier Scuba (warm) + Hive (cold) storage is the shape that satisfies this.
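The steps above can be sketched end to end in a few lines. This is a toy, single-process illustration and not Meta's implementation: `AggregatingLogger` and `OveruseMonitor` are hypothetical names, the sink stands in for whatever long-retention store holds the counts, and the bound of 100 operations is deliberately tiny so the example stays readable.

```python
from collections import Counter
from typing import Callable, List, Mapping


class AggregatingLogger:
    """Hypothetical in-process buffer: counts operations per key name and
    hands the aggregated batch to a sink every `flush_every` calls."""

    def __init__(self, flush_every: int,
                 sink: Callable[[Mapping[str, int]], None]):
        self._flush_every = flush_every
        self._sink = sink
        self._buffer: Counter = Counter()
        self._pending = 0

    def record(self, key_name: str, operations: int = 1) -> None:
        # Per-call work is one counter increment; no I/O on the hot path.
        self._buffer[key_name] += operations
        self._pending += 1
        if self._pending >= self._flush_every:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._sink(dict(self._buffer))
            self._buffer.clear()
        self._pending = 0


class OveruseMonitor:
    """Keeps cumulative per-key totals and flags keys past the threshold."""

    def __init__(self, formal_bound: int, margin: float = 0.5):
        # Threshold sits strictly below the formal bound (assumed margin).
        self.threshold = int(formal_bound * margin)
        self.totals: Counter = Counter()

    def ingest(self, batch: Mapping[str, int]) -> None:
        self.totals.update(batch)

    def keys_to_rotate(self) -> List[str]:
        return sorted(k for k, n in self.totals.items() if n >= self.threshold)


monitor = OveruseMonitor(formal_bound=100, margin=0.5)
logger = AggregatingLogger(flush_every=10, sink=monitor.ingest)
for _ in range(60):
    logger.record("hot_keyset")
for _ in range(5):
    logger.record("cold_keyset")
logger.flush()
print(monitor.keys_to_rotate())  # ['hot_keyset']
```

Here the key name passed to `record` would be the parent keyset name for derived keys, so the totals overcount any single derived key. That is the pessimistic direction, and therefore the safe one for an overuse alarm.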

Why it's a hyperscale problem

At small scale, key rotation is typically timer-based (e.g. rotate every 90 days) with comfortable margin relative to usage bounds. At hyperscale:

  • The number of keys in use is too large to rotate all conservatively on a short timer (rotation itself has cost + coordination overhead).
  • The usage distribution across keys is long-tailed; a uniform timer-based rotation wastes effort on cold keys and leaves hot keys close to their bounds.
  • Usage-trigger-based rotation is dramatically more efficient but requires the unsampled monitoring dataset to see actual per-key cumulative counts.
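A rough illustration of the long-tail point: the sketch below computes how much of a per-key budget is consumed by the time a uniform timer fires. The daily rates, the 2^32 threshold, and the 90-day timer are all made-up values for illustration, and `budget_used_at_timer` is a hypothetical helper.

```python
from typing import Dict


def budget_used_at_timer(daily_ops: Dict[str, int], threshold: int,
                         timer_days: int) -> Dict[str, float]:
    """Fraction of the per-key threshold consumed when a uniform timer fires."""
    return {key: rate * timer_days / threshold
            for key, rate in daily_ops.items()}


# Hypothetical daily operation rates; real values would come from telemetry.
usage = {
    "hot_keyset": 100_000_000,
    "warm_keyset": 1_000_000,
    "cold_keyset": 1_000,
}
fractions = budget_used_at_timer(usage, threshold=2**32, timer_days=90)
for key, fraction in fractions.items():
    print(f"{key}: {fraction:.6f} of budget used after 90 days")
```

With these numbers the hot key has consumed about twice its budget before the timer fires, while the cold key has used a few hundred-thousandths of its budget. A single timer is therefore both too slow and too eager, depending on the key.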

This is one of the two canonical justifications Meta gives for full-fidelity (no-sampling) cryptographic monitoring — the other being migration-scoping for deprecated + PQC-vulnerable primitives.

Seen in
