
CONCEPT Cited by 2 sources

Cryptographic monitoring

Cryptographic monitoring is the practice of logging every cryptographic operation performed across a fleet — the key name, the algorithm, the method (encrypt/decrypt/sign/verify/key-derive), the library version, and enough identifying context to attribute the event to a call-site — and persisting the data long enough to answer inventory, overuse, and migration-planning questions.

The canonical framing comes from Meta's 2024-12-02 cryptographic monitoring post, which describes the telemetry architecture of FBCrypto — Meta's unified managed cryptographic library — at fleet scale.

Why it matters

  1. Algorithm deprecation is a moving target. "Cryptography strength decays over time" — weakened primitives must be migrated off. The only tractable way to scope and drive such a migration at hyperscale is a complete inventory of call-sites.
  2. Symmetric keys have a finite data budget. A single key can only protect so many bytes before security margins degrade. concepts/key-overuse-detection depends on accurate per-key cumulative-operation counts.
  3. Post-quantum readiness needs an asymmetric-primitive inventory. Meta ties its PQC migration prioritisation explicitly to the monitoring dataset: "the available data improves our decision-making process while prioritizing quantum-vulnerable use cases."
  4. Emergency migrations become feasible. When a primitive is broken, the fleet inventory identifies every affected call-site instead of a best-effort grep of source code (which misses dynamic dispatch, binary-linked dependencies, and third-party components).
  5. Client-health proxy during library rollouts. Because the dataset is unsampled, anomalous drops in per-host call volume or success rate during a rollout are a clean signal of regressions — detectors can be built directly on the monitoring dataset.
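Point 2 above has a concrete downstream consumer: a per-key budget check over the cumulative counts the monitoring dataset provides. A minimal sketch, where the budget table, threshold, and key names are illustrative assumptions rather than Meta's actual policy:

```python
# Illustrative key-overuse check: compare cumulative per-key operation
# counts (the kind of data the monitoring dataset produces) against a
# configured data budget. Budget values are placeholders, not Meta's.
OP_BUDGET = {
    "aes-256-gcm": 2**32,  # placeholder invocation bound per key
}

def keys_over_budget(cumulative_counts, threshold=0.9):
    """Return (key name, budget fraction) for keys whose cumulative
    operation count exceeds the given fraction of their budget."""
    flagged = []
    for (key_name, algorithm), count in cumulative_counts.items():
        budget = OP_BUDGET.get(algorithm)
        if budget is not None and count >= threshold * budget:
            flagged.append((key_name, count / budget))
    return flagged

counts = {
    ("session-ticket-key", "aes-256-gcm"): 4_100_000_000,
    ("cookie-key", "aes-256-gcm"): 10_000,
}
print(keys_over_budget(counts))  # flags session-ticket-key only
```

The check is only as good as the counts feeding it, which is why the no-sampling requirement matters: a sampled count would systematically understate budget consumption.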

Fundamental cost problem

Cryptographic operations are very high-frequency — Meta discloses that "roughly 0.05 % of CPU cycles at Meta are spent on X25519 key exchange" alone and that hosts "often compute millions of cryptographic operations per day." A naive per-event log through a generic logging framework would consume "an unreasonable amount of write throughput and storage capacity."

Two cost-reduction strategies exist:

  • Sampling (log 1 in X events). Cheap but loses full-population visibility — "most logs being omitted, giving us a less clear picture of the library's usage." Rejected by Meta.
  • Buffering and flushing with aggregation. Accumulate event-counts in-process, flush periodically. Keeps full-population counts at the cost of some delay + aggregation-key cardinality management. Meta's chosen approach.

The choice hinges on whether the aggregation key (event tuple: key name, method, algorithm, …) has low enough cardinality relative to raw event rate to compress meaningfully. For cryptographic monitoring the answer is yes — same key + same method + same algorithm gets called millions of times per host per day, so the per-flush row count is dramatically smaller than the per-operation count.
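The compression argument can be made concrete: because flush rows are keyed on the event tuple, per-flush row count scales with tuple cardinality rather than raw event volume. A self-contained sketch, with illustrative tuple values:

```python
from collections import Counter

# Three distinct (key name, method, algorithm) tuples stand in for a
# host's low-cardinality aggregation keys; the names are illustrative.
tuples = [
    ("ticket-key", "encrypt", "aes-256-gcm"),
    ("ticket-key", "decrypt", "aes-256-gcm"),
    ("signing-key", "sign", "ed25519"),
]
raw_event_count = 1_000_000

# Buffer-and-flush: increment a counter per tuple instead of emitting
# one log row per operation.
buffer = Counter(tuples[i % len(tuples)] for i in range(raw_event_count))

rows = [{"key": k, "method": m, "algorithm": a, "count": c}
        for (k, m, a), c in buffer.items()]
print(len(rows), "rows instead of", raw_event_count, "events")
assert sum(r["count"] for r in rows) == raw_event_count  # full-population counts survive
```

The final assertion is the point of the design: unlike sampling, aggregation loses no events, only their individual timestamps.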

Architecture — Meta's canonical shape (2024-12-02)

  1. Library-resident buffered logger. Each crypto library (FBCrypto) instance on a host maintains a concurrent hash map keyed on the event tuple, with a count per entry. Increments on every cryptographic operation.
  2. Background flush thread. Periodically serialises the map's contents (as log events with the count field) to Meta's canonical logging framework Scribe and resets the map.
  3. Per-host first-flush jitter (patterns/jittered-flush-for-write-smoothing) smooths cohort-synchronised spikes.
  4. Derived-key aggregation counts KDF-derived child-key operations against the parent keyset to bound row cardinality for features that mint millions of keys.
  5. Two-tier persistence. Scuba for warm interactive analysis; Hive for cold long-retention trend analysis.
  6. Shutdown-flush — synchronous final flush on job exit, predicated on folly::Singleton's lifecycle semantics.
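Steps 1-3 and 6 above can be sketched together as a buffered logger with a jittered background flush. This is an illustrative Python reduction, not FBCrypto's C++ implementation; the sink, intervals, and jitter bounds are assumptions:

```python
import random
import threading
from collections import Counter

class BufferedCryptoLogger:
    """Sketch of a library-resident buffered logger: a lock-protected
    counter map keyed on the event tuple, drained by a background flush
    thread. Sink, intervals, and jitter bounds are illustrative."""

    def __init__(self, sink, flush_interval=1.0, max_jitter=0.5):
        self._counts = Counter()          # (key, method, algorithm) -> count
        self._lock = threading.Lock()
        self._sink = sink                 # callable receiving flushed rows
        self._interval = flush_interval
        # Jitter only the first flush so a cohort of hosts started
        # together does not hit the backend at the same instant.
        self._first_delay = flush_interval + random.uniform(0, max_jitter)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, key_name, method, algorithm):
        with self._lock:
            self._counts[(key_name, method, algorithm)] += 1

    def _flush(self):
        with self._lock:                  # swap the map out, then serialise
            snapshot, self._counts = self._counts, Counter()
        if snapshot:
            self._sink(list(snapshot.items()))

    def _run(self):
        self._stop.wait(self._first_delay)
        while not self._stop.is_set():
            self._flush()
            self._stop.wait(self._interval)

    def shutdown(self):
        # Synchronous final flush on exit, standing in for the
        # folly::Singleton-anchored shutdown flush described above.
        self._stop.set()
        self._thread.join()
        self._flush()

rows = []
logger = BufferedCryptoLogger(rows.append, flush_interval=0.05)
for _ in range(10_000):
    logger.record("ticket-key", "encrypt", "aes-256-gcm")
logger.shutdown()
total = sum(count for batch in rows for _, count in batch)
print(total)  # prints 10000: no events lost across flushes
```

Swapping the whole map out under the lock keeps the critical section on the hot `record` path short; serialisation happens outside the lock.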

Boundaries

  • Distinct from general observability: observability is about understanding arbitrary system behaviour from logs/metrics/traces; cryptographic monitoring is a narrowly scoped inventory + overuse telemetry with a fixed event-tuple schema and a no-sampling fidelity requirement.
  • Distinct from classification / PII telemetry (e.g. Figma's Response Sampling, Meta's PAI / Policy Zones): those telemetries are about data-flow labelling; cryptographic monitoring is about primitive-usage counting.
  • Depends on unified-library leverage to be feasible at hyperscale — if there were N different cryptographic libraries in use across the fleet, instrumenting them all would be prohibitive. Monoculture (FBCrypto) reduces the cost to a single instrumentation.
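Step 4 of the architecture list above (derived-key aggregation) can be sketched as a fold from child key to parent keyset before counting; the parent mapping and key names here are hypothetical:

```python
from collections import Counter

def aggregation_key(key_name, method, algorithm, parent_of):
    # Fold a KDF-derived child key back onto its parent keyset so that
    # a feature minting many derived keys contributes one aggregation
    # row, not one per child. `parent_of` is an illustrative
    # child-to-parent mapping, not FBCrypto's actual mechanism.
    return (parent_of.get(key_name, key_name), method, algorithm)

# 100,000 per-message derived keys all roll up to one parent-keyset row.
parent_of = {f"msg-key-{i}": "msg-root-keyset" for i in range(100_000)}
counts = Counter()
for i in range(100_000):
    counts[aggregation_key(f"msg-key-{i}", "encrypt", "aes-256-gcm",
                           parent_of)] += 1
print(dict(counts))  # {('msg-root-keyset', 'encrypt', 'aes-256-gcm'): 100000}
```

Without the fold, tuple cardinality would grow with the number of derived keys and the compression argument in the cost section would collapse.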

Seen in

  • sources/2024-12-02-meta-built-large-scale-cryptographic-monitoring — canonical wiki disclosure of cryptographic monitoring as a first-class engineering practice at hyperscale, with the full architectural shape (buffer-and-flush + jittered-flush + derived-key aggregation + two-tier storage) and the four named operational use cases (weak-algorithm migration, emergency migration, key-overuse detection, rollout-health proxy).
  • sources/2026-04-16-meta-post-quantum-cryptography-migration-at-meta-framework-lesson — automated-discovery framing. Meta re-anchors its 2024 cryptographic-monitoring system as the automated-discovery leg of a two-mechanism crypto-inventory approach (the complement being developer reporting for shadow dependencies and new architectures). "We leverage monitoring tools, such as our Crypto Visibility service, to autonomously map cryptographic primitives used in production. This provides high-fidelity data on active usage within our primary libraries. [...] Because monitoring cannot capture every edge case or shadow dependency, we supplement automation with developer reporting." Canonical wiki statement that monitoring alone is necessary but not sufficient for a migration-grade inventory — the downstream consumer (PQC migration) needs both halves of the patterns/automated-discovery-plus-developer-reporting pattern.