PATTERN Cited by 2 sources

Automated discovery + developer reporting

Shape

Build an inventory of a domain-specific concern (cryptographic primitives, ML-model usage, PII access, etc.) by combining two complementary mechanisms with disjoint failure modes:

┌────────────────────────┐         ┌───────────────────────────┐
│ 1. Automated discovery │         │ 2. Developer reporting    │
│                        │         │                           │
│ - Runtime telemetry    │  ┌────▶ │ - Self-report form        │
│ - Monitoring inside    │  │      │ - Architectural           │
│   primary libraries    │  │      │   review assertions       │
│ - Call-site attribution│  │      │ - Questionnaires          │
│                        │  │      │                           │
│ Captures: active usage │  │      │ Captures: edge cases,     │
│  in primary libraries  │  │      │  shadow deps, intent      │
└────────────────────────┘  │      │  for new architectures    │
                            │      └───────────────────────────┘
                    combined inventory
                 migration prioritisation

Neither mechanism alone achieves full coverage — they fail in disjoint ways, and running both together closes each other's gaps.

Meta names this pattern explicitly for cryptographic inventory:

This can be built applying two complementary strategies.

  • Automated Discovery: We leverage monitoring tools, such as our Crypto Visibility service, to autonomously map cryptographic primitives used in production. This provides high-fidelity data on active usage within our primary libraries.
  • Reporting: Because monitoring cannot capture every edge case or shadow dependency, we supplement automation with developer reporting. This process captures cryptographic intent for new architectures and uncovers legacy usage in systems outside standard monitoring paths.

(Source: sources/2026-04-16-meta-post-quantum-cryptography-migration-at-meta-framework-lesson)

Why two mechanisms, not one

Automated discovery alone

Strengths:

  • Runtime truth — captures dynamic dispatch, config-driven choices, binary-linked libraries.
  • No-sampling fidelity — every operation counted (see Meta's concepts/cryptographic-monitoring with zero sampling).
  • Quantitative — operation counts drive prioritisation.
  • Scalable — one library-side instrumentation covers the whole fleet.

Failure modes:

  • Third-party / shadow libraries — not linked against the primary instrumented library.
  • Dormant code paths — not yet called, therefore not yet seen.
  • Offline / future usages — CI-time signing, embedded keys, new architectures being designed.
  • No intent — knows that something uses RSA, not why.

Developer reporting alone

Strengths:

  • Edge-case coverage — anything outside the primary libraries.
  • Intent capture — teams explain what and why.
  • Forward-looking — captures upcoming designs before runtime exists.
  • Shadow dependency visibility — third-party, acquired, legacy systems.

Failure modes:

  • Labour cost — teams have to report; compliance varies.
  • Accuracy drift — reports become stale.
  • Incomplete by default — unless enforced with process discipline.
  • No quantitative volume data — can't drive prioritisation-by-usage.

Together

Dimension          Automation   Reporting   Combined
Third-party libs       ✗            ✓           ✓
Dynamic runtime        ✓            ✗           ✓
Future usages          ✗            ✓           ✓
Call-volume data       ✓            ✗           ✓
Intent capture         ✗            ✓           ✓
Labour cost           Low         High       Medium

The combined coverage is what makes the inventory migration-ready.

Implementation playbook

Automation side

  1. Instrument primary libraries. Canonical example: FBCrypto's aggregating buffered logger (the 2024-12-02 Meta ingest).
  2. No sampling. Every operation counted; aggregation by (key, algorithm, method, ...) keeps storage tractable.
  3. Two-tier storage. Warm (Scuba) for interactive analysis; cold (Hive) for trend analysis.
  4. Fleet-level dashboards. Which algorithms? Which call-sites? Which volumes? Slice by service / team / library version.
  5. Freshness metrics. When did we last see usage of X? Dormant → recently-revived is a signal.
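The aggregation and freshness ideas in steps 1–2 and 5 can be sketched in Python. This is an illustrative sketch, not FBCrypto's actual API: the class name, aggregation key fields, and flush behaviour are all assumptions; a production logger would ship flushed batches to warm/cold storage rather than return them.

```python
import time
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageKey:
    """Aggregation key: one counter per (algorithm, method, call-site) tuple."""
    algorithm: str   # e.g. "RSA-2048", "AES-256-GCM"
    method: str      # e.g. "sign", "encrypt"
    call_site: str   # attributed caller, e.g. "payments/signer.py:42"


class AggregatingUsageLogger:
    """Counts every operation (no sampling); aggregating by key keeps storage tractable."""

    def __init__(self, flush_interval_s: float = 60.0):
        self._counts: Counter = Counter()
        self._last_seen: dict = {}           # UsageKey -> wall-clock timestamp
        self._flush_interval_s = flush_interval_s
        self._last_flush = time.monotonic()

    def record(self, key: UsageKey) -> None:
        self._counts[key] += 1               # every call counted, no sampling
        self._last_seen[key] = time.time()   # powers "when did we last see X?"
        if time.monotonic() - self._last_flush >= self._flush_interval_s:
            self.flush()

    def flush(self) -> list:
        """Return aggregated (key, count) pairs, highest volume first, and reset."""
        batch = sorted(self._counts.items(), key=lambda kv: -kv[1])
        self._counts.clear()
        self._last_flush = time.monotonic()
        return batch

    def last_seen(self, key: UsageKey):
        """Freshness metric: None means never observed (dormant or unmonitored)."""
        return self._last_seen.get(key)
```

Keeping only aggregated counters in memory is the design choice that makes "no sampling" affordable: storage grows with the number of distinct keys, not with call volume.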

Reporting side

  1. Define the reporting surface. What must each team report? When?
  2. Embed in existing workflows. Pull into architecture-review templates, design-doc checklists, new-service intake forms. Not a separate bespoke form.
  3. Structured schema. Free-text reports don't aggregate. Require structured fields (algorithm, primitive class, use case, lifespan expectation, external-dependency flag).
  4. Expiration + re-attestation. Reports older than N months require re-attestation.
  5. Cross-check with automation. Teams whose reports claim low volume of X while automation shows high volume trigger a review.
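Steps 3–4 combined might look like the following sketch of a structured report record with built-in expiration. The field names and the six-month re-attestation window are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CryptoUsageReport:
    """One structured self-report; keyed fields aggregate, free text does not."""
    team: str
    algorithm: str             # e.g. "RSA-2048"
    primitive_class: str       # e.g. "asymmetric-signature"
    use_case: str              # short description of what it protects
    lifespan_years: int        # how long the protected data must stay secure
    external_dependency: bool  # third-party / shadow library involved?
    attested_at: datetime      # when the team last confirmed this report

    def needs_reattestation(self, now: datetime, max_age_months: int = 6) -> bool:
        # Reports older than N months go stale; approximate a month as 30 days.
        return now - self.attested_at > timedelta(days=30 * max_age_months)
```

Because every field except `use_case` is structured, reports can be aggregated and diffed against the automation dataset instead of being read one at a time.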

Cross-mechanism reconciliation

The two datasets don't agree perfectly. Reconciliation surfaces important signals:

  • Automation sees what reports don't → inventory blind spots on the reporting side; update the reporting taxonomy.
  • Reports describe what automation doesn't see → shadow dependencies or new architectures; update monitoring coverage.
  • Volumes don't match → reporting is stale, or automation missed something (e.g. a library path not yet instrumented).

Running reconciliation as a regular process (monthly / quarterly) is how the inventory stays true.
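The three reconciliation signals above can be sketched as one cross-check. The function name, the count-keyed inputs, and the 10× volume-mismatch threshold are illustrative assumptions for the sketch, not a specified process.

```python
def reconcile(automation: dict, reports: dict, volume_ratio: float = 10.0) -> dict:
    """Flag disagreements between the two inventories.

    automation: algorithm -> operation count observed by monitoring
    reports:    algorithm -> approximate volume claimed by teams (0 allowed)
    """
    findings = {
        "reporting_blind_spot": [],   # update the reporting taxonomy
        "monitoring_blind_spot": [],  # shadow dependency or new architecture
        "volume_mismatch": [],        # stale report, or uninstrumented path
    }
    for alg, count in automation.items():
        if alg not in reports:
            findings["reporting_blind_spot"].append(alg)
        elif reports[alg] and count / reports[alg] > volume_ratio:
            findings["volume_mismatch"].append(alg)
    for alg in reports:
        if alg not in automation:
            findings["monitoring_blind_spot"].append(alg)
    return findings
```

Run on a monthly or quarterly cadence, each finding bucket maps directly to one of the three follow-up actions listed above.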

Generalisation beyond cryptography

The pattern applies to any domain-specific inventory where a runtime-instrumentable primary surface plus long-tail heterogeneous usage exist:

  • PII access inventory (automation: sampling + data-flow tracking; reporting: team annotation of sensitive columns).
  • ML-model usage inventory (automation: model-serving telemetry; reporting: research-paper uses, offline experiments).
  • Third-party API usage (automation: egress telemetry; reporting: vendor-procurement records).
  • Secret access (automation: KMS audit logs; reporting: justified-secret declarations).

Seen in
