
PATTERN Cited by 1 source

Data-driven allowlist via monitoring mode

Shipping a default-deny enforcement system (allowlist-based access control, binary authorization, egress firewall, API-allowlist gateway) without first observing the actual distribution of traffic guarantees the policy will over-block on day one and freeze productivity while the team scrambles to build rules.

The pattern: deploy the enforcer in a passive observing mode first, log every event it would have blocked, analyse the distribution, build the allowlist from real data, then switch the enforcer to blocking mode.
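The core of the pattern is that the allow/deny decision logic is identical in both modes; only the action on a miss differs. A minimal sketch (the `Event` fields and `Enforcer` class are illustrative, not any real tool's API):

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"   # log what would be blocked, but allow everything
    ENFORCE = "enforce"   # actually block anything not on the allowlist


@dataclass
class Event:
    binary_hash: str
    signing_id: str


class Enforcer:
    def __init__(self, allowlist: set[str], mode: Mode = Mode.MONITOR):
        self.allowlist = allowlist
        self.mode = mode
        self.would_block_log: list[Event] = []

    def check(self, event: Event) -> bool:
        """Return True if execution is allowed."""
        if event.signing_id in self.allowlist:
            return True
        # Miss: record it either way. In monitor mode we still allow,
        # so the log is exactly "what enforcement would have blocked".
        self.would_block_log.append(event)
        return self.mode is Mode.MONITOR
```

Flipping `mode` from `MONITOR` to `ENFORCE` changes nothing about the decision path, which is what makes the monitoring-mode data an accurate preview of enforcement behaviour.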

Shape

  1. Deploy in monitoring mode — enforcer runs on every target host but emits only events + telemetry. Nothing blocked. Users notice nothing.
  2. Collect long enough to cover rare patterns — at minimum one full cycle of whatever the slowest-moving workload is (end-of-month billing cron, quarterly compliance tool run, once-a-year tax software). Longer is safer.
  3. Classify observed events into categories — e.g., "signed by known vendor" / "signed by unknown vendor" / "unsigned from package manager" / "unsigned locally built". Each category gets a different rule-generation strategy; don't lump them.
  4. Build the initial allowlist from the high-frequency head of the distribution first — rules that cover the most-executed events return the most coverage per rule. Follow with long-tail rules for less frequent items.
  5. Iterate while still in monitoring mode — after adding rules, check monitoring output again: events that would still be blocked are the remaining work. Residual categories need different mitigations (self-service approval, group-scoped rules, Compiler rules, Package rules, etc.).
  6. Switch to enforcement in percentage cohorts, not fleet-wide — residual blocks will surface in the first cohort and can be fixed before the next cohort sees them.
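Step 4's head-first rule generation can be sketched as a greedy coverage pass over the observed event log (a simplification that assumes one rule per signing identity; field names are illustrative):

```python
from collections import Counter


def build_allowlist(observed_identities: list[str],
                    coverage_target: float = 0.95) -> list[str]:
    """Greedily pick the signing identities covering the most executions,
    stopping once the target fraction of observed events is covered.
    Remaining identities are the long tail / residual work."""
    counts = Counter(observed_identities)
    total = sum(counts.values())
    allowlist, covered = [], 0
    for identity, n in counts.most_common():
        if covered / total >= coverage_target:
            break
        allowlist.append(identity)
        covered += n
    return allowlist
```

On a typical skewed fleet distribution, a handful of head rules reaches the coverage target and everything after the cutoff becomes the iteration backlog of step 5.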

Why this works

  • Real distribution is always weirder than imagined. Actual fleet execution reveals apps the security team didn't know about, signing identities they didn't expect, and temporal patterns (the monthly cron) that synthetic enumeration misses.
  • Data-driven severity ranking — rule-creation effort gets pointed at the events that actually happen, not the events the team worried about in a threat-modeling session.
  • Enforcement-mode surprise stays small because the high-impact allow rules were built from real data before anyone got blocked.
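The percentage cohorts from step 6 are commonly implemented with stable hashing, so a host's cohort never changes and raising the rollout percentage only ever adds machines (a generic sketch, not any specific tool's rollout mechanism):

```python
import hashlib


def cohort_percent(host_id: str) -> int:
    """Map a host to a stable bucket in [0, 100) by hashing its ID,
    so the same machine always lands in the same rollout cohort."""
    digest = hashlib.sha256(host_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(host_id: str, rollout_pct: int) -> bool:
    """True if this host is inside the current enforcement cohort."""
    return cohort_percent(host_id) < rollout_pct
```

Because the bucket is a pure function of the host ID, expanding the rollout from 5% to 25% to 100% is monotonic: any host enforced in the first cohort stays enforced in every later one, and fixes found in early cohorts land before later cohorts flip.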

Residual-category typology (from the binary-auth canonical instance)

When the monitoring-mode would-block volume stops shrinking, the remaining events usually split into:

  • Signed by known vendor, no rule yet → proactive review; decide global-allow / global-block / personal-rule-per-user.
  • Unsigned locally-built → Compiler rules (auto-allow binaries produced by specified compilers).
  • Unsigned from package managers → automated Package Rule system (patterns/package-rule-auto-generation) keeps SHA-256 rules current against upstream upgrades.
  • Specialist workflows producing per-machine unique hashes (e.g., Anaconda + codesign --sign -) → group-scoped permissive rules, not fleet-wide ones. Deferred to the last cohort.
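The typology above amounts to a small decision function over event attributes. A sketch, with hypothetical field names rather than any real event schema:

```python
def classify_residual(event: dict) -> str:
    """Bucket a still-would-be-blocked event into one of the residual
    categories, each of which gets its own mitigation path."""
    if event.get("signing_id"):
        return "signed-no-rule"        # known vendor → proactive review
    if event.get("from_package_manager"):
        return "unsigned-package"      # → automated Package Rule system
    if event.get("locally_compiled"):
        return "unsigned-local-build"  # → Compiler rules
    return "specialist-workflow"       # → group-scoped rules, last cohort
```

Running every residual event through a classifier like this turns the undifferentiated "UNKNOWN" pile into per-category work queues with distinct owners and mitigations.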

Seen in

  • sources/2026-04-21-figma-rolling-out-santa-without-freezing-productivity — Figma deployed systems/santa in monitoring mode across the full fleet before switching any machine to lockdown. Monitoring data drove the initial SigningID + TeamID-dominated allowlist, covering the majority of binary executions. Residual-UNKNOWN analysis produced the three-category typology above with per-category mitigations. Proactive review of developer-signed apps on >3 devices classified them into global allow / global block / self-service approve. Canonical wiki instance.

Sibling patterns

  • patterns/shadow-application-readiness — Figma's Databases team runs the DBProxy logical planner against live production queries (captured to Snowflake) to pick the sharded-query subset covering 90% of queries without worst-case complexity. Exactly the same data-driven-scope-selection pattern in a different domain.
  • patterns/side-by-side-runtime-validation — runs a new runtime in parallel with the old one, flags divergences for investigation, flips when the divergence rate is zero. Has the same "observe first, switch later" skeleton.
  • patterns/shadow-migration — dual-write during migration with the new path in shadow mode until the divergence rate is acceptably low.