
PATTERN

Feature allowlist over blocklist

Problem

When a producer is trimming a payload down to what a consumer actually uses, it can express that decision in one of two ways:

  1. Allowlist: keep only fields in an approved list; drop everything else.
  2. Blocklist: drop only fields in a blocked list; keep everything else.

Both can be made to produce the same result, but their maintenance profiles diverge sharply when the underlying field universe is evolving. For ML feature sets — where features are added for experiments, deprecated between model versions, or renamed — the two approaches have very different ops cost.
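To make the symmetry and the divergence risk concrete, here is a minimal Python sketch; every field name in it is invented for illustration:

```python
# Two filters over the same payload. All field names are illustrative.
ALLOWLIST = {"user_id", "pin_embedding", "board_category"}
BLOCKLIST = {"debug_trace", "experimental_v2_score"}

def trim_allowlist(payload: dict) -> dict:
    # Keep only approved fields; drop everything else.
    return {k: v for k, v in payload.items() if k in ALLOWLIST}

def trim_blocklist(payload: dict) -> dict:
    # Drop only blocked fields; keep everything else.
    return {k: v for k, v in payload.items() if k not in BLOCKLIST}

payload = {"user_id": 42, "pin_embedding": [0.1, 0.7], "debug_trace": "..."}

# As long as the two lists are maintained in lockstep, the results agree.
assert trim_allowlist(payload) == trim_blocklist(payload)
```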

Pinterest's articulation

From the 2026-05-01 Feature Trimmer post (Source: sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer):

"This allowlist approach, compared to a blocklist where we keep features not in the list, does not carry the burden of tracking all the features that might be in development or deprecated. Given the evolving nature of ML models and volume of experiments at Pinterest, the blocklist is significantly larger for any given model and it is probable that it will grow faster than the allowlist in the future."

Three compounding observations:

  1. The allowlist is defined by what a model trained on. It's a finite, knowable set — bounded by the model's input signature.
  2. The blocklist has to track the entire universe of features minus the allowlist, including experimental features in development and features being deprecated but still present in the feature store.
  3. The ML feature universe grows monotonically as the platform adds features for new experiments; the blocklist grows faster than the allowlist.

Solution

Use the allowlist, sourced from the model's own declaration (concepts/model-signature-as-source-of-truth). For Pinterest this is the input_names array in module_info.json inside the .pt archive.
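As a sketch of how such an allowlist can be sourced: a .pt file is a zip archive, so the declared inputs can be read straight out of it. The post confirms module_info.json lives inside the archive and names the input_names array; the exact internal path is an assumption here, so the sketch searches for the file rather than hard-coding a location.

```python
import json
import zipfile

def load_allowlist(pt_path: str) -> set[str]:
    # A .pt archive is a zip; locate module_info.json wherever the
    # packaging step placed it (the internal path is an assumption).
    with zipfile.ZipFile(pt_path) as archive:
        member = next(
            name for name in archive.namelist()
            if name.endswith("module_info.json")
        )
        info = json.loads(archive.read(member))
    # input_names is the model's declared input signature (per the post).
    return set(info["input_names"])
```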

On the producer side: given a request for model X version Y, look up allowlist[X][Y] → keep only features in that set → drop the rest → serialise and ship.
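A minimal sketch of that flow, with a hypothetical registry keyed model → version → field set; the names and shapes are illustrative, not Pinterest's implementation:

```python
# model -> version -> allowed field names, populated from each model's
# declared inputs (e.g. via load_allowlist above).
ALLOWLISTS: dict[str, dict[str, set[str]]] = {}

def trim_for_request(model: str, version: str, features: dict) -> dict:
    allowed = ALLOWLISTS.get(model, {}).get(version)
    if allowed is None:
        # Skip-on-miss fallback: no allowlist registered for this model
        # version, so ship everything rather than drop legitimate fields.
        return features
    # Keep only what the model declared; drop the rest before serialising.
    return {k: v for k, v in features.items() if k in allowed}
```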

When it fits

  • The field universe evolves faster than the approved set. ML features at an experimentation-heavy platform; API response fields during GraphQL migrations; log fields during PII audits.
  • Correctness is defined by what the consumer needs, not by what the producer wants to expose. The consumer declares; the producer filters.
  • A contract artefact is available that enumerates the consumer's requirements. If you have a consumer-declared contract (Protobuf schema, GraphQL query, ML model signature), the allowlist writes itself.
  • Unknown / new fields should be excluded by default rather than included by default. The allowlist is fail-closed on new fields; the blocklist is fail-open (demonstrated in the sketch after this list).
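Continuing the earlier sketch, adding a field neither list has heard of shows the two defaults; the field name is again invented:

```python
# A brand-new field that neither list mentions yet.
payload["new_experiment_feature"] = 1.0

# Fail-closed: the allowlist drops the unknown field by default.
assert "new_experiment_feature" not in trim_allowlist(payload)

# Fail-open: the blocklist passes the unknown field through by default.
assert "new_experiment_feature" in trim_blocklist(payload)
```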

When it doesn't fit

  • The consumer has a well-known, stable rejection set, e.g. "strip PII fields": an allowlist would have to enumerate every kept field, while a blocklist enumerates only the small, stable rejection set.
  • The producer, not the consumer, is authoritative about what gets emitted: logging systems, telemetry producers shipping to heterogeneous consumers.
  • The field universe is closed and small: a dozen-field record where you're comfortable maintaining blocklist parity.

Failure modes

  • Allowlist staleness — if the allowlist isn't refreshed when the consumer's contract changes, legitimate new fields get dropped. Mitigated by refreshing the list as part of the consumer's deploy pipeline, plus a skip-on-miss fallback (as in the producer-side sketch above).
  • Allowlist bloat — if the allowlist gets padded with optional or speculative fields, it creeps toward the Send-Everything end. Mitigated by treating the allowlist as exactly the consumer's declared inputs, nothing more.
  • Blocklist-parity divergence — whenever both mechanisms coexist (e.g. during migration windows), drift between them causes silent misbehaviour. Mitigated by picking one mechanism per system; a parity check like the sketch below can guard the transition.
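Where both mechanisms must briefly coexist, a cheap parity check can catch drift before it ships; this sketch assumes you can enumerate the current field universe:

```python
def check_parity(universe: set[str], allowlist: set[str],
                 blocklist: set[str]) -> None:
    # The two mechanisms agree only when the blocklist is exactly the
    # universe minus the allowlist.
    expected = universe - allowlist
    drifted = blocklist ^ expected  # symmetric difference = drift
    if drifted:
        raise AssertionError(f"allowlist/blocklist drift on: {sorted(drifted)}")
```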

Sibling patterns on the wiki

The common thread: when the universe of possibilities is larger than what you know, allowlist is the safe default.
