PATTERN Cited by 1 source

Separate annotation from requirement

Pattern

In an information flow control (IFC) or other policy-based system where data carries labels, keep the annotation schema and the per-requirement flow rules as two separate, independently evolving layers:

  • A data annotation is a simple label on a data asset (e.g. BANANA_DATA) — orthogonal to any particular privacy or policy requirement.
  • A requirement (e.g. "banana data can only be used for smoothies and fruit baskets, not for banana bread") is expressed as a set of flow rules referring to that label.

Multiple requirements can independently reference the same annotation, and new requirements can be added without rewriting existing annotations.
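
The two layers can be pictured with a minimal Python sketch. It is an illustration only, not Meta's API: the asset names, the `Requirement` class, the purpose strings, and `is_allowed` are all hypothetical, and a requirement is simplified here to a set of allowed purposes for one label.

```python
from dataclasses import dataclass

# Annotation layer (hypothetical assets): each asset carries plain labels only.
ANNOTATIONS: dict[str, frozenset[str]] = {
    "fruit_orders_table": frozenset({"BANANA_DATA"}),
    "signup_events": frozenset({"USER_EMAIL"}),
}


@dataclass(frozen=True)
class Requirement:
    """Requirement layer: a rule that refers to a label, stored apart from it."""
    name: str
    label: str
    allowed_purposes: frozenset[str]

    def permits(self, labels: frozenset[str], purpose: str) -> bool:
        if self.label not in labels:
            return True  # this requirement does not apply to the asset
        return purpose in self.allowed_purposes


# Hypothetical rule set: banana data may feed smoothies and fruit baskets only.
REQUIREMENTS = [
    Requirement("banana-usage", "BANANA_DATA",
                frozenset({"smoothie", "fruit_basket"})),
]


def is_allowed(asset: str, purpose: str, requirements=REQUIREMENTS) -> bool:
    """A use is allowed only if every requirement permits it."""
    labels = ANNOTATIONS.get(asset, frozenset())
    return all(r.permits(labels, purpose) for r in requirements)
```

Under these assumptions, `is_allowed("fruit_orders_table", "banana_bread")` is `False`, and appending a second `Requirement` that also references `BANANA_DATA` tightens the outcome without touching `ANNOTATIONS`.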

Motivation — Meta's antipattern

Meta's 2024-08-31 PAI post lists this as its fourth "lesson learned" and describes the direct consequence of getting it wrong:

"Initially, we employed a monolithic annotation API to model intricate data flow rules and annotate relevant code and data. However, as data from multiple requirements were combined, propagating these annotations from sources to sinks became increasingly complex, resulting in data annotation conflicts that were difficult to resolve. To address this challenge, we implemented simplified data annotations to decouple data from requirements and separate data flow rules for different requirements. This significantly streamlined the annotation process, ultimately improving developer experiences."

The failure mode of the monolithic API was annotation conflicts: when data labelled under requirement A and data labelled under requirement B flowed into the same function, the combined annotation needed to carry both requirements' full schemas — and the schemas could disagree on legal flows, forcing ad-hoc conflict resolution. The fix collapses the label down to a simple identifier and keeps the requirement semantics in a separate rule repository.

Mechanics

  • Annotation layer: labels only, e.g. BANANA_DATA, USER_EMAIL. Labels propagate by composition: multiple labels can coexist on one datum.
  • Requirement layer: flow rules reference labels. "If a flow source has BANANA_DATA, the sink must have BANANA_DATA or the flow must be marked reclassified."
  • Multiple requirements on one label: requirement R1 and requirement R2 may both impose constraints on BANANA_DATA; each is evaluated independently by the runtime; the flow is permitted only if all applicable requirements pass.
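
The mechanics above can be sketched in Python as follows. This is a sketch under stated assumptions, not the source's implementation: `Flow`, `must_stay_labelled`, `forbid_sink`, the sink label `BANANA_BREAD_SINK`, and the contents of R1 and R2 are invented for illustration. Only the quoted BANANA_DATA rule and the "all requirements must pass" behaviour come from the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Flow:
    """One data flow: label sets on source and sink, plus a reclassified mark."""
    source_labels: frozenset[str]
    sink_labels: frozenset[str]
    reclassified: bool = False


Rule = Callable[[Flow], bool]


def must_stay_labelled(label: str) -> Rule:
    """If the source has `label`, the sink must too, unless reclassified."""
    def rule(flow: Flow) -> bool:
        if label not in flow.source_labels:
            return True  # rule does not apply to this flow
        return label in flow.sink_labels or flow.reclassified
    return rule


def forbid_sink(label: str, forbidden_sink_label: str) -> Rule:
    """Hypothetical second rule: `label` may never reach a forbidden sink."""
    def rule(flow: Flow) -> bool:
        return not (label in flow.source_labels
                    and forbidden_sink_label in flow.sink_labels)
    return rule


# Two independent requirements that reference the same annotation.
REQUIREMENTS: dict[str, Rule] = {
    "R1": must_stay_labelled("BANANA_DATA"),
    "R2": forbid_sink("BANANA_DATA", "BANANA_BREAD_SINK"),
}


def evaluate(flow: Flow, requirements=REQUIREMENTS) -> dict[str, bool]:
    """Evaluate each requirement on its own; no rule sees another's result."""
    return {name: rule(flow) for name, rule in requirements.items()}


def permitted(flow: Flow, requirements=REQUIREMENTS) -> bool:
    """The flow is permitted only if all requirements pass."""
    return all(evaluate(flow, requirements).values())
```

Returning per-requirement results from `evaluate` keeps a failure attributable to a single requirement, which is the property the monolithic annotation lacked.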

Analogue in other domains

  • Kubernetes labels vs selectors — labels on pods are requirement-agnostic; selectors in various resource kinds express the per-requirement rule.
  • Data classification tagging vs policy engines — see concepts/data-classification-tagging (Figma); tags are label-only, policy is expressed separately.

Trade-offs

  • Simpler annotation schema → less ambiguity on write, fewer conflicts to resolve.
  • Rule composition cost at runtime: every flow now goes through N requirement evaluations instead of one. Meta's runtime is apparently cheap enough; see lattice evaluation performance engineering.
  • Requirement discoverability: with rules in a separate repository, developers have to look in two places to understand which constraints apply to a label, so PZM-style UX that surfaces both together is necessary.
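
One way to ease the discoverability cost is a reverse index from label to the requirements that mention it. The Python sketch below assumes a hypothetical rule repository in which each requirement declares the labels it references; the requirement names and the repository shape are invented for illustration and do not describe PZM.

```python
from collections import defaultdict

# Hypothetical rule repository: requirement name -> labels its rules mention.
RULE_REPOSITORY = {
    "R1_banana_usage": {"BANANA_DATA"},
    "R2_retention": {"BANANA_DATA", "USER_EMAIL"},
    "R3_marketing": {"USER_EMAIL"},
}


def index_by_label(repository: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert the repository so a label maps to the requirements using it."""
    index: dict[str, list[str]] = defaultdict(list)
    for requirement, labels in repository.items():
        for label in labels:
            index[label].append(requirement)
    return {label: sorted(names) for label, names in index.items()}


def requirements_for(labels: set[str], index: dict[str, list[str]]) -> list[str]:
    """All requirements that constrain a datum carrying these labels."""
    return sorted({name for label in labels for name in index.get(label, [])})
```

A developer holding a datum labelled BANANA_DATA could then look up the applicable requirements in one call instead of searching the rule repository by hand.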

Seen in
