CONCEPT Cited by 3 sources

Data classification tagging

What it is

Data classification tagging is attaching machine-readable sensitivity metadata to each unit of data — typically at the column / field level — so that automated policy enforcement, detection, and warehouse governance can reason about data category at runtime or query time. The tags encode what kind of data this is (PII, billing, security credential, health, location, free-form user content, etc.) and sometimes what uses are permitted.

Why field-level (not resource-level)

Resource-level classification — "this table holds PII" — forces everything in the table to the highest sensitivity tier. Real schemas mix public fields (display name, avatar URL) with sensitive ones (email, phone, billing token) in the same row. Field-level tagging lets enforcement make per-field decisions: allow the public fields in an API response, flag the sensitive ones.
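
A per-field decision of this kind can be sketched in a few lines of Python. This is an illustration only: the tag map, the field names, and the split_by_sensitivity helper are hypothetical, and treating untagged fields as sensitive is an assumption of the sketch.

```python
# Hypothetical per-field tags for one table; names are illustrative only.
USER_FIELD_TAGS = {
    "display_name": "public",
    "avatar_url": "public",
    "email": "pii",
    "phone": "pii",
    "billing_token": "billing",
}


def split_by_sensitivity(row, tags, allowed=frozenset({"public"})):
    """Return (safe, flagged): fields allowed in a response, and names of fields held back."""
    safe, flagged = {}, []
    for field_name, value in row.items():
        # Assumption: a field with no tag is treated as sensitive.
        if tags.get(field_name) in allowed:
            safe[field_name] = value
        else:
            flagged.append(field_name)
    return safe, sorted(flagged)


row = {
    "display_name": "Ada",
    "avatar_url": "https://example.com/a.png",
    "email": "ada@example.com",
    "billing_token": "tok_123",
}
safe, flagged = split_by_sensitivity(row, USER_FIELD_TAGS)
```

With a resource-level tag the whole row would have to be withheld; here the two public fields pass through and only email and billing_token are flagged.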

The central-schema pattern

Canonical production instance (FigTag, Source: sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure):

  1. Annotate every column with a category at schema-definition time.
  2. Store annotations in a central schema (not scattered across service-local configs) so a single source of truth exists.
  3. Propagate to the data warehouse so analytics queries can filter / aggregate / audit by sensitivity.
  4. Expose at runtime so application middleware (e.g., an ORM callback) can hook sensitive-column loads for detection or enforcement.
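
The four steps above can be sketched as a small in-memory registry. This is not FigTag's implementation, which the source does not show; CentralSchema, its method names, and the hook signature are all hypothetical, and a real system would back this with schema files and an actual ORM callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

ColumnKey = Tuple[str, str]  # (table, column)


@dataclass
class CentralSchema:
    """Hypothetical single source of truth for column categories."""

    categories: Dict[ColumnKey, str] = field(default_factory=dict)
    load_hooks: List[Callable[[str, str, str, object], None]] = field(default_factory=list)

    def annotate(self, table: str, column: str, category: str) -> None:
        # Steps 1-2: record the category centrally at schema-definition time.
        self.categories[(table, column)] = category

    def warehouse_rows(self) -> List[dict]:
        # Step 3: flat rows that a sync job could load into a warehouse metadata table.
        return [
            {"table": t, "column": c, "category": cat}
            for (t, c), cat in sorted(self.categories.items())
        ]

    def on_sensitive_load(self, hook: Callable[[str, str, str, object], None]) -> None:
        self.load_hooks.append(hook)

    def record_load(self, table: str, row: dict,
                    sensitive=frozenset({"banned_from_clients"})) -> None:
        # Step 4: meant to be called from an ORM-style callback after a row is loaded.
        for column, value in row.items():
            category = self.categories.get((table, column))
            if category in sensitive:
                for hook in self.load_hooks:
                    hook(table, column, category, value)


schema = CentralSchema()
schema.annotate("users", "display_name", "public")
schema.annotate("users", "email", "banned_from_clients")

seen = []
schema.on_sensitive_load(lambda t, c, cat, v: seen.append((t, c, cat)))
schema.record_load("users", {"display_name": "Ada", "email": "ada@example.com"})
```

After the last call, seen holds a single entry for users.email, and warehouse_rows() yields one row per annotated column.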

Enforcement leverage

Once every column is tagged, many enforcement/detection systems compose on top of the same metadata:

  • Response-body sampling for sensitive values (Figma Response Sampling Phase 2): match banned_from_clients-tagged values against outbound JSON.
  • Warehouse access controls: restrict queries over PII-tagged columns to approved roles.
  • Log redaction: scrub any field with a pii tag before shipping to centralized logging.
  • Compliance audits: enumerate all regulated-tagged columns for SOC 2 / GDPR / HIPAA evidence.
  • Test-suite guards: assert in unit tests that no response object contains a banned_from_clients field.

The common thread: the enforcement code changes, the tags stay stable — so the team doesn't need to maintain parallel lists of sensitive columns in every enforcement system.
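
As a rough illustration of two consumers sharing one tag map, the sketch below implements a log redactor and a test-suite guard on top of the same metadata. The COLUMN_TAGS map and helper names are hypothetical, and allowing several tags per column is an assumption of the sketch. The guard matches on field names, as in the test-suite bullet above; it does not attempt the value matching that response-body sampling performs.

```python
# Hypothetical shared tag map; both consumers below read it and nothing else.
COLUMN_TAGS = {
    "display_name": {"public"},
    "email": {"pii", "banned_from_clients"},
    "session_token": {"security", "banned_from_clients"},
}


def fields_with_tag(tag):
    """Names of all fields carrying the given tag."""
    return {name for name, tags in COLUMN_TAGS.items() if tag in tags}


def redact_for_logs(record):
    """Consumer 1: replace pii-tagged values before a record is logged."""
    pii = fields_with_tag("pii")
    return {k: ("[REDACTED]" if k in pii else v) for k, v in record.items()}


def banned_fields_in_response(body):
    """Consumer 2: list banned_from_clients field names found anywhere in a JSON-like body."""
    banned = fields_with_tag("banned_from_clients")
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in banned:
                    found.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(body)
    return sorted(found)


log_line = redact_for_logs({"display_name": "Ada", "email": "ada@example.com"})
leaks = banned_fields_in_response(
    {"user": {"display_name": "Ada", "session_token": "s"}, "items": [{"email": "e"}]}
)
```

Adding a new sensitive column means editing COLUMN_TAGS once; neither consumer needs its own list.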

Categories vs tiers

Two complementary axes seen in production:

  • Category: what kind of data (PII / billing / security / health / internal / public). Used for routing to the right enforcement rule.
  • Tier: how sensitive (public / internal / restricted / confidential). Used for blanket-level decisions (logging, retention, export).

A column can carry both: {category: pii.email, tier: restricted}.
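
One way to model the two axes, as a sketch: category routes to a rule, tier answers a blanket question such as "may this be logged?". The rule names, the set of loggable tiers, and the "deny" fallback for unknown categories are assumptions made for illustration, not taken from any of the cited systems.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"
    CONFIDENTIAL = "confidential"


@dataclass(frozen=True)
class Tag:
    category: str  # dotted name, e.g. "pii.email"
    tier: Tier


# Hypothetical routing table keyed by top-level category.
RULES_BY_CATEGORY = {"public": "allow", "pii": "mask", "billing": "tokenize", "security": "drop"}

# Hypothetical blanket policy: only these tiers may appear in logs.
LOGGABLE_TIERS = {Tier.PUBLIC, Tier.INTERNAL}


def enforcement_rule(tag: Tag) -> str:
    """Category axis: pick the enforcement rule from the top-level category."""
    top_level = tag.category.split(".")[0]
    return RULES_BY_CATEGORY.get(top_level, "deny")  # unknown category: deny (assumption)


def may_log(tag: Tag) -> bool:
    """Tier axis: a blanket decision that ignores the category."""
    return tag.tier in LOGGABLE_TIERS


email_tag = Tag(category="pii.email", tier=Tier.RESTRICTED)
name_tag = Tag(category="public.display_name", tier=Tier.PUBLIC)
```

Here email_tag routes to the "mask" rule and is not loggable, while name_tag routes to "allow" and is.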

Coverage + drift challenges

Not disclosed in the Figma post but universal for this pattern:

  • New columns need a tag — usually enforced by a migration-time lint or a default-deny (untagged = restricted).
  • Tag reviews when intended usage changes — a field once used internally that now ships to clients needs re-tagging.
  • Drift between schema and warehouse — propagation pipeline needs a reconciliation job.
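
The three safeguards above can be sketched as small functions. All names are hypothetical, columns are keyed as "table.column" strings for brevity, and the reconciliation only checks columns known to the central schema.

```python
DEFAULT_TAG = "restricted"  # default-deny: untagged columns are treated as restricted


def lint_migration(new_columns, tags):
    """Return newly added columns that have no tag; a CI check could fail when non-empty."""
    return sorted(col for col in new_columns if col not in tags)


def effective_tag(column, tags):
    """Fall back to the default-deny tag when a column was never annotated."""
    return tags.get(column, DEFAULT_TAG)


def reconcile(schema_tags, warehouse_tags):
    """Map each drifted column to (expected, actual); actual is None if the warehouse lacks it."""
    drift = {}
    for column, expected in schema_tags.items():
        actual = warehouse_tags.get(column)
        if actual != expected:
            drift[column] = (expected, actual)
    return drift


schema_tags = {"users.email": "pii", "users.phone": "pii"}
untagged = lint_migration(["users.email", "users.nickname"], schema_tags)
fallback = effective_tag("users.nickname", schema_tags)
drift = reconcile(schema_tags, {"users.email": "pii", "users.phone": "public"})
```

A re-tagging review has no mechanical equivalent here; it remains a human decision that the lint and reconciliation steps can only surface.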

Seen in

  • sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags: Unity Catalog Governed Tags + Data Classification GA. This is the third altitude of the data-classification-tagging primitive on the wiki. Figma FigTag operates at the application-schema altitude (annotate columns at schema definition, propagate to enforcement), Meta Policy Zones operates at the runtime-IFC-annotation altitude, and Unity Catalog Governed Tags operates at the data-warehouse-catalog altitude. The primitive is the same (machine-readable sensitivity metadata on columns), but here the catalog itself is the substrate of record, parent-to-child inheritance handles bulk tagging, and ABAC policies are the consumer (vs. Figma FigTag → response-body sampling, Meta Policy Zones → IFC enforcement). Three architectural properties named in the GA: (1) account-level vocabulary (a single shared tag list across all workspaces in an account); (2) inheritance from catalogs to schemas to tables to columns (tagging a catalog tags every descendant); (3) permission separation between tag vocabulary management (MANAGE/CREATE for account admins), tag application (APPLY for stewards), and data ownership (OWNER for producers); see concepts/separation-of-duties-data-governance. Auto-tagging by agentic LLM + pattern classifiers writes to the same vocabulary humans use, so downstream ABAC enforcement makes no distinction between machine-applied and human-applied tags. See concepts/governed-tag for the broader framing distinguishing governed tags from ad-hoc tags.

  • sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure — FigTag annotates every column with a sensitivity category; banned_from_clients is the category that drives Figma Response Sampling's Phase 2 detection.

  • sources/2024-08-31-meta-enforces-purpose-limitation-via-privacy-aware-infrastructure — Meta's Policy Zones uses data annotations (e.g. BANANA_DATA) as the IFC-axis sibling of field-level classification tagging: the labels are the same shape (metadata on a column / cell / variable), but Meta uses them to drive runtime information flow control rather than detection / warehouse-governance. Initial identification of what-to-tag uses Meta's ML-based classifier — the automation sibling of Figma's central-schema tagging discipline. The two framings together bracket the wiki's understanding of sensitivity tagging: Figma/FigTag = classification tagging for detection + governance; Meta/PAI = data annotation for runtime IFC enforcement. Same primitive, different consumers; both canonical instances on the wiki.
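
The catalog-to-column inheritance described in the Unity Catalog entry above can be illustrated with a toy resolver. This is not the Databricks API: effective_tags and the dotted-path convention are hypothetical, and the sketch simply unions the tags of every ancestor with the object's own tags, without modelling any override rules.

```python
def effective_tags(path, direct_tags):
    """Union of tags applied directly to the object and to each of its ancestors.

    path is a dotted name such as "main.sales.orders.email"
    (catalog.schema.table.column); direct_tags maps dotted names to tag sets.
    """
    parts = path.split(".")
    tags = set()
    # Walk from the catalog down to the object itself, collecting tags at each level.
    for depth in range(1, len(parts) + 1):
        tags |= direct_tags.get(".".join(parts[:depth]), set())
    return tags


direct = {
    "main": {"internal"},              # tag applied once at the catalog
    "main.sales.orders.email": {"pii"},  # tag applied to a single column
}
email_tags = effective_tags("main.sales.orders.email", direct)
id_tags = effective_tags("main.sales.orders.id", direct)
```

Tagging the catalog once is enough for every descendant column to resolve to "internal"; the email column additionally carries its own "pii" tag.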

Last updated · 542 distilled / 1,571 read