CONCEPT Cited by 1 source

Data classification tagging

What it is

Data classification tagging is attaching machine-readable sensitivity metadata to each unit of data — typically at the column / field level — so that automated policy enforcement, detection, and warehouse governance can reason about data category at runtime or query time. The tags encode what kind of data this is (PII, billing, security credential, health, location, free-form user content, etc.) and sometimes what uses are permitted.

Why field-level (not resource-level)

Resource-level classification — "this table holds PII" — forces everything in the table to the highest sensitivity tier. Real schemas mix public fields (display name, avatar URL) with sensitive ones (email, phone, billing token) in the same row. Field-level tagging lets enforcement make per-field decisions: allow the public fields in an API response, flag the sensitive ones.
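
A minimal sketch of the per-field decision described above, assuming a hypothetical tag map for a users table. The field names, tag values, and `split_by_sensitivity` helper are illustrative and do not come from the source.

```python
from typing import Any

# Hypothetical per-field tags for one table; names are illustrative only.
USER_FIELD_TAGS: dict[str, str] = {
    "display_name": "public",
    "avatar_url": "public",
    "email": "pii",
    "phone": "pii",
    "billing_token": "billing",
}


def split_by_sensitivity(
    row: dict[str, Any], tags: dict[str, str]
) -> tuple[dict[str, Any], list[str]]:
    """Return (fields safe to send, names of fields that were held back)."""
    allowed: dict[str, Any] = {}
    flagged: list[str] = []
    for field_name, value in row.items():
        # Only fields explicitly tagged "public" pass; untagged fields are held back.
        if tags.get(field_name) == "public":
            allowed[field_name] = value
        else:
            flagged.append(field_name)
    return allowed, flagged


row = {
    "display_name": "Ada",
    "avatar_url": "https://example.com/a.png",
    "email": "ada@example.com",
    "billing_token": "tok_123",
}
allowed, flagged = split_by_sensitivity(row, USER_FIELD_TAGS)
```

With a resource-level tag the whole row would be held back; here only `email` and `billing_token` are flagged while the public fields pass through.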

The central-schema pattern

Canonical production instance (FigTag, Source: sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure):

  1. Annotate every column with a category at schema-definition time.
  2. Store annotations in a central schema (not scattered across service-local configs) so a single source of truth exists.
  3. Propagate to the data warehouse so analytics queries can filter / aggregate / audit by sensitivity.
  4. Expose at runtime so application middleware (e.g., an ORM callback) can hook sensitive-column loads for detection or enforcement.
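
The four steps above can be sketched as a small in-memory registry. This is an illustration under stated assumptions, not FigTag's actual design: `CentralSchema`, `export_for_warehouse`, and `make_load_hook` are hypothetical names, and a real system would persist annotations and wire the hook into its ORM.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class CentralSchema:
    """Hypothetical single source of truth mapping (table, column) to a category."""

    _tags: dict[tuple[str, str], str] = field(default_factory=dict)

    def annotate(self, table: str, column: str, category: str) -> None:
        # Steps 1 and 2: record the annotation centrally at schema-definition time.
        self._tags[(table, column)] = category

    def category_of(self, table: str, column: str) -> Optional[str]:
        return self._tags.get((table, column))

    def export_for_warehouse(self) -> list[dict[str, str]]:
        # Step 3: flatten annotations into rows a warehouse metadata table could ingest.
        return [
            {"table": t, "column": c, "category": cat}
            for (t, c), cat in sorted(self._tags.items())
        ]


def make_load_hook(
    schema: CentralSchema,
    sensitive: set[str],
    on_sensitive_load: Callable[[str, str], None],
) -> Callable[[str, dict[str, Any]], None]:
    """Step 4: build a callback that middleware could run after loading a row."""

    def hook(table: str, row: dict[str, Any]) -> None:
        for column in row:
            if schema.category_of(table, column) in sensitive:
                on_sensitive_load(table, column)

    return hook


schema = CentralSchema()
schema.annotate("users", "display_name", "public")
schema.annotate("users", "email", "pii")

seen: list[tuple[str, str]] = []
hook = make_load_hook(schema, {"pii"}, lambda t, c: seen.append((t, c)))
hook("users", {"display_name": "Ada", "email": "ada@example.com"})
```

After the call, `seen` records that `users.email` was loaded, which is the signal a detection or enforcement layer would act on.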

Enforcement leverage

Once every column is tagged, many enforcement/detection systems compose on top of the same metadata:

  • Response-body sampling for sensitive values (Figma Response Sampling Phase 2) — match banned_from_clients-tagged values against outbound JSON.
  • Warehouse access controls — restrict queries over PII tables to approved roles based on tag.
  • Log redaction — scrub any field with a pii tag before shipping to centralized logging.
  • Compliance audits — enumerate all regulated-tagged columns for SOC 2 / GDPR / HIPAA evidence.
  • Test-suite guards: assert in unit tests that no response object contains a banned_from_clients field.

The common thread: the enforcement code changes, the tags stay stable — so the team doesn't need to maintain parallel lists of sensitive columns in every enforcement system.
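
As a rough illustration of two consumers sharing one tag map, the sketch below implements log redaction and a test-suite guard over the same metadata. `COLUMN_TAGS` and both helpers are hypothetical, and the guard matches on field names, which is simpler than the value matching that response sampling performs.

```python
from typing import Any

# Hypothetical shared tag map; tag names mirror the ones mentioned above.
COLUMN_TAGS: dict[str, set[str]] = {
    "display_name": set(),
    "email": {"pii"},
    "session_secret": {"banned_from_clients"},
}


def redact_for_logging(record: dict[str, Any]) -> dict[str, Any]:
    """Replace the value of every pii-tagged field before the record is logged."""
    return {
        key: "[REDACTED]" if "pii" in COLUMN_TAGS.get(key, set()) else value
        for key, value in record.items()
    }


def banned_fields_in(payload: Any) -> list[str]:
    """Recursively collect keys tagged banned_from_clients in a JSON-like payload."""
    found: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if "banned_from_clients" in COLUMN_TAGS.get(key, set()):
                found.append(key)
            found.extend(banned_fields_in(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(banned_fields_in(item))
    return found


log_line = redact_for_logging({"display_name": "Ada", "email": "ada@example.com"})
leaks = banned_fields_in({"user": {"display_name": "Ada", "session_secret": "s3cr3t"}})
```

A unit test could then assert `banned_fields_in(response) == []` for every serialized response, while the logger calls `redact_for_logging` on each record; neither keeps its own list of sensitive columns.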

Categories vs tiers

Two complementary axes seen in production:

  • Category: what kind of data (PII / billing / security / health / internal / public). Used for routing to the right enforcement rule.
  • Tier: how sensitive (public / internal / restricted / confidential). Used for blanket-level decisions (logging, retention, export).

A column can carry both: {category: pii.email, tier: restricted}.
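
One way to model the two axes is sketched below. The `Tag` and `Tier` types and both helpers are hypothetical, and the numeric tier ordering is an assumption that simply follows the order listed above; organizations rank these labels differently.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Assumed ordering, taken from the order the tiers are listed in above.
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class Tag:
    category: str  # dotted name, e.g. "pii.email"
    tier: Tier


def rule_family(tag: Tag) -> str:
    """Category routes to an enforcement rule: use the prefix before the first dot."""
    return tag.category.split(".", 1)[0]


def may_log(tag: Tag, max_tier: Tier = Tier.INTERNAL) -> bool:
    """Tier drives blanket decisions: log only values at or below max_tier."""
    return tag.tier <= max_tier


email = Tag(category="pii.email", tier=Tier.RESTRICTED)
display_name = Tag(category="public.display_name", tier=Tier.PUBLIC)
```

Here `rule_family(email)` selects the PII rule set, while `may_log(email)` is false purely because of the tier, independent of what kind of data it is.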

Coverage + drift challenges

Not disclosed in the Figma post, but common to this pattern:

  • New columns need a tag — usually enforced by a migration-time lint or a default-deny (untagged = restricted).
  • Tag reviews when intended usage changes — a field once used internally that now ships to clients needs re-tagging.
  • Drift between schema and warehouse — propagation pipeline needs a reconciliation job.
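
The first and third challenges above lend themselves to simple checks. The sketch below assumes tags are available as plain column-to-tag mappings; `untagged_columns`, `effective_tag`, and `reconcile` are hypothetical helper names, not part of any tool named in the source.

```python
def untagged_columns(schema_columns: set[str], tags: dict[str, str]) -> list[str]:
    """Migration-time lint: list columns that exist in the schema but carry no tag."""
    return sorted(schema_columns - set(tags))


def effective_tag(column: str, tags: dict[str, str]) -> str:
    """Default-deny: an untagged column is treated as restricted."""
    return tags.get(column, "restricted")


def reconcile(central: dict[str, str], warehouse: dict[str, str]) -> dict[str, list[str]]:
    """Compare central tags with the copy propagated to the warehouse."""
    shared = central.keys() & warehouse.keys()
    return {
        "missing_in_warehouse": sorted(central.keys() - warehouse.keys()),
        "stale_in_warehouse": sorted(warehouse.keys() - central.keys()),
        "mismatched": sorted(c for c in shared if central[c] != warehouse[c]),
    }


central = {"users.email": "pii", "users.display_name": "public", "users.phone": "pii"}
warehouse = {"users.email": "pii", "users.display_name": "internal", "users.legacy": "pii"}
drift = reconcile(central, warehouse)
```

A lint step could fail the migration when `untagged_columns` is non-empty, and a scheduled job could alert when any list in `drift` is non-empty.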

Last updated · 200 distilled / 1,178 read