CONCEPT Cited by 1 source
Data classification tagging¶
What it is¶
Data classification tagging is attaching machine-readable sensitivity metadata to each unit of data — typically at the column / field level — so that automated policy enforcement, detection, and warehouse governance can reason about data category at runtime or query time. The tags encode what kind of data this is (PII, billing, security credential, health, location, free-form user content, etc.) and sometimes what uses are permitted.
Why field-level (not resource-level)¶
Resource-level classification — "this table holds PII" — forces everything in the table to the highest sensitivity tier. Real schemas mix public fields (display name, avatar URL) with sensitive ones (email, phone, billing token) in the same row. Field-level tagging lets enforcement make per-field decisions: allow the public fields in an API response, flag the sensitive ones.
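A minimal sketch of that per-field decision, assuming a hypothetical `FIELD_TAGS` mapping and the illustrative tag names `public` and `sensitive` (neither comes from the source):

```python
# Hypothetical field-level tags for one table; names are illustrative only.
FIELD_TAGS = {
    "display_name": "public",
    "avatar_url": "public",
    "email": "sensitive",
    "phone": "sensitive",
    "billing_token": "sensitive",
}


def split_by_sensitivity(row: dict) -> tuple[dict, list[str]]:
    """Return (fields safe to send, names of fields that were flagged).

    Fields without a tag are treated as sensitive, so a missing tag
    never exposes a value by default.
    """
    allowed, flagged = {}, []
    for name, value in row.items():
        if FIELD_TAGS.get(name) == "public":
            allowed[name] = value
        else:
            flagged.append(name)
    return allowed, flagged


row = {
    "display_name": "Ada",
    "avatar_url": "https://example.com/a.png",
    "email": "ada@example.com",
}
allowed, flagged = split_by_sensitivity(row)
```

With a resource-level tag the whole row would have to be withheld; here only `email` is flagged and the two public fields pass through.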
The central-schema pattern¶
Canonical production instance (FigTag, Source: sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure):
- Annotate every column with a category at schema-definition time.
- Store annotations in a central schema (not scattered across service-local configs) so a single source of truth exists.
- Propagate to the data warehouse so analytics queries can filter / aggregate / audit by sensitivity.
- Expose at runtime so application middleware (e.g., an ORM callback) can hook sensitive-column loads for detection or enforcement.
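The four steps above can be sketched as one small registry. This is an illustration under stated assumptions, not FigTag's actual API: the class `CentralSchema`, its method names, and the `public` category are hypothetical, and `column_loaded` stands in for whatever callback a real ORM would provide.

```python
from collections import defaultdict
from typing import Callable

Hook = Callable[[str, str, object], None]


class CentralSchema:
    """Hypothetical single source of truth: (table, column) -> category."""

    def __init__(self) -> None:
        self._tags: dict[tuple[str, str], str] = {}
        self._hooks: dict[str, list[Hook]] = defaultdict(list)

    def annotate(self, table: str, column: str, category: str) -> None:
        """Record a column's category at schema-definition time."""
        self._tags[(table, column)] = category

    def warehouse_rows(self) -> list[dict]:
        """Flat rows that a propagation job could load into the warehouse."""
        return [
            {"table": t, "column": c, "category": cat}
            for (t, c), cat in sorted(self._tags.items())
        ]

    def on_load(self, category: str, hook: Hook) -> None:
        """Register a runtime hook for loads of columns in `category`."""
        self._hooks[category].append(hook)

    def column_loaded(self, table: str, column: str, value: object) -> None:
        """Stand-in for an ORM callback; raises KeyError for untagged columns."""
        category = self._tags[(table, column)]
        for hook in self._hooks[category]:
            hook(table, column, value)


schema = CentralSchema()
schema.annotate("users", "display_name", "public")
schema.annotate("users", "email", "banned_from_clients")

seen = []
schema.on_load("banned_from_clients", lambda t, c, v: seen.append((t, c, v)))
schema.column_loaded("users", "display_name", "Ada")
schema.column_loaded("users", "email", "ada@example.com")
```

After these calls, `seen` holds only the `users.email` load, and `warehouse_rows()` yields one row per annotated column for the warehouse side.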
Enforcement leverage¶
Once every column is tagged, many enforcement/detection systems compose on top of the same metadata:
- Response-body sampling for sensitive values (Figma Response Sampling Phase 2): match banned_from_clients-tagged values against outbound JSON.
- Warehouse access controls: restrict queries over PII tables to approved roles based on tag.
- Log redaction: scrub any field with a pii tag before shipping to centralized logging.
- Compliance audits: enumerate all regulated-tagged columns for SOC 2 / GDPR / HIPAA evidence.
- Test-suite guards: assert that no response object contains a banned_from_clients field in unit tests.
The common thread: the enforcement code changes, the tags stay stable — so the team doesn't need to maintain parallel lists of sensitive columns in every enforcement system.
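To illustrate how several consumers can share one tag table, here is a rough sketch with three of the uses listed above. The `COLUMN_TAGS` table, the idea of a column carrying several tags at once, and the plain substring match are all assumptions made for the example; the source does not describe how Figma implements the matching.

```python
import json

# Hypothetical shared tag table; every enforcer below reads from it.
COLUMN_TAGS = {
    "users.display_name": {"public"},
    "users.email": {"pii", "banned_from_clients"},
    "users.billing_token": {"regulated", "banned_from_clients"},
}


def columns_with(tag: str) -> list[str]:
    """Enumerate columns carrying `tag` (e.g. for a compliance audit)."""
    return sorted(col for col, tags in COLUMN_TAGS.items() if tag in tags)


def redact_log_record(record: dict) -> dict:
    """Replace values of pii-tagged columns before the record is shipped."""
    pii = set(columns_with("pii"))
    return {k: ("[REDACTED]" if k in pii else v) for k, v in record.items()}


def find_banned_values(response_body: str, loaded: dict) -> list[str]:
    """Names of banned_from_clients columns whose loaded value appears
    verbatim in the outbound body (simplified substring match)."""
    banned = set(columns_with("banned_from_clients"))
    return sorted(
        col
        for col, value in loaded.items()
        if col in banned and str(value) and str(value) in response_body
    )


loaded = {"users.display_name": "Ada", "users.email": "ada@example.com"}
body = json.dumps({"name": "Ada", "contact": "ada@example.com"})

leaks = find_banned_values(body, loaded)
safe_log = redact_log_record(loaded)
audit = columns_with("regulated")
```

Adding a fourth consumer means writing another function over `COLUMN_TAGS`, not another list of sensitive columns.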
Categories vs tiers¶
Two complementary axes seen in production:
- Category — what kind of data (PII / billing / security / health / internal / public). Used for routing to the right enforcement rule.
- Tier — how sensitive (public / internal / restricted / confidential). Used for blanket-level decisions (logging, retention, export).
A column can carry both: {category: pii.email, tier: restricted}.
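One way to model the two axes, as a sketch: the `Tag` and `Tier` types, the `RULES` table, and the rule names are hypothetical, and treating the tier list as ordered from least to most sensitive is an assumption of this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Assumed ordering: a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class Tag:
    category: str  # what kind of data, e.g. "pii.email"
    tier: Tier     # how sensitive


# Hypothetical routing table keyed by top-level category.
RULES = {"pii": "mask", "billing": "tokenize"}


def enforcement_rule(tag: Tag) -> str:
    """Category-based routing: the top-level prefix picks the rule."""
    top_level = tag.category.split(".", 1)[0]
    return RULES.get(top_level, "none")


def may_log(tag: Tag) -> bool:
    """Blanket tier-based decision: only public/internal values are logged."""
    return tag.tier <= Tier.INTERNAL


email = Tag(category="pii.email", tier=Tier.RESTRICTED)
rule = enforcement_rule(email)
loggable = may_log(email)
```

The category answers "which rule applies" while the tier answers "is this allowed at all", so the two lookups stay independent.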
Coverage + drift challenges¶
Not disclosed in the Figma post, but common to this pattern:
- New columns need a tag: usually enforced by a migration-time lint or a default-deny (untagged = restricted).
- Tag reviews when intended usage changes: a field once used internally that now ships to clients needs re-tagging.
- Drift between schema and warehouse: the propagation pipeline needs a reconciliation job.
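A hedged sketch of the lint, the default-deny fallback, and the reconciliation check. All function names and sample tags are hypothetical, and a real reconciliation job would read from the schema store and the warehouse catalog instead of in-memory dicts.

```python
DEFAULT_TAG = "restricted"  # default-deny: untagged columns are treated as restricted


def lint_migration(new_columns: list[str], tags: dict[str, str]) -> list[str]:
    """Return columns added without a tag; CI could fail when this is non-empty."""
    return [col for col in new_columns if col not in tags]


def effective_tag(column: str, tags: dict[str, str]) -> str:
    """Tag used by enforcement, falling back to the default-deny value."""
    return tags.get(column, DEFAULT_TAG)


def reconcile(schema_tags: dict[str, str], warehouse_tags: dict[str, str]) -> dict:
    """Map each drifted column to (schema tag, warehouse tag).

    None on either side means the column is missing there.
    """
    drift = {}
    for column in sorted(schema_tags.keys() | warehouse_tags.keys()):
        in_schema = schema_tags.get(column)
        in_warehouse = warehouse_tags.get(column)
        if in_schema != in_warehouse:
            drift[column] = (in_schema, in_warehouse)
    return drift


schema_tags = {"users.email": "pii", "users.display_name": "public"}
warehouse_tags = {"users.email": "internal", "users.legacy_id": "internal"}

untagged = lint_migration(["users.email", "users.phone"], schema_tags)
fallback = effective_tag("users.phone", schema_tags)
drift = reconcile(schema_tags, warehouse_tags)
```

The lint catches missing tags before merge, the fallback covers whatever slips through, and the drift report lists what the propagation pipeline still has to fix.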
Seen in¶
- sources/2026-04-21-figma-visibility-at-scale-sensitive-data-exposure: FigTag annotates every column with a sensitivity category; banned_from_clients is the category that drives Figma Response Sampling's Phase 2 detection.