PATTERN Cited by 1 source

Column-level classification propagation

Description

Automatically propagate data classification tags from input columns to output columns as data flows through ML pipeline modules. This ensures that governance metadata follows the data through transformations, reducing manual effort and preventing accidental exposure of sensitive data in derived datasets.

Mechanics

  1. Input tables have per-column classification tags (e.g., PII, confidential, public)
  2. As a module transforms data, the platform automatically carries each source column's classification over to the output columns derived from it
  3. Columns without an explicit classification default to the most restrictive level (a conservative, safe-by-default choice)
  4. Only appropriately classified data can be published to interactive catalogs or downstream consumers
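
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the `Classification` ordering, the `propagate` and `publishable` helpers, the `lineage` mapping, and the column names are all hypothetical. In particular, the rule that an output column takes the most restrictive tag among its source columns is one plausible way to combine tags; the source does not specify how multi-input columns are resolved.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical taxonomy, ordered from least to most restrictive."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    PII = 2


# Step 3 fallback: the highest level in the (assumed) ordering.
MOST_RESTRICTIVE = max(Classification)


def propagate(input_tags, lineage):
    """Derive output-column tags from input-column tags.

    input_tags: dict of input column name -> Classification
    lineage:    dict of output column name -> list of input columns it is derived from
    """
    output_tags = {}
    for out_col, sources in lineage.items():
        # Step 3: an untagged source column counts as most restrictive.
        levels = [input_tags.get(src, MOST_RESTRICTIVE) for src in sources]
        # Step 2 (assumed rule): inherit the most restrictive source level.
        # An output with no recorded sources is treated as unclassified.
        output_tags[out_col] = max(levels, default=MOST_RESTRICTIVE)
    return output_tags


def publishable(output_tags, max_allowed):
    """Step 4: keep only columns at or below the allowed level."""
    return sorted(col for col, tag in output_tags.items() if tag <= max_allowed)


# Step 1: per-column tags on the input table (illustrative names).
input_tags = {
    "email": Classification.PII,
    "plan": Classification.CONFIDENTIAL,
    "region": Classification.PUBLIC,
}

# Which input columns each output column is computed from.
lineage = {
    "email_domain": ["email"],
    "region_plan": ["region", "plan"],
    "region_code": ["region"],
    "score": ["session_length"],  # source column has no tag
}

tags = propagate(input_tags, lineage)
print(publishable(tags, Classification.PUBLIC))
```

In this sketch `email_domain` stays PII even though a domain alone may be less sensitive than a full address, and `score` is blocked purely because its source column was never tagged. Both behaviours follow directly from the inherit-and-default rules in the list above.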

Trade-offs

| Pro | Con |
| --- | --- |
| Eliminates manual re-classification at each stage | May over-classify derived columns (e.g., aggregates of PII) |
| Prevents accidental data exposure through derived datasets | Requires a classification taxonomy that composes under transformations |
| Default-deny for unclassified is safe-by-default | Can block legitimate uses if propagation is too conservative |

Seen In

(Source: sources/2026-06-10-atlassian-architecting-scalable-ml-platforms)
