PATTERN Cited by 1 source
Column-level classification propagation
Description
Automatically propagate data classification tags from input columns to output columns as data flows through ML pipeline modules. This ensures that governance metadata follows the data through transformations, reducing manual effort and preventing accidental exposure of sensitive data in derived datasets.
Mechanics
- Input tables carry per-column classification tags (e.g., PII, confidential, public)
- As a module transforms data, the platform automatically propagates the source columns' classification to the output columns derived from them
- Columns without an explicit classification default to the most restrictive level, so a missing tag fails closed
- Only data whose classification permits it can be published to interactive catalogs or downstream consumers
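The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual utilities: the `Classification` ordering, the `lineage` mapping, and the function names `propagate` and `publishable` are all hypothetical. It also assumes one particular propagation rule, namely that an output column inherits the most restrictive tag among its inputs; the source does not say how tags are combined.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical taxonomy, ordered from least to most restrictive."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    PII = 2


MOST_RESTRICTIVE = max(Classification)


def effective_tag(tags, column):
    """Untagged columns default to the most restrictive level."""
    return tags.get(column, MOST_RESTRICTIVE)


def propagate(input_tags, lineage):
    """Tag each output column with the most restrictive tag among its inputs.

    `lineage` maps an output column to the input columns it is derived from.
    """
    output_tags = {}
    for out_col, sources in lineage.items():
        levels = [effective_tag(input_tags, src) for src in sources]
        # An output with no recorded inputs is also treated conservatively.
        output_tags[out_col] = max(levels, default=MOST_RESTRICTIVE)
    return output_tags


def publishable(output_tags, ceiling):
    """Return the columns whose tag does not exceed the consumer's ceiling."""
    return [col for col, tag in output_tags.items() if tag <= ceiling]


input_tags = {"email": Classification.PII, "plan": Classification.PUBLIC}
lineage = {
    "email_domain": ["email"],
    "plan_tier": ["plan"],
    "score": ["plan", "signup_source"],  # signup_source has no tag
}
output_tags = propagate(input_tags, lineage)
```

In this sketch `score` ends up tagged as PII even though `plan` is public, because its other input is untagged and therefore treated as most restrictive. Taking the maximum over inputs only works when the taxonomy is totally ordered, which is one way of satisfying the requirement that classifications compose under transformations.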
Trade-offs
| Pro | Con |
|---|---|
| Eliminates manual re-classification at each stage | May over-classify derived columns (e.g., aggregates of PII) |
| Prevents accidental data exposure through derived datasets | Requires a classification taxonomy that composes under transformations |
| Treating unclassified columns as most restrictive fails closed | Can block legitimate uses if propagation is too conservative |
Seen In
- sources/2026-06-10-atlassian-architecting-scalable-ml-platforms — ML Studio: automatic tag propagation from input to output tables via platform utilities; unclassified columns treated conservatively by default; 900k+ datasets generated with built-in access control
(Source: sources/2026-06-10-atlassian-architecting-scalable-ml-platforms)