CONCEPT Cited by 1 source
Data lineage¶
Definition¶
Data lineage is the graph of source → sink relationships between data assets across systems. It answers two questions: "where did this data come from?" and "where does this data flow to?" Lineage graphs are built from static code analysis, runtime logging, query parsing, and post-query processing.
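As a minimal sketch of the definition above, the graph can be modelled as directed source → sink edges with lookups in both directions. The class name `LineageGraph`, its methods, and the asset names are hypothetical and chosen only for illustration; they do not come from any real lineage system.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage graph: a directed edge means data flows source -> sink."""

    def __init__(self):
        self.downstream = defaultdict(set)  # asset -> direct sinks
        self.upstream = defaultdict(set)    # asset -> direct sources

    def add_flow(self, source, sink):
        # Record the edge in both directions so either question is one lookup.
        self.downstream[source].add(sink)
        self.upstream[sink].add(source)

    def came_from(self, asset):
        """Answer 'where did this data come from?' (direct sources only)."""
        return sorted(self.upstream.get(asset, ()))

    def flows_to(self, asset):
        """Answer 'where does this data flow to?' (direct sinks only)."""
        return sorted(self.downstream.get(asset, ()))


# Hypothetical assets, for illustration only.
graph = LineageGraph()
graph.add_flow("events_raw", "events_clean")
graph.add_flow("users_raw", "events_clean")
graph.add_flow("events_clean", "daily_dashboard")

print(graph.came_from("events_clean"))  # ['events_raw', 'users_raw']
print(graph.flows_to("events_clean"))   # ['daily_dashboard']
```

Storing both directions doubles the bookkeeping but keeps each question a single dictionary lookup.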
Uses¶
- Data governance — tracking PII propagation across warehouses and services.
- Pipeline debugging — "why did this dashboard change?"
- Compliance discovery — finding all downstream consumers of a regulated data set.
- Policy rollout: in Meta's 2024-08-31 Privacy Aware Infrastructure (PAI) post, lineage is the discovery primitive inside Step 2 of Policy Zone Manager (PZM). The requirement owner queries lineage to find all sinks downstream of an annotated source, then decides how to remediate each.
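The downstream-discovery query that several of these uses rely on can be sketched as a breadth-first traversal over source → sink edges. This is an illustrative assumption about how such a query might work, not Meta's implementation; the function `downstream_sinks` and every asset name below are hypothetical.

```python
from collections import deque


def downstream_sinks(edges, source):
    """Return every asset reachable from `source` via source -> sink edges."""
    seen = set()
    queue = deque([source])
    while queue:
        asset = queue.popleft()
        for sink in edges.get(asset, ()):
            # Skip assets already visited so cycles in the graph terminate.
            if sink not in seen and sink != source:
                seen.add(sink)
                queue.append(sink)
    return seen


# Hypothetical lineage edges: asset -> list of direct sinks.
edges = {
    "user_messages": ["messages_clean", "spam_features"],
    "messages_clean": ["engagement_dashboard"],
    "spam_features": ["spam_model_training"],
    "ads_clicks": ["ads_report"],
}

found = downstream_sinks(edges, "user_messages")
print(sorted(found))
# ['engagement_dashboard', 'messages_clean', 'spam_features', 'spam_model_training']
```

Each asset in the result is one candidate for a remediation decision; `ads_report` is not returned because nothing connects it to the annotated source.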
Lineage as enforcement: insufficient at scale¶
The 2024-08-31 Meta post is explicit that lineage alone is not a sufficient enforcement primitive:
"The combination of point checking and data lineage, while viable at a small scale, leads to significant operational overhead as point checking still requires auditing many individual assets."
Lineage gives you the graph, but to enforce a purpose-limitation requirement you still need to audit each asset's point-check code to confirm that it respects the propagated permission. Meta's conclusion: lineage is necessary for discovery, but information flow control (IFC), implemented as Policy Zones, is needed for enforcement. PZM retains lineage inside the tool but delegates enforcement to Policy Zones.
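The overhead described in the quote can be made concrete with a small sketch: lineage produces the list of assets, but each one remains a separate manual audit. The inputs and the helper `outstanding_audits` are hypothetical, and the sketch assumes audits are tracked as a simple set of asset names.

```python
# Hypothetical inputs: assets found by a lineage query, and the subset whose
# point-check code has already been reviewed by hand.
downstream_assets = [
    "messages_clean",
    "spam_features",
    "spam_model_training",
    "engagement_dashboard",
]
audited_assets = {"messages_clean"}


def outstanding_audits(downstream, audited):
    """Assets that lineage discovered but that still need a manual audit."""
    return [asset for asset in downstream if asset not in audited]


todo = outstanding_audits(downstream_assets, audited_assets)
print(f"{len(todo)} of {len(downstream_assets)} assets still need an audit")
```

The amount of work grows with the number of downstream assets, which is the scaling problem the post attributes to combining point checking with lineage.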
Discovery techniques named at Meta¶
- Static code analysis — e.g. Meta's Zoncolan (Hack static analyser; cited in the PAI post).
- Logging and post-query processing — runtime trace reconstruction of data flows.
- SQL query parsing for batch pipelines (Presto / Spark lineage); implied by the post rather than named explicitly.
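To illustrate the query-parsing technique in the list above, here is a deliberately simplified sketch that derives source → sink edges from a single `INSERT INTO ... SELECT` statement using regular expressions. It is a toy under stated assumptions: it is not how Presto or Spark lineage works, it ignores subqueries, CTEs, and quoted identifiers, and the function name `lineage_edges` and the table names are hypothetical. A production tool would walk a real SQL parse tree instead.

```python
import re

# Toy patterns: the write target after INSERT INTO, and tables after FROM / JOIN.
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def lineage_edges(sql):
    """Return (source, sink) pairs for one INSERT INTO ... SELECT statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []  # No write target, so no lineage edge to record.
    sink = target.group(1)
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return [(src, sink) for src in sources if src != sink]


# Hypothetical batch query, for illustration only.
sql = """
INSERT INTO analytics.daily_active
SELECT u.region, COUNT(*)
FROM raw.events e
JOIN raw.users u ON e.user_id = u.id
GROUP BY u.region
"""

print(lineage_edges(sql))
# [('raw.events', 'analytics.daily_active'), ('raw.users', 'analytics.daily_active')]
```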
Seen in¶
- sources/2024-08-31-meta-enforces-purpose-limitation-via-privacy-aware-infrastructure — framed as the discovery primitive inside PZM but explicitly rejected as a sufficient enforcement primitive at Meta scale.
Related¶
- concepts/information-flow-control — the enforcement successor.
- concepts/point-checking-controls — the approach lineage was meant to augment.
- concepts/purpose-limitation — the requirement class Meta was trying to enforce via lineage before adopting IFC.
- concepts/data-annotation — the IFC primitive that replaces the point-check-plus-lineage combination.
- systems/meta-policy-zones — the IFC system.
- systems/meta-policy-zone-manager — lineage's UX home at Meta.
- companies/meta