CONCEPT Cited by 2 sources

Data lineage¶

Definition¶

Data lineage is the graph of relationships between data assets that tracks "where did this data come from" and "where does this data flow to" — source → sink relationships — across systems. Lineage graphs are built via static code analysis, runtime logging, query parsing, and post-processing.
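
As a minimal sketch of the definition above, a lineage graph can be modeled as directed source → sink edges indexed in both directions, so that either question becomes a graph walk. The class name `LineageGraph` and the asset names are hypothetical and are not taken from any system cited on this page.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data assets; an edge means data flows source -> sink."""

    def __init__(self):
        self.downstream = defaultdict(set)  # asset -> assets it feeds
        self.upstream = defaultdict(set)    # asset -> assets it reads from

    def add_edge(self, source, sink):
        self.downstream[source].add(sink)
        self.upstream[sink].add(source)

    def _reach(self, start, index):
        # Breadth-first walk over one direction of the graph.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in index.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def came_from(self, asset):
        """All transitive upstream assets: "where did this data come from?"."""
        return self._reach(asset, self.upstream)

    def flows_to(self, asset):
        """All transitive downstream assets: "where does this data flow to?"."""
        return self._reach(asset, self.downstream)


# Hypothetical assets for illustration only.
graph = LineageGraph()
graph.add_edge("raw.events", "staging.sessions")
graph.add_edge("raw.users", "staging.sessions")
graph.add_edge("staging.sessions", "dash.daily_active_users")

print(sorted(graph.came_from("dash.daily_active_users")))
print(sorted(graph.flows_to("raw.users")))
```

Keeping both an upstream and a downstream index costs extra memory but lets each question be answered without scanning every edge.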

Uses¶

Data governance — tracking PII propagation across warehouses and services.
Pipeline debugging — "why did this dashboard change?"
Compliance discovery — finding all downstream consumers of a regulated data set.
Policy rollout — in Meta's 2024-08-31 Privacy Aware Infrastructure (PAI) post, lineage is the discovery primitive inside Step 2 of Policy Zone Manager (PZM): the requirement owner queries lineage to find all sinks downstream of an annotated source, then decides how to remediate each.
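
The discovery query in the last item can be sketched as a traversal that returns every sink reachable from an annotated source, together with one path that reached it, so an owner can see how the data got there. This is an illustration under stated assumptions, not PZM's implementation: the function `downstream_sinks`, the asset names, and the choice to treat a sink as an asset with no outgoing edges are all hypothetical.

```python
def downstream_sinks(edges, annotated_source):
    """Return {sink: path} for every terminal asset reachable from the source.

    `edges` maps an asset to the assets it writes to. A sink is assumed to be
    an asset with no outgoing edges. Each path records one route the data took.
    """
    sinks = {}
    visited = {annotated_source}
    stack = [(annotated_source, [annotated_source])]
    while stack:
        asset, path = stack.pop()
        targets = edges.get(asset, [])
        if not targets:
            sinks[asset] = path
            continue
        for target in targets:
            if target not in visited:  # guards against cycles and repeat visits
                visited.add(target)
                stack.append((target, path + [target]))
    return sinks


# Hypothetical assets: one annotated source fanning out to two sinks.
edges = {
    "logs.user_messages": ["warehouse.messages_clean", "svc.spam_classifier"],
    "warehouse.messages_clean": ["dash.engagement"],
}

for sink, path in sorted(downstream_sinks(edges, "logs.user_messages").items()):
    print(sink, "<-", " -> ".join(path))
```

The output is only a worklist. Each listed sink still needs a remediation decision, which is the manual step the section below discusses.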

Lineage as enforcement: insufficient at scale¶

The 2024-08-31 Meta post is explicit that lineage alone is not a sufficient enforcement primitive:

"The combination of point checking and data lineage, while viable at a small scale, leads to significant operational overhead as point checking still requires auditing many individual assets."

Lineage gives you the graph, but enforcing a purpose-limitation requirement still means auditing each asset's point-check code to confirm it respects the propagated permission. Meta's conclusion: lineage is necessary for discovery, but information flow control (IFC), implemented as Policy Zones, is needed for enforcement. PZM keeps lineage inside the tool and delegates enforcement to Policy Zones.

Discovery techniques named at Meta¶

Static code analysis — e.g. Meta's Zoncolan (Hack static analyser; cited in the PAI post).
Logging and post-query processing — runtime trace reconstruction of data flows.
Implicitly: SQL query parsing for batch pipelines (Presto / Spark lineage).
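
To make the query-parsing technique concrete, the sketch below extracts source → target table edges from a single `INSERT INTO ... SELECT` statement. It is a deliberately naive, regex-based illustration and not how Meta, Presto, or Spark derive lineage; production systems would use a full SQL parser. The function `lineage_edges` and the table names are hypothetical.

```python
import re

TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def lineage_edges(sql):
    """Return (source_table, target_table) edges for one simple INSERT ... SELECT.

    Naive by design: no CTEs, subquery aliases, quoted identifiers, or comments.
    """
    target = TARGET_RE.search(sql)
    if target is None:
        return []  # a statement that writes nowhere contributes no edges
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return [(source, target.group(1)) for source in sources]


query = """
INSERT INTO analytics.daily_orders
SELECT o.day, count(*) AS n
FROM raw.orders o
JOIN raw.customers c ON o.customer_id = c.id
GROUP BY o.day
"""

for source, target in lineage_edges(query):
    print(f"{source} -> {target}")
```

Table-level edges like these are coarse. Column-level lineage would additionally need the select list resolved against each source schema.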

Seen in¶

sources/2024-08-31-meta-enforces-purpose-limitation-via-privacy-aware-infrastructure — framed as the discovery primitive inside PZM but explicitly rejected as a sufficient enforcement primitive at Meta scale.
sources/2025-10-28-redpanda-governed-autonomy-the-path-to-enterprise-agentic-ai — the 2025-10-28 ADP announcement conflates lineage with the audit trail as a "unified audit and lineage envelope" at the agent-interaction altitude. The lineage axis answers "what data flowed into this agent decision?" (retrieved context + tool-call outputs), while the audit axis answers "who did what, when, with what result?"; the ADP positions the streaming-log substrate as the joint source for both view-shapes. See patterns/durable-event-log-as-agent-audit-envelope. The post does not unpack the lineage mechanism; "complete lineage" is a property claim, not a traced graph.
— lineage as a side-effect capability of knowledge-graph-based MDM data-model-definition. Because Zalando records every source column → Concept / Attribute / Relationship mapping as graph edges, lineage from golden-record field back to every contributing source column across every source system falls out for free: "this enables us to keep a record of data lineage from each system to the golden record." Zalando's framing sits at a complementary altitude to the Meta / Redpanda instances above — lineage as a design-time byproduct of the data-modeling substrate, not a runtime-tracing or governance-primitive story.
sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph — Netflix MDS as a lineage system built from change events and source-of-truth hydration rather than static analysis or query parsing. Six ML source systems emit thin notification-of-change events over Kafka + SNS / SQS; MDS hydrates each entity's full state from the source API and walks foreign-key references. Async enrichment jobs derive multi-hop transitive lineage edges (e.g. Model Instance → Pipeline Run → Dataset collapsed into a direct Model Instance ↔ Dataset materialized edge). Distinct from Meta's Zoncolan static-analysis approach (lineage from code) and from CDC-trace approaches (lineage from data plane); this is lineage from app-emitted change events with API callback hydration.
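
The event-driven approach in the Netflix entry can be sketched as two steps: hydrate an entity's state when a thin change event arrives, then derive a direct edge from a two-hop chain. Everything here is a hypothetical stand-in, including the `SOURCE_API` dictionary, the entity ids, and both function names. The sketch does not reflect MDS's actual schema or APIs, and it stores the derived edge in one direction only.

```python
# Hypothetical stand-in for the source systems' APIs: entity id -> full state.
SOURCE_API = {
    "model_instance:42": {"type": "model_instance", "refs": ["pipeline_run:7"]},
    "pipeline_run:7": {"type": "pipeline_run", "refs": ["dataset:users_v3"]},
    "dataset:users_v3": {"type": "dataset", "refs": []},
}

entities = {}  # hydrated entity state, keyed by id
edges = set()  # direct lineage edges as (from_id, to_id)


def handle_change_event(event):
    """Hydrate the changed entity from its source of truth and record its references."""
    entity_id = event["entity_id"]   # the event is thin: it carries only an id
    state = SOURCE_API[entity_id]    # callback to the owning system for full state
    entities[entity_id] = state
    for ref in state["refs"]:
        edges.add((entity_id, ref))


def derive_model_dataset_edges():
    """Collapse model_instance -> pipeline_run -> dataset into a direct edge."""
    derived = set()
    for src, mid in edges:
        for mid2, dst in edges:
            if (
                mid == mid2
                and entities.get(src, {}).get("type") == "model_instance"
                and entities.get(dst, {}).get("type") == "dataset"
            ):
                derived.add((src, dst))
    return derived


# Events may arrive in any order; hydration makes each one self-sufficient.
for entity_id in ["dataset:users_v3", "pipeline_run:7", "model_instance:42"]:
    handle_change_event({"entity_id": entity_id})

print(derive_model_dataset_edges())
```

Because each event triggers a fresh read of the full state, a dropped or reordered event leaves lineage stale rather than wrong; the next event for that entity repairs it.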

Related¶

concepts/information-flow-control — the enforcement successor.
concepts/point-checking-controls — the approach lineage was meant to augment.
concepts/purpose-limitation — the requirement class Meta was trying to enforce via lineage before adopting IFC.
concepts/data-annotation — the IFC primitive that replaces the point-check-plus-lineage combination.
systems/meta-policy-zones — the IFC system.
systems/meta-policy-zone-manager — lineage's UX home at Meta.
companies/meta
concepts/knowledge-graph — substrate that makes lineage a design-time byproduct in Zalando MDM.
concepts/master-data-management — the problem domain in the Zalando instance.
systems/zalando-mdm-system — Zalando MDM canonical wiki instance.
patterns/knowledge-graph-for-mdm-modeling — the pattern that delivers lineage as a side effect.