CONCEPT Cited by 1 source
Glossary term propagation¶
Glossary term propagation is an automated labeling technique that
extends a well-documented column's business glossary term (e.g.
user_id, pin_id, advertiser_id) to undocumented columns that it is
joined with, or to undocumented columns whose descriptions are
semantically similar. Pinterest uses it to auto-tag more than 40% of
in-scope columns with high-quality glossary terms.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
Two mechanisms¶
1. Join-based lineage propagation¶
Analyze query logs to build a join graph between columns — each edge represents a join observed in real SQL.
Worked example from Pinterest: data.pins_d.id repeatedly joins to
ad.ad_video_event_flat_spark.objectid. If data.pins_d.id is
well-documented with the glossary term pin_id, propagate pin_id
to ad.ad_video_event_flat_spark.objectid.
This works because real analyst queries encode the semantic equivalence of the columns they join: columns that are repeatedly joined on each other usually refer to the same entity. The join graph is a latent ontology that Pinterest was already paying to build with every production query.
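The join-based mechanism can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: it assumes join pairs have already been extracted from query logs, and the `min_joins` threshold, the function names, and the `tmp.scratch.some_id` column are hypothetical. It propagates one hop only, from documented columns to their direct join partners.

```python
from collections import Counter


def build_join_graph(join_pairs):
    """Count how often each unordered pair of columns is joined."""
    edges = Counter()
    for left, right in join_pairs:
        if left != right:
            edges[frozenset((left, right))] += 1
    return edges


def propagate_terms(edges, glossary, min_joins=2):
    """Propose glossary terms for undocumented columns.

    A term is copied across an edge only if the join was observed at
    least `min_joins` times (an assumed noise filter). If several
    documented columns point at the same target, the proposal backed
    by the most joins wins.
    """
    proposed = {}
    for pair, count in edges.items():
        if count < min_joins:
            continue
        a, b = tuple(pair)
        for src, dst in ((a, b), (b, a)):
            if src in glossary and dst not in glossary:
                if dst not in proposed or count > proposed[dst][1]:
                    proposed[dst] = (glossary[src], count)
    return {col: term for col, (term, _) in proposed.items()}


# Join pairs as they might be extracted from query logs (illustrative data).
join_pairs = [
    ("data.pins_d.id", "ad.ad_video_event_flat_spark.objectid"),
    ("ad.ad_video_event_flat_spark.objectid", "data.pins_d.id"),
    ("data.pins_d.id", "ad.ad_video_event_flat_spark.objectid"),
    ("data.pins_d.id", "tmp.scratch.some_id"),  # hypothetical one-off join
]
glossary = {"data.pins_d.id": "pin_id"}

edges = build_join_graph(join_pairs)
tags = propagate_terms(edges, glossary)
print(tags)
```

With this data, only `ad.ad_video_event_flat_spark.objectid` receives `pin_id`; the one-off join falls below the assumed threshold and is left untagged.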
2. Search-based propagation (for sparse-join cases)¶
Where join patterns are sparse or absent (e.g. a new table with no join history yet), fall back to vector search: index glossary terms and column documentation in a vector database, then run semantic similarity search between column descriptions and existing glossary term definitions.
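The search-based fallback can be illustrated with a toy similarity search. This sketch makes several assumptions: a bag-of-words vector stands in for a real embedding model, a linear scan stands in for the vector database, and the glossary definitions, the `min_score` cutoff, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def suggest_term(column_doc, term_definitions, min_score=0.3):
    """Return (term, score) for the closest glossary definition.

    The term is None when no definition reaches `min_score`, so that
    weak matches are left for manual review rather than auto-tagged.
    """
    query = embed(column_doc)
    best_term, best_score = None, 0.0
    for term, definition in term_definitions.items():
        score = cosine(query, embed(definition))
        if score > best_score:
            best_term, best_score = term, score
    if best_score < min_score:
        return None, best_score
    return best_term, best_score


# Hypothetical glossary definitions, for illustration only.
term_definitions = {
    "pin_id": "unique identifier of a pin",
    "advertiser_id": "unique identifier of an advertiser account",
}

term, score = suggest_term(
    "identifier of the advertiser that owns the campaign", term_definitions
)
print(term, round(score, 3))
```

In a production setting the token-count vectors would be replaced by dense embeddings and the loop by a nearest-neighbor query, but the shape of the decision (closest definition, subject to a similarity cutoff) stays the same.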
Why it matters architecturally¶
Glossary terms are what make column-level business meaning
portable across tables. Without them, the same business concept
lives under different column names in different tables
(g_advertiser_id in one, adv_id in another) and every downstream
AI system — Text-to-SQL, data search, metric lineage — has to
re-learn the mapping. With glossary terms propagated at scale, a
single upstream curation effort pays off everywhere.
Operational results (Pinterest)¶
- >40% of columns in scope auto-tagged via join-based lineage.
- Combined with AI-generated docs and search-based propagation, total manual documentation work reduced by ~70%.
Seen in¶
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — canonical wiki instance.
Related¶
- systems/pinterest-pincat — where glossary terms live.
- systems/pinterest-ai-table-documentation — the parent documentation-at-scale system.
- systems/datahub — upstream open-source glossary-term concept.
- concepts/data-lineage