
CONCEPT Cited by 1 source

Glossary term propagation

Glossary term propagation is an automated labeling technique that extends a well-documented column's business glossary term (e.g. user_id, pin_id, advertiser_id) to undocumented columns it is joined with, or to undocumented columns whose descriptions are semantically similar. Pinterest uses it to auto-tag more than 40% of in-scope columns with high-quality glossary terms.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

Two mechanisms

1. Join-based lineage propagation

Analyze query logs to build a join graph between columns — each edge represents a join observed in real SQL.

Worked example from Pinterest: data.pins_d.id repeatedly joins to ad.ad_video_event_flat_spark.objectid. If data.pins_d.id is well-documented with the glossary term pin_id, propagate pin_id to ad.ad_video_event_flat_spark.objectid.

Works because real analyst queries encode the semantic equivalence of columns that join — if you can join them, they mean the same thing. The join graph is a latent ontology that Pinterest was already paying to build with every production query.
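The join-based mechanism can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: it assumes the join pairs have already been extracted from query logs, and the function names, the min_joins threshold, and the tmp.scratch.some_id column are hypothetical.

```python
from collections import Counter, defaultdict


def build_join_graph(join_pairs):
    """Count how often each unordered pair of columns is joined."""
    edges = Counter()
    for left, right in join_pairs:
        if left != right:
            edges[frozenset((left, right))] += 1
    return edges


def propagate_terms(edges, glossary, min_joins=2):
    """Suggest a glossary term for each undocumented column that is joined
    to a documented column at least min_joins times (assumed threshold)."""
    votes = defaultdict(Counter)
    for pair, count in edges.items():
        if count < min_joins:
            continue  # ignore joins seen too rarely to trust
        a, b = tuple(pair)
        for src, dst in ((a, b), (b, a)):
            if src in glossary and dst not in glossary:
                votes[dst][glossary[src]] += count
    # Keep the term backed by the most observed joins for each column.
    return {col: terms.most_common(1)[0][0] for col, terms in votes.items()}


# Join pairs as they might come out of a query-log parser (assumed input).
join_pairs = [
    ("data.pins_d.id", "ad.ad_video_event_flat_spark.objectid"),
    ("data.pins_d.id", "ad.ad_video_event_flat_spark.objectid"),
    ("ad.ad_video_event_flat_spark.objectid", "data.pins_d.id"),
    ("data.pins_d.id", "tmp.scratch.some_id"),  # hypothetical, seen only once
]
glossary = {"data.pins_d.id": "pin_id"}

propagated = propagate_terms(build_join_graph(join_pairs), glossary)
print(propagated)
```

The threshold reflects the "repeatedly joins" condition in the worked example: a join seen only once is not treated as evidence that two columns share a meaning.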

2. Search-based propagation (for sparse-join cases)

Where join patterns are sparse or absent (e.g. a new table with no join history yet), fall back to vector search: index glossary terms and column docs in a vector database, then run semantic similarity search between column descriptions and existing glossary term definitions.
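The search-based fallback can be illustrated with a toy similarity search. The sketch below is an assumption-heavy stand-in: a bag-of-words vector replaces a real embedding model, a linear scan replaces the vector database, and the glossary definitions, column descriptions, and min_score cutoff are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def suggest_term(column_doc, glossary_defs, min_score=0.3):
    """Return (term, score) for the closest glossary definition,
    or (None, score) when nothing clears the assumed cutoff."""
    query = embed(column_doc)
    best_term, best_score = None, 0.0
    for term, definition in glossary_defs.items():
        score = cosine(query, embed(definition))
        if score > best_score:
            best_term, best_score = term, score
    if best_score < min_score:
        return None, best_score
    return best_term, best_score


# Hypothetical glossary definitions and column descriptions.
glossary_defs = {
    "pin_id": "unique identifier of a pin",
    "advertiser_id": "unique identifier of an advertiser account",
}

term, score = suggest_term(
    "identifier of the pin shown in the video ad event", glossary_defs
)
no_term, _ = suggest_term("timestamp when row was loaded", glossary_defs)
print(term, no_term)
```

In a real deployment the embed and linear-scan steps would be replaced by an embedding model and a vector index; the cutoff keeps weak matches from being tagged automatically.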

Why it matters architecturally

Glossary terms are what make column-level business meaning portable across tables. Without them, the same business concept lives under different column names in different tables (g_advertiser_id in one, adv_id in another) and every downstream AI system — Text-to-SQL, data search, metric lineage — has to re-learn the mapping. With glossary terms propagated at scale, a single upstream curation effort pays off everywhere.

Operational results (Pinterest)

  • >40% of columns in scope auto-tagged via join-based lineage.
  • Combined with AI-generated docs and search-based propagation, total manual documentation work reduced by ~70%.
