Pinterest AI Table Documentation¶
AI Table Documentation is Pinterest's system for generating table and column descriptions at warehouse scale using LLMs. It is the production answer to "we have 100,000+ analytical tables and manual documentation can't keep up."
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
Input signals¶
The LLM draws on four signals per table / column:
- Data lineage — upstream and downstream tables and their existing documentation.
- Existing PinCat docs, if present.
- Column-level glossary terms (reusable business concepts like user_id, pin_id).
- Representative example queries from QueryBook: the queries analysts actually run against this table.
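The way these four signals might be combined into a single LLM prompt can be sketched as follows. This is a minimal illustration, not Pinterest's implementation: the `TableSignals` container, the `build_prompt` function, the section labels, and the `max_queries` cap are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableSignals:
    """Hypothetical container for the four per-table inputs listed above."""
    table: str
    lineage_docs: Dict[str, str] = field(default_factory=dict)    # neighbor table -> its description
    existing_doc: Optional[str] = None                            # current PinCat doc, if any
    glossary_terms: Dict[str, str] = field(default_factory=dict)  # column -> glossary term
    example_queries: List[str] = field(default_factory=list)      # sample QueryBook queries


def build_prompt(signals: TableSignals, max_queries: int = 3) -> str:
    """Render the signals into one prompt string, skipping empty sections."""
    parts = [f"Write a description for table `{signals.table}`."]
    if signals.lineage_docs:
        lines = [f"- {name}: {doc}" for name, doc in sorted(signals.lineage_docs.items())]
        parts.append("Lineage (upstream/downstream):\n" + "\n".join(lines))
    if signals.existing_doc:
        parts.append("Existing documentation:\n" + signals.existing_doc)
    if signals.glossary_terms:
        lines = [f"- {col}: {term}" for col, term in sorted(signals.glossary_terms.items())]
        parts.append("Glossary terms by column:\n" + "\n".join(lines))
    if signals.example_queries:
        # Cap the number of queries so the prompt stays within a fixed budget.
        parts.append("Example queries:\n" + "\n".join(signals.example_queries[:max_queries]))
    return "\n\n".join(parts)
```

Skipping empty sections keeps the prompt short for tables that have no existing documentation or no recorded queries, which is likely the common case for the long tail of undocumented tables.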
Human-in-the-loop ladder¶
Review discipline is calibrated by tier:
- Tier 1 (cross-team production): humans stay in the loop; every LLM draft is reviewed before it is published to PinCat.
- Tier 2 (team-owned): the balance flips; LLMs write the drafts and humans review only on a sampling or exception basis.
- All AI-generated docs are clearly marked as such in PinCat, and owners are notified to review and edit them over time.
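The tier ladder can be read as a small routing rule. The sketch below is an assumption-laden illustration: the `Review` enum, `review_policy`, `publish_record`, and the record fields are hypothetical, and it assumes that Tier 2 drafts are published before any sampled review happens, which the source implies but does not state.

```python
from enum import Enum


class Review(Enum):
    HUMAN_BEFORE_PUBLISH = "human_before_publish"  # Tier 1: review gates publication
    PUBLISH_THEN_SAMPLE = "publish_then_sample"    # Tier 2: sampled / exception review


def review_policy(tier: int) -> Review:
    """Map a table tier to the review discipline described above."""
    if tier == 1:
        return Review.HUMAN_BEFORE_PUBLISH
    if tier == 2:
        return Review.PUBLISH_THEN_SAMPLE
    raise ValueError(f"unsupported tier: {tier}")


def publish_record(table: str, tier: int, draft: str) -> dict:
    """Build a hypothetical catalog record for an LLM-drafted description."""
    policy = review_policy(tier)
    return {
        "table": table,
        "description": draft,
        "ai_generated": True,  # every AI-written doc is marked as such
        "published": policy is Review.PUBLISH_THEN_SAMPLE,  # assumption: Tier 2 publishes first
        "notify_owner": True,  # owners are always asked to review and edit
    }
```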
Complementary: glossary term propagation¶
AI Table Documentation is paired with two automated column-semantics propagators:
- Join-based lineage propagation. Pinterest analyzed query logs to build a join graph between columns (e.g. data.pins_d.id joins to ad.ad_video_event_flat_spark.objectid). When a well-documented column with a glossary term repeatedly joins to an undocumented column, the term propagates. Auto-tagged >40% of columns in scope. (concepts/glossary-term-propagation)
- Search-based propagation. Where join patterns are sparse, index glossary terms + column docs into the Vector DB and use semantic similarity search between column descriptions and existing glossary term definitions.
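Both propagators can be sketched in a few lines. This is an illustration under stated assumptions, not Pinterest's code: `propagate_terms`, `nearest_term`, the `min_joins` and `min_score` thresholds, and the input shapes are hypothetical. The search-based half substitutes a plain bag-of-words cosine similarity for the embedding search against the Vector DB, purely to stay self-contained.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def propagate_terms(
    joins: Iterable[Tuple[str, str]],  # one (column, column) pair per observed join
    glossary: Dict[str, str],          # documented column -> glossary term
    min_joins: int = 3,                # hypothetical threshold for "repeatedly joins"
) -> Dict[str, str]:
    """Propose a glossary term for undocumented columns that join to documented ones."""
    votes: Dict[str, Counter] = {}
    for left, right in joins:
        # A join is symmetric, so try both directions.
        for src, dst in ((left, right), (right, left)):
            if src in glossary and dst not in glossary:
                votes.setdefault(dst, Counter())[glossary[src]] += 1
    proposals = {}
    for column, counter in votes.items():
        term, count = counter.most_common(1)[0]
        if count >= min_joins:
            proposals[column] = term
    return proposals


def _bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding (assumption)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_term(
    column_doc: str,
    term_definitions: Dict[str, str],  # glossary term -> definition text
    min_score: float = 0.3,            # hypothetical similarity cutoff
) -> Optional[str]:
    """Return the glossary term whose definition is most similar to the column doc."""
    column_vec = _bow(column_doc)
    best_term, best_score = None, 0.0
    for term, definition in term_definitions.items():
        score = _cosine(column_vec, _bow(definition))
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= min_score else None
```

The join-based path needs no text at all, only the graph and existing tags, so it applies to columns with no description. The similarity path depends on the column already having some description, which is where the AI-generated docs feed back in.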
Operational results¶
- ~40% reduction in manual documentation effort from AI-generated table descriptions alone.
- >75% of AI-generated descriptions rated "usable" or better in user surveys.
- ~70% overall manual-documentation-work reduction when AI + join-based propagation + search-based propagation are combined.
Why it matters architecturally¶
AI Table Documentation closes a critical loop: the Analytics Agent's SQL-to-text step needs rich per-column business context to translate queries into meaningful natural-language descriptions. Without AI documentation, the context-injection step would see only raw column names and miss the business meaning. "Governance and AI reinforce each other."
Seen in¶
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — canonical wiki introduction.