

Pinterest AI Table Documentation

AI Table Documentation is Pinterest's system for generating table and column descriptions at warehouse scale using LLMs. It is the production answer to "we have 100,000+ analytical tables and manual documentation can't keep up."

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

Input signals

The LLM draws on four signals per table / column:

  • Data lineage — upstream and downstream tables and their existing documentation.
  • Existing PinCat docs, if present.
  • Column-level glossary terms (reusable business concepts like user_id, pin_id).
  • Representative example queries from QueryBook — the queries analysts actually run against this table.
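A minimal sketch of how the four signals might be assembled into one prompt per column. The `ColumnSignals` container, the `build_description_prompt` helper, the `max_queries` cap and the prompt wording are all hypothetical; the source does not describe Pinterest's actual prompt format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnSignals:
    """Hypothetical container for the four per-column input signals."""
    table: str
    column: str
    lineage_docs: List[str] = field(default_factory=list)     # docs of upstream/downstream columns
    existing_doc: Optional[str] = None                         # current PinCat description, if any
    glossary_terms: List[str] = field(default_factory=list)   # e.g. ["pin_id"]
    example_queries: List[str] = field(default_factory=list)  # representative QueryBook queries


def build_description_prompt(signals: ColumnSignals, max_queries: int = 3) -> str:
    """Assemble one prompt; signals with no data are left out rather than sent empty."""
    s = signals
    sections = [f"Write a one- or two-sentence business description of column {s.table}.{s.column}."]
    if s.existing_doc:
        sections.append(f"Existing documentation:\n{s.existing_doc}")
    if s.glossary_terms:
        sections.append("Glossary terms: " + ", ".join(s.glossary_terms))
    if s.lineage_docs:
        sections.append("Lineage context:\n" + "\n".join(f"- {d}" for d in s.lineage_docs))
    if s.example_queries:
        # Cap the number of queries so a heavily used table does not blow up the prompt.
        sections.append("Example queries:\n" + "\n\n".join(s.example_queries[:max_queries]))
    return "\n\n".join(sections)
```

Omitting empty sections keeps the prompt short for sparsely documented tables, which are exactly the ones this system targets.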

Human-in-the-loop ladder

Review discipline is calibrated by tier:

  • Tier 1 (cross-team production): humans stay in the loop; LLM drafts are reviewed before publication to PinCat.
  • Tier 2 (team-owned): the ratio flips; LLMs draft, and humans review only on a sampling / exception basis.
  • All AI-generated docs are clearly marked as such in PinCat, and owners are notified to review and edit them over time.
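The tier ladder can be sketched as a routing function. `Tier`, `Draft`, `route_draft`, `publish`, the `flagged` exception signal, the default `sample_rate` and the `[AI-generated]` marker text are hypothetical. The source does not say whether sampled Tier 2 drafts are reviewed before or after publication; this sketch assumes they are queued for review first.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import Optional

AI_MARKER = "[AI-generated]"  # hypothetical label text


class Tier(Enum):
    TIER_1 = 1  # cross-team production tables
    TIER_2 = 2  # team-owned tables


@dataclass
class Draft:
    table: str
    tier: Tier
    text: str
    flagged: bool = False  # hypothetical exception signal, e.g. a failed automated check


def route_draft(draft: Draft, sample_rate: float = 0.1, rng: Optional[random.Random] = None) -> str:
    """Return "human_review" or "publish" for an LLM-drafted description."""
    if draft.tier is Tier.TIER_1:
        return "human_review"  # Tier 1: every draft is reviewed before publication
    rng = rng or random.Random()
    if draft.flagged or rng.random() < sample_rate:
        return "human_review"  # Tier 2: only exceptions and a random sample
    return "publish"


def publish(draft: Draft) -> dict:
    """Mark the description as AI-generated and request an owner notification."""
    return {
        "table": draft.table,
        "description": f"{AI_MARKER} {draft.text}",
        "notify_owner": True,
    }
```

Passing a seeded `random.Random` makes the Tier 2 sampling reproducible, which helps when auditing which drafts were spot-checked.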

Complementary: glossary term propagation

AI Table Documentation is paired with two automated column-semantics propagators:

  1. Join-based lineage propagation. Pinterest analyzed query logs to build a join graph between columns (e.g. data.pins_d.id joins to ad.ad_video_event_flat_spark.objectid). When a well-documented column with a glossary term repeatedly joins to an undocumented column, the term propagates to that column. This auto-tagged more than 40% of the columns in scope. (concepts/glossary-term-propagation)
  2. Search-based propagation. Where join patterns are sparse, glossary terms and column docs are indexed into the Vector DB, and semantic similarity search matches column descriptions against existing glossary term definitions.
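A sketch of the two propagators as a cascade: join-based propagation runs first, and search-based propagation covers columns the join graph leaves untagged. The `min_joins` and `threshold` cutoffs and all function names are hypothetical. The search step uses bag-of-words cosine similarity as a plainly simpler stand-in for the learned embeddings and Vector DB Pinterest uses.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def propagate_via_joins(
    join_log: Iterable[Tuple[str, str]],
    glossary: Dict[str, str],
    min_joins: int = 3,
) -> Dict[str, str]:
    """Tag untagged columns that repeatedly join to a tagged column.

    join_log holds one (column_a, column_b) pair per join observed in query logs.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for a, b in join_log:
        for src, dst in ((a, b), (b, a)):
            if src in glossary and dst not in glossary:
                votes[dst][glossary[src]] += 1
    new_tags = {}
    for column, term_counts in votes.items():
        term, count = term_counts.most_common(1)[0]
        if count >= min_joins:  # "repeatedly joins": hypothetical threshold
            new_tags[column] = term
    return new_tags


def _bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[token] for token, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def propagate_via_search(
    column_docs: Dict[str, str],
    term_definitions: Dict[str, str],
    glossary: Dict[str, str],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Tag untagged columns whose description closely matches a glossary definition."""
    term_vectors = {term: _bag_of_words(d) for term, d in term_definitions.items()}
    new_tags = {}
    for column, doc in column_docs.items():
        if column in glossary:
            continue
        doc_vector = _bag_of_words(doc)
        scored = ((term, _cosine(doc_vector, vec)) for term, vec in term_vectors.items())
        best_term, best_score = max(scored, key=lambda pair: pair[1], default=(None, 0.0))
        if best_term is not None and best_score >= threshold:
            new_tags[column] = best_term
    return new_tags


def propagate(join_log, column_docs, term_definitions, glossary):
    """Join-based pass first; the search-based pass handles columns still untagged."""
    tags = dict(glossary)
    tags.update(propagate_via_joins(join_log, tags))
    tags.update(propagate_via_search(column_docs, term_definitions, tags))
    return tags
```

For example, assuming data.pins_d.id is tagged `pin_id`, three logged joins to ad.ad_video_event_flat_spark.objectid tag that column `pin_id` as well; with the default `min_joins=3`, two joins would not be enough.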

Operational results

  • ~40% reduction in manual documentation effort from AI-generated table descriptions alone.
  • >75% of AI-generated descriptions rated "usable" or better in user surveys.
  • ~70% overall reduction in manual documentation work when AI generation, join-based propagation and search-based propagation are combined.

Why it matters architecturally

AI Table Documentation closes a critical loop: the Analytics Agent's SQL-to-text step needs rich per-column business context to translate queries into meaningful natural-language descriptions. Without AI-generated documentation, the context-injection step would see only raw column names and miss their business meaning. As the source puts it, "Governance and AI reinforce each other."
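A sketch of the context-injection step: columns referenced in a query are looked up in a documentation catalog, and undocumented ones fall back to the raw name. The `build_sql_to_text_context` function, the plain-dict catalog and the regex-based identifier extraction (a crude stand-in for a real SQL parser) are hypothetical.

```python
import re
from typing import Dict, Set


def build_sql_to_text_context(sql: str, catalog: Dict[str, str], known_columns: Set[str]) -> str:
    """Return one context line per known column referenced in `sql`.

    Documented columns carry their description; undocumented ones fall back to
    the raw column name, which is all the model would see without AI docs.
    """
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql))  # crude stand-in for a SQL parser
    lines = []
    for column in sorted(identifiers & known_columns):
        description = catalog.get(column)
        lines.append(f"{column}: {description}" if description else f"{column}: (no description)")
    return "\n".join(lines)
```

In this sketch, every column that gains a description through AI drafting or glossary propagation turns a bare `(no description)` line into business context for the SQL-to-text model.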

