

PinCat (Pinterest data catalog)

PinCat is Pinterest's internal data catalog, built on top of the open-source DataHub project. It is the system of record for:

  • Table tier tags: Tier-1 / Tier-2 / Tier-3 classification.
  • Table owners and retention policies.
  • Column-level semantics via glossary terms — reusable business concepts like user_id or pin_id that unify different column names across tables (e.g. g_advertiser_id vs adv_id).
  • Table and column descriptions, many of which are now AI-generated with a human-review ladder.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
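
As a rough illustration of how glossary terms can unify differently named columns, the sketch below groups physical columns under a shared term. It is not PinCat's or DataHub's actual data model: the Column class, the table names, and the term name advertiser_id are all hypothetical, chosen only to mirror the g_advertiser_id vs adv_id example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical catalog record for one physical column."""
    table: str
    name: str
    glossary_term: Optional[str] = None  # business concept, if tagged


# Hypothetical catalog entries; table names and the term "advertiser_id"
# are invented for illustration.
COLUMNS = [
    Column("ads.spend_daily", "g_advertiser_id", "advertiser_id"),
    Column("ads.billing", "adv_id", "advertiser_id"),
    Column("core.pins", "pin_id", "pin_id"),
    Column("tmp.scratch", "x1"),  # untagged column
]


def columns_by_term(columns: List[Column]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (table, column) pairs under the glossary term they carry."""
    index: Dict[str, List[Tuple[str, str]]] = {}
    for col in columns:
        if col.glossary_term is None:
            continue  # untagged columns are not linked to any concept
        index.setdefault(col.glossary_term, []).append((col.table, col.name))
    return index


TERM_INDEX = columns_by_term(COLUMNS)
```

Looking up TERM_INDEX["advertiser_id"] then returns both physical columns, which is the sense in which one term bridges two naming conventions.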

Why it matters

Pinterest's data warehouse once held "hundreds of thousands of tables, most with no clear owner or documentation." The governance roadmap targeted a reduction from ~400K to ~100K tables via standardization and cleanup. PinCat is the enforcement surface for that program — a table's tier, ownership, and freshness are not advisory metadata but the actual inputs the Analytics Agent's ranker uses to decide what to surface.

Pinterest names this explicitly: "Governance and AI reinforce each other. A disciplined tiering and documentation program made AI assistance viable; the AI systems, in turn, made large-scale governance and documentation tractable."

Role in the Analytics Agent

The Analytics Agent consumes PinCat in three places:

  1. Schema grounding. Generated SQL must reference only tables and columns that PinCat confirms exist — the validation check that distinguishes a plausible query from a hallucinated one.
  2. Governance metadata for ranking. Tier, ownership, freshness, and documentation completeness feed the governance-aware ranker on top of semantic-similarity scores.
  3. Glossary terms as semantic bridge. Column-level glossary terms let the SQL-to-text pipeline translate physical column names into business-meaningful vocabulary before the LLM sees them.
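
The schema-grounding check in item 1 can be sketched as a simple membership test. This is a minimal sketch under stated assumptions: the catalog snapshot, its table and column names, and both function names are hypothetical, and the (table, column) references are assumed to have already been extracted from the generated SQL by some parser not shown here.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical catalog snapshot: table name -> set of known column names.
CATALOG: Dict[str, Set[str]] = {
    "ads.spend_daily": {"g_advertiser_id", "spend_usd", "dt"},
    "core.pins": {"pin_id", "user_id", "created_at"},
}


def find_unknown_references(
    references: Iterable[Tuple[str, str]],
    catalog: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return the (table, column) pairs the catalog cannot confirm."""
    unknown = []
    for table, column in references:
        # An unknown table yields an empty set, so every column on it fails.
        if column not in catalog.get(table, set()):
            unknown.append((table, column))
    return unknown


def is_grounded(
    references: Iterable[Tuple[str, str]],
    catalog: Dict[str, Set[str]],
) -> bool:
    """True only if every referenced table and column exists in the catalog."""
    return not find_unknown_references(references, catalog)
```

A query that references ("core.pins", "pin_id") passes, while one that references a column or table absent from the snapshot is reported back so the query can be rejected or regenerated.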

Tiering in PinCat

  • Tier 1 — cross-team, production-quality tables with strict documentation and quality requirements. Human-in-the-loop documentation review.
  • Tier 2 — team-owned tables with lighter but still enforced standards. LLM-drafts-human-reviews for documentation.
  • Tier 3 — staging / temporary / legacy tables, subject to aggressive retention and deprecation policies.
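
To make concrete how tier and other governance metadata could sit on top of semantic-similarity scores, here is a hedged sketch of a governance-aware re-ranker. The source does not give the ranker's formula; the tier weights, the ownership and documentation bonuses, the blending weight, and the Candidate fields are all assumptions made for illustration, and freshness is left out for brevity.

```python
from dataclasses import dataclass
from typing import List

# Assumed weights, not taken from the source: higher tiers score higher.
TIER_WEIGHT = {1: 1.0, 2: 0.6, 3: 0.1}


@dataclass
class Candidate:
    """Hypothetical retrieval candidate with governance metadata attached."""
    table: str
    similarity: float  # semantic-similarity score in [0, 1]
    tier: int          # 1, 2, or 3
    has_owner: bool
    documented: bool


def governance_score(c: Candidate) -> float:
    """Combine tier, ownership, and documentation into one score (assumed form)."""
    score = TIER_WEIGHT[c.tier]
    score += 0.2 if c.has_owner else 0.0
    score += 0.2 if c.documented else 0.0
    return score


def rank(candidates: List[Candidate], governance_weight: float = 0.3) -> List[Candidate]:
    """Sort candidates by a weighted blend of similarity and governance score."""
    def combined(c: Candidate) -> float:
        return (1 - governance_weight) * c.similarity + governance_weight * governance_score(c)
    return sorted(candidates, key=combined, reverse=True)
```

With these assumed weights, a well-documented Tier 1 table can outrank a Tier 3 table that has a slightly higher similarity score; setting governance_weight to 0 falls back to pure similarity ordering.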

