PinCat (Pinterest data catalog)¶
PinCat is Pinterest's internal data catalog, built on top of the open-source DataHub project. It is the system of record for:
- Table tier tags — Tier-1 / Tier-2 / Tier-3 classification.
- Table owners and retention policies.
- Column-level semantics via glossary terms: reusable business concepts like user_id or pin_id that unify different column names across tables (e.g. g_advertiser_id vs adv_id).
- Table and column descriptions, many of which are now AI-generated with a human-review ladder.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
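To make the glossary-term idea concrete, here is a minimal sketch of how shared terms can link differently named columns. The table names, the `COLUMN_TERMS` mapping, the term `advertiser_id`, and the helper `joinable_columns` are all hypothetical illustrations; none of this reflects PinCat's or DataHub's actual API.

```python
# Hypothetical (table, column) -> glossary term assignments.
# Table names are invented for illustration only.
COLUMN_TERMS = {
    ("ads.campaign_daily", "g_advertiser_id"): "advertiser_id",
    ("ads.billing_events", "adv_id"): "advertiser_id",
    ("core.pins", "pin_id"): "pin_id",
}


def joinable_columns(column_terms, table_a, table_b):
    """Return sorted (col_a, col_b, term) triples where a column of table_a
    and a column of table_b carry the same glossary term."""
    pairs = []
    for (ta, col_a), term_a in column_terms.items():
        if ta != table_a:
            continue
        for (tb, col_b), term_b in column_terms.items():
            if tb == table_b and term_b == term_a:
                pairs.append((col_a, col_b, term_a))
    return sorted(pairs)


matches = joinable_columns(COLUMN_TERMS, "ads.campaign_daily", "ads.billing_events")
for col_a, col_b, term in matches:
    print(f"{col_a} and {col_b} both mean '{term}'")
```

In this sketch the physical names g_advertiser_id and adv_id never match as strings; only the shared term makes the relationship discoverable.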
Why it matters¶
Pinterest's data warehouse once held "hundreds of thousands of tables, most with no clear owner or documentation." The governance roadmap targeted a reduction from ~400K to ~100K tables via standardization and cleanup. PinCat is the enforcement surface for that program — a table's tier, ownership, and freshness are not advisory metadata but the actual inputs the Analytics Agent's ranker uses to decide what to surface.
Pinterest names this explicitly: "Governance and AI reinforce each other. A disciplined tiering and documentation program made AI assistance viable; the AI systems, in turn, made large-scale governance and documentation tractable."
Role in the Analytics Agent¶
The Analytics Agent consumes PinCat in three places:
- Schema grounding. Generated SQL must reference only tables and columns that PinCat confirms exist — the validation check that distinguishes a plausible query from a hallucinated one.
- Governance metadata for ranking. Tier, ownership, freshness, and documentation completeness feed the governance-aware ranker on top of semantic-similarity scores.
- Glossary terms as semantic bridge. Column-level glossary terms let the SQL-to-text pipeline translate physical column names into business-meaningful vocabulary before the LLM sees them.
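The first two uses above can be sketched as follows. This is an illustrative approximation, not Pinterest's implementation: the `TableMeta` fields, the weights, the freshness thresholds, and the blending formula are all assumptions, and the (table, column) references are assumed to have been extracted from the generated SQL by a separate parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical subset of catalog metadata for one table."""
    name: str
    tier: int                 # 1, 2, or 3
    has_owner: bool
    days_since_update: int
    doc_completeness: float   # 0.0 to 1.0
    columns: frozenset


def ungrounded_references(references, catalog):
    """Schema grounding: list every (table, column) reference that the
    catalog cannot confirm. An empty list means the query is grounded."""
    missing = []
    for table, column in references:
        meta = catalog.get(table)
        if meta is None:
            missing.append(f"{table} (unknown table)")
        elif column not in meta.columns:
            missing.append(f"{table}.{column} (unknown column)")
    return missing


# Assumed weights; the real ranker's signals and weights are not published here.
TIER_SCORE = {1: 1.0, 2: 0.6, 3: 0.1}


def governance_score(meta):
    """Combine tier, ownership, freshness, and documentation into [0, 1]."""
    if meta.days_since_update <= 7:
        freshness = 1.0
    elif meta.days_since_update <= 30:
        freshness = 0.5
    else:
        freshness = 0.0
    owner = 1.0 if meta.has_owner else 0.0
    return (0.4 * TIER_SCORE[meta.tier] + 0.2 * owner
            + 0.2 * freshness + 0.2 * meta.doc_completeness)


def rank_tables(candidates, catalog, governance_weight=0.3):
    """candidates: list of (table_name, semantic_similarity).
    Tables missing from the catalog are dropped rather than ranked."""
    scored = []
    for name, similarity in candidates:
        meta = catalog.get(name)
        if meta is None:
            continue
        blended = ((1 - governance_weight) * similarity
                   + governance_weight * governance_score(meta))
        scored.append((blended, name))
    return [name for _, name in sorted(scored, reverse=True)]


catalog = {
    "core.pins": TableMeta("core.pins", 1, True, 1, 1.0,
                           frozenset({"pin_id", "user_id"})),
    "tmp.pins_copy": TableMeta("tmp.pins_copy", 3, False, 90, 0.0,
                               frozenset({"pin_id"})),
}
print(ungrounded_references([("core.pins", "pin_id"), ("core.pins", "board")], catalog))
print(rank_tables([("tmp.pins_copy", 0.85), ("core.pins", 0.80)], catalog))
```

With these placeholder weights, a well-governed Tier-1 table outranks a slightly more similar but undocumented Tier-3 table, which is the behavior a governance-aware ranker is meant to produce.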
Tiering in PinCat¶
- Tier 1 — cross-team, production-quality tables with strict documentation and quality requirements. Human-in-the-loop documentation review.
- Tier 2 — team-owned tables with lighter but still enforced standards. LLM-drafts-human-reviews for documentation.
- Tier 3 — staging / temporary / legacy tables, subject to aggressive retention and deprecation policies.
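The three tiers above can be read as a small policy table. The sketch below encodes only what the list states; the `Tier` enum, the `TierPolicy` fields, and the string labels are hypothetical names chosen for illustration, and no documentation workflow is assumed for Tier 3 because none is described.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


@dataclass(frozen=True)
class TierPolicy:
    scope: str
    doc_review: Optional[str]    # None where the text does not specify a workflow
    aggressive_retention: bool


POLICIES = {
    Tier.TIER_1: TierPolicy("cross-team, production-quality", "human-in-the-loop", False),
    Tier.TIER_2: TierPolicy("team-owned", "llm-draft-human-review", False),
    Tier.TIER_3: TierPolicy("staging / temporary / legacy", None, True),
}


def is_deprecation_candidate(tier: Tier) -> bool:
    """True for tiers subject to aggressive retention and deprecation."""
    return POLICIES[tier].aggressive_retention


for tier in Tier:
    policy = POLICIES[tier]
    print(tier.name, policy.scope, policy.doc_review, is_deprecation_candidate(tier))
```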
Seen in¶
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — canonical wiki introduction; PinCat is central to every layer of the Analytics Agent.
Related¶
- systems/datahub — the open-source project PinCat is built on.
- systems/pinterest-ai-table-documentation — populates PinCat at scale.
- concepts/data-governance-tiering
- concepts/glossary-term-propagation