DataHub
DataHub (datahubproject.io) is an open-source metadata platform: a data catalog that tracks tables, columns, ownership, lineage, glossary terms, and quality signals across a company's data assets. It was originally developed at LinkedIn and is now a community-maintained project.
Role in this wiki
DataHub appears in the corpus primarily as the substrate of PinCat, Pinterest's internal data catalog. Pinterest layers tier tags, owners, retention policies, and column-level glossary terms on top of DataHub, which makes PinCat the system of record that drives the Analytics Agent's schema grounding and governance-aware ranking.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
Core concepts (as used by Pinterest)
- Table tier tags: Tier 1 / 2 / 3 classification of production quality (concepts/data-governance-tiering).
- Glossary terms: reusable business concepts (e.g. user_id, pin_id) that unify different column names across tables. DataHub docs: business-glossary.
- Ownership and retention policies: metadata that gates deprecation decisions and drives AI ranking trust signals.
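To make the interplay of glossary terms and tier tags concrete, here is a minimal sketch in plain Python. It does not use the DataHub SDK and is not Pinterest's implementation: the `Column` and `Table` classes, the `tables_for_term` helper, the sample table names, and the assumption that Tier 1 is the most trusted tier are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Column:
    name: str
    # Hypothetical link from a physical column to a business glossary term.
    glossary_term: Optional[str] = None


@dataclass
class Table:
    name: str
    tier: int  # Assumption: 1 = most trusted, 3 = least trusted.
    owners: list[str] = field(default_factory=list)
    columns: list[Column] = field(default_factory=list)


def tables_for_term(tables: list[Table], term: str) -> list[tuple[str, str]]:
    """Return (table, column) pairs tagged with `term`, best tier first."""
    matches = [
        (t.tier, t.name, c.name)
        for t in tables
        for c in t.columns
        if c.glossary_term == term
    ]
    # Sort by tier, then by table and column name for a stable order.
    return [(table, column) for _, table, column in sorted(matches)]


# Illustrative catalog: three tables whose differently named columns
# all map to the same glossary term "user_id".
catalog = [
    Table("scratch.user_events", tier=3, columns=[Column("uid", "user_id")]),
    Table("core.users", tier=1, owners=["identity-team"],
          columns=[Column("user_id", "user_id")]),
    Table("core.pins", tier=1,
          columns=[Column("id", "pin_id"), Column("creator", "user_id")]),
]

print(tables_for_term(catalog, "user_id"))
```

In this sketch the glossary term acts as the join key across inconsistent column names (`uid`, `user_id`, `creator`), and the tier tag only orders the candidates, so a lower-tier table is still discoverable but ranks behind the Tier 1 tables.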
Seen in
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql — Pinterest's PinCat is built on DataHub.