SYSTEM Cited by 1 source

DataHub

DataHub (datahubproject.io) is an open-source metadata platform: a data catalog that tracks tables, columns, ownership, lineage, glossary terms, and quality signals across a company's data assets. It was originally developed at LinkedIn and is now a community-maintained project.

Role in this wiki

DataHub appears in the corpus primarily as the substrate of PinCat, Pinterest's internal data catalog. Pinterest layers tier tags, owners, retention policies, and column-level glossary terms on top of DataHub, making PinCat the system of record that drives the Analytics Agent's schema grounding and governance-aware ranking.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

Core concepts (as used by Pinterest)

  • Table tier tags — Tier 1 / 2 / 3 classification of production quality (concepts/data-governance-tiering).
  • Glossary terms — reusable business concepts (e.g. user_id, pin_id) that unify different column names across tables. DataHub docs: business-glossary.
  • Ownership + retention policies — metadata that gates deprecation decisions and drives AI ranking trust signals.
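
As a rough illustration of how these three kinds of metadata could feed governance-aware ranking, here is a minimal Python sketch. It does not use the DataHub SDK or reflect PinCat's actual implementation: `TableMeta`, `rank_tables`, `columns_for_term`, and the sample table names are all hypothetical, and the sketch assumes Tier 1 is the highest-quality tier.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical, simplified view of one catalog entry."""
    name: str
    tier: int  # assumption: 1 = highest production quality, 3 = lowest
    owners: list[str] = field(default_factory=list)
    # column name -> glossary term it is tagged with (hypothetical mapping)
    glossary: dict[str, str] = field(default_factory=dict)


def rank_tables(tables):
    """Order candidates by tier, then owned before unowned, then name."""
    return sorted(tables, key=lambda t: (t.tier, not t.owners, t.name))


def columns_for_term(tables, term):
    """Return (table, column) pairs tagged with the given glossary term."""
    return [
        (t.name, col)
        for t in tables
        for col, tagged in t.glossary.items()
        if tagged == term
    ]


# Made-up sample catalog, for illustration only.
catalog = [
    TableMeta("scratch.pins_tmp", tier=3, glossary={"pid": "pin_id"}),
    TableMeta("core.pins", tier=1, owners=["data-eng"],
              glossary={"pin_id": "pin_id", "creator": "user_id"}),
    TableMeta("analytics.users", tier=2, owners=["analytics"],
              glossary={"uid": "user_id"}),
]

ranked = [t.name for t in rank_tables(catalog)]
user_cols = columns_for_term(catalog, "user_id")
print(ranked)
print(user_cols)
```

In this sketch the glossary term is what lets differently named columns (`creator`, `uid`) be treated as the same business concept, while tier and ownership act as simple trust signals for ordering candidate tables.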

Seen in
