DataHub¶
DataHub (datahubproject.io) is an open-source metadata platform: a data catalog that tracks tables, columns, ownership, lineage, glossary terms, and quality signals across a company's data assets. It was originally developed at LinkedIn and is now a community-maintained project.
Role in this wiki¶
DataHub appears in the corpus primarily as the substrate of PinCat — Pinterest's internal data catalog. Pinterest layers tier tags, owners, retention policies, and column-level glossary terms on top of DataHub, and PinCat becomes the system of record that drives the Analytics Agent's schema grounding + governance-aware ranking.
(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)
Core concepts (as used by Pinterest)¶
- Table tier tags: Tier 1 / 2 / 3 classification of production quality (concepts/data-governance-tiering).
- Glossary terms: reusable business concepts (e.g. user_id, pin_id) that unify different column names across tables. DataHub docs: business-glossary.
- Ownership + retention policies: metadata that gates deprecation decisions and drives AI ranking trust signals.
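As a rough illustration of how these concepts could combine, the sketch below models tables with tier tags, owners, and glossary-tagged columns, then ranks the tables that expose a given glossary term. It is an in-memory toy, not DataHub's API or Pinterest's implementation: the `Table` and `Column` classes, the table and team names, and the sort rule (lower tier number first, owned tables before unowned ones) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    glossary_term: Optional[str] = None  # business concept this column maps to


@dataclass
class Table:
    name: str
    tier: int                    # assumption: 1 is the most trusted tier
    owner: Optional[str] = None  # None means no registered owner
    columns: List[Column] = field(default_factory=list)


def tables_for_term(tables: List[Table], term: str) -> List[Table]:
    """Find tables exposing a glossary term, most trustworthy first."""
    hits = [t for t in tables
            if any(c.glossary_term == term for c in t.columns)]
    # Sort by tier, then prefer owned tables, then by name for stable output.
    return sorted(hits, key=lambda t: (t.tier, t.owner is None, t.name))


# Hypothetical catalog: three differently named columns share one term.
catalog = [
    Table("events_raw", tier=3, columns=[Column("uid", "user_id")]),
    Table("users_core", tier=1, owner="identity-team",
          columns=[Column("user_id", "user_id")]),
    Table("pins_daily", tier=1, owner="content-team",
          columns=[Column("pin_id", "pin_id"), Column("creator", "user_id")]),
]

ranked = [t.name for t in tables_for_term(catalog, "user_id")]
print(ranked)  # ['pins_daily', 'users_core', 'events_raw']
```

The point of the glossary lookup is that `uid`, `user_id`, and `creator` all resolve to the same business concept, so a query about users can reach every relevant table while the tier and ownership metadata decide which one is preferred.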
Seen in¶
- sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql: Pinterest's PinCat is built on DataHub.
- sources/2026-05-28-cloudflare-how-we-built-cloudflares-data-platform-and-an-ai-agent-on-top-of-it: DataHub is the metadata catalog of Cloudflare Town Lake. "Every table, column, owner, lineage edge, and glossary term lives there." Skipper queries DataHub for Layer 1 (schema + usage metadata, including historical join-pattern data) and Layer 2 (human-written table descriptions + glossary + curated tags) of its five-layer grounded context. Distinctively, DataHub is also the destination for .meta.json documentation emitted by the Transformer ELT engine on every successful run, making DataHub the substrate of code as context for data agents (Skipper's Layer 3).
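The three DataHub-backed layers described above can be pictured as one catalog record split into layered context for an agent prompt. The following is a minimal sketch under stated assumptions: `CatalogEntry`, its field names, `build_context`, and `render` are hypothetical, the real shape of the .meta.json documents and of Skipper's queries is not given in the sources, and the two remaining layers are omitted because this page does not describe them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical flattened view of one table's catalog metadata."""
    table: str
    columns: List[str]
    join_patterns: List[str] = field(default_factory=list)   # usage metadata
    description: str = ""                                    # human-written
    glossary_terms: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)            # curated tags
    code_docs: str = ""                                      # from .meta.json


def build_context(entry: CatalogEntry) -> Dict[str, Dict[str, object]]:
    """Group catalog metadata into the three catalog-backed layers."""
    return {
        "layer_1": {"table": entry.table,
                    "columns": entry.columns,
                    "join_patterns": entry.join_patterns},
        "layer_2": {"description": entry.description,
                    "glossary_terms": entry.glossary_terms,
                    "tags": entry.tags},
        "layer_3": {"code_docs": entry.code_docs},
    }


def render(context: Dict[str, Dict[str, object]]) -> str:
    """Flatten the layers into prompt lines, skipping empty metadata."""
    lines = []
    for layer, fields in context.items():
        for key, value in fields.items():
            if not value:
                continue
            text = ", ".join(value) if isinstance(value, list) else value
            lines.append(f"[{layer}] {key}: {text}")
    return "\n".join(lines)


# Hypothetical entry; names and values are invented for illustration.
entry = CatalogEntry(
    table="orders",
    columns=["order_id", "user_id"],
    join_patterns=["orders.user_id = users.user_id"],
    description="One row per completed order.",
    glossary_terms=["user_id"],
    code_docs="Built by the orders transform; deduplicates on order_id.",
)
prompt = render(build_context(entry))
print(prompt)
```

Keeping the layers separate until render time makes it easy to see which part of the prompt came from schema and usage data, which from human curation, and which from code-derived documentation.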
Related¶
- systems/pinterest-pincat
- systems/cloudflare-town-lake — second canonical wiki instance: DataHub as the metadata catalog of Town Lake.
- systems/cloudflare-skipper — queries DataHub for Layers 1 + 2 + 3 of its grounded context.
- systems/cloudflare-transformer-elt: emits per-node .meta.json to DataHub on every successful run.
- concepts/data-governance-tiering
- concepts/glossary-term-propagation
- concepts/code-as-context-for-data-agents — DataHub as the storage substrate for code-derived semantic context.
- concepts/layered-grounded-context-for-data-agent — Skipper's five-layer model where DataHub powers three of the five.