
PATTERN

Governance-tier ranking fusion

Governance-tier ranking fusion is the implementation shape for governance-aware ranking: when ranking retrieval candidates (tables, queries, dashboards), fuse the semantic-similarity score with governance-metadata signals — table tier, data freshness, documentation completeness, ownership activity, historical query success rate — rather than ranking by similarity alone.

(Source: sources/2026-03-06-pinterest-unified-context-intent-embeddings-for-scalable-text-to-sql.)

The pattern

     retrieval candidates (top-k by vector similarity)
      ┌───────────────────────────────────┐
      │ for each candidate, compute       │
      │  a ranking score from:            │
      │   - semantic similarity           │
      │   - table tier (1 / 2 / 3)        │
      │   - data freshness                │
      │   - documentation completeness    │
      │   - ownership status              │
      │   - query success rate            │
      │   - usage recency + volume        │
      │   - author expertise              │
      └───────────────────────────────────┘
              ranked trustworthy candidates
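The diagram above can be sketched as a weighted linear fusion. This is a minimal illustration, not Pinterest's implementation: the field names, weights, and tier-to-score mapping are all assumptions, and a production system would tune the weights against ranking metrics.

```python
from dataclasses import dataclass

# Hypothetical candidate record; field names are illustrative.
@dataclass
class Candidate:
    name: str
    similarity: float          # cosine similarity, normalized to [0, 1]
    tier: int                  # 1 (most trusted) .. 3
    freshness: float           # [0, 1]; 1.0 = latest partition is current
    doc_completeness: float    # [0, 1]; description + glossary + owner notes
    has_active_owner: bool     # active vs. orphaned ownership
    query_success_rate: float  # [0, 1] from execution history

# Illustrative weights; real systems learn or tune these.
WEIGHTS = {
    "similarity": 0.40,
    "tier": 0.20,
    "freshness": 0.10,
    "docs": 0.10,
    "owner": 0.10,
    "success": 0.10,
}

def fused_score(c: Candidate) -> float:
    # Map discrete tier to a trust score: Tier 1 highest.
    tier_score = {1: 1.0, 2: 0.5, 3: 0.2}[c.tier]
    return (
        WEIGHTS["similarity"] * c.similarity
        + WEIGHTS["tier"] * tier_score
        + WEIGHTS["freshness"] * c.freshness
        + WEIGHTS["docs"] * c.doc_completeness
        + WEIGHTS["owner"] * (1.0 if c.has_active_owner else 0.0)
        + WEIGHTS["success"] * c.query_success_rate
    )

def rank(candidates: list[Candidate]) -> list[Candidate]:
    return sorted(candidates, key=fused_score, reverse=True)
```

With these weights, a deprecated staging table at similarity 0.92 still loses to a fresh, documented Tier-1 table at similarity 0.85, which is the behavior the pattern is after.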

Why fusion beats similarity-only ranking

A 100K-table warehouse is full of semantically similar candidates at very different trust levels. Without fusion:

  • Staging tables outrank production tables when staging's schema is "closer" to the user's phrasing.
  • Deprecated-but-searchable tables keep appearing even as the company migrates off them.
  • Undocumented tables with better column-name overlap outrank well-documented Tier-1 alternatives.

Pinterest's framing: "A Tier-1 table with active ownership and fresh data ranks higher than a semantically similar but deprecated or undocumented alternative."

Signals used in the fusion

Pinterest groups them into two families:

Governance metadata

  • Table tier — Tier 1 / Tier 2 / Tier 3.
  • Data freshness — how recent is the last partition.
  • Documentation completeness — populated description + glossary terms + owner notes.
  • Ownership status — active vs. orphaned.

Statistical signals (from query execution history)

  • Table co-occurrence frequency — how often tables are queried together.
  • Query success rates — successful patterns weighted higher than failed attempts.
  • Usage recency and volume — recent, frequently-used patterns reflect current best practices.
  • Author expertise — queries from experienced analysts in specific domains carry higher weight.
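The statistical family can be derived by aggregating query execution logs. A minimal sketch, assuming a log of (tables referenced, success flag) records; the record shape and helper names are invented for illustration:

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical query-log records: (tables referenced, query succeeded).
LOGS = [
    ({"dim_users", "fact_clicks"}, True),
    ({"dim_users", "fact_clicks"}, True),
    ({"fact_clicks", "staging_clicks"}, False),
]

co_occurrence: Counter = Counter()            # (table_a, table_b) -> joint query count
success = defaultdict(lambda: [0, 0])          # table -> [successes, total]

for tables, ok in LOGS:
    # Co-occurrence: every unordered pair of tables queried together.
    for pair in combinations(sorted(tables), 2):
        co_occurrence[pair] += 1
    # Per-table success rate over all queries touching the table.
    for t in tables:
        success[t][0] += 1 if ok else 0
        success[t][1] += 1

def success_rate(table: str) -> float:
    s, n = success[table]
    return s / n if n else 0.0
```

Recency, volume, and author expertise would be further aggregations over the same logs (e.g. time-decayed counts, per-author per-domain weights); they are omitted here to keep the sketch small.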

Prerequisites

  • Tier metadata must exist and be current — requires a governance catalog like PinCat.
  • Freshness + ownership + doc-completeness metadata must be machine-readable — not buried in wiki pages.
  • Query execution logs must be collected and analyzable — for the statistical signals.

Without these inputs the pattern degenerates to similarity-only ranking.
