
PATTERN

Multi-signal pairwise similarity

Problem

Downstream reranking algorithms — DPP, SSD, MMR, slate-level classifiers — all reduce to "how similar are items i and j?" The quality of this similarity bounds the reranking quality. Single-signal approaches have blind spots:

  • Visual embeddings only — catches style/appearance redundancy but misses topic or category overlap between visually different items.
  • Text embeddings only — catches title/description overlap but misses visual clones of the same image with different metadata.
  • Graph embeddings only — catches co-engagement patterns but can be coarse for fine-grained content distinctions and doesn't apply to new items.
  • Categorical taxonomy only — stable but coarse; two Pins in the same taxonomy class can still be visually and semantically distinct.

Moreover, continuous embeddings capture closeness well but don't always provide a stable, category-like notion of semantics — two items can be close in embedding space in ways that aren't meaningful for diversity control.

Solution

Compose multiple complementary similarity signals into a unified pairwise similarity substrate that the reranking algorithm consumes. Canonical signal families:

  • Visual embeddings (e.g. PinCLIP — multimodal image-text-aligned) — catches visual redundancy and style similarity.
  • Text embeddings — captures overlap in titles, descriptions, and free-form metadata.
  • Graph embeddings (e.g. GraphSage) — captures co-engagement patterns and neighbourhood similarity in the content graph.
  • Semantic ID — hierarchical discrete codes enabling prefix-overlap as a stable category-like similarity signal.

The reranking algorithm's similarity kernel combines these — via a weighted sum, concatenation followed by a learned projection, or separate penalty terms in the utility equation.
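As a minimal sketch of the weighted-sum variant: continuous signals contribute cosine similarities, and Semantic ID contributes a prefix-overlap fraction. The signal names, item layout, and weights below are illustrative assumptions, not Pinterest's actual schema.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def prefix_overlap(sid_i, sid_j):
    """Fraction of leading Semantic ID codes shared by two items.
    A longer shared prefix means closer positions in the hierarchy."""
    shared = 0
    for a, b in zip(sid_i, sid_j):
        if a != b:
            break
        shared += 1
    return shared / max(len(sid_i), 1)

def pairwise_similarity(item_i, item_j, weights):
    """Weighted sum of per-signal similarities into one scalar.
    Items are dicts holding one entry per signal (hypothetical layout)."""
    sims = {
        "visual": cosine(item_i["visual"], item_j["visual"]),
        "text":   cosine(item_i["text"], item_j["text"]),
        "graph":  cosine(item_i["graph"], item_j["graph"]),
        "sid":    prefix_overlap(item_i["sid"], item_j["sid"]),
    }
    return sum(weights[k] * sims[k] for k in sims)
```

The weighted sum is the simplest composition; as noted below, it collapses the signal structure into one scalar, which is why separate penalty terms are sometimes preferred.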

Canonical instance — Pinterest Home Feed Blender evolution

Pinterest's Home Feed Blender evolved its similarity substrate in lockstep with the DPP→SSD algorithm migration (Source: sources/2026-04-07-pinterest-evolution-of-multi-objective-optimization-at-pinterest-home):

  • DPP era (2021) — GraphSage + categorical taxonomy. Narrow signal set because DPP's kernel cost bounded complexity.
  • Early 2025 (SSD) — added visual embeddings + text embeddings to the SSD similarity matrix. Enabled by SSD's lower serving cost.
  • Q3 2025 — visual embedding upgraded to PinCLIP (multimodal + graph-aware + near-real-time for new Pins).
  • Q4 2025 — added Semantic ID as a prefix-overlap signal penalty for stable category-like anti-clustering.

Each generation added a complementary signal without replacing previous ones. Net effect: a similarity substrate combining multimodal embedding similarity + graph similarity + discrete hierarchical ID overlap in one reranking step.

Why multiple signals vs a single unified embedding

In principle one could train a single unified embedding that captures everything. In practice, different signals have different operational properties that argue for keeping them separate:

  • Freshness — each signal refreshes on its own cadence: PinCLIP-style multimodal embeddings retrain on one schedule, GraphSage needs fresh graph updates, and Semantic ID requires codebook re-quantisation.
  • Availability for new content — PinCLIP's near-real-time property matters for recently-ingested Pins; graph embeddings need co-engagement history.
  • Stability — Semantic ID prefixes are stable IDs; embedding similarities drift with retraining.
  • Interpretability — categorical axes are explainable; learned embeddings aren't.
  • Cost profile — different signals have different serving costs; composition lets you skip expensive signals when coarser ones suffice.

Composition strategies

  • Weighted sum into a single scalar similarity. Simple; loses signal structure.
  • Concatenated signal vector with a learned projection. More expressive but adds training complexity.
  • Separate penalty terms in the utility equation. Keeps signals independent; composable with soft-spacing. Pinterest's approach for Semantic ID (separate prefix-overlap penalty term).
  • Hierarchical gating — use coarse signal to filter candidates, fine signals to rank. Computationally efficient but loses joint optimisation.
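The separate-penalty-term strategy can be sketched as a greedy slate builder where each signal keeps its own λ-weighted penalty against the items already selected. The function and matrix shapes below are an illustrative assumption, not a specific production implementation.

```python
def greedy_rerank(relevance, sims, lambdas, k):
    """Greedy slate construction with one penalty term per signal:
    score(i) = relevance[i] - sum_s lambdas[s] * max_{j in slate} sims[s][i][j]

    relevance : list of per-item relevance scores
    sims      : dict signal_name -> NxN pairwise similarity matrix
    lambdas   : dict signal_name -> penalty weight for that signal
    k         : slate size
    """
    n = len(relevance)
    slate, remaining = [], set(range(n))
    while remaining and len(slate) < k:
        best, best_score = None, -float("inf")
        for i in remaining:
            penalty = 0.0
            if slate:
                for s, mat in sims.items():
                    penalty += lambdas[s] * max(mat[i][j] for j in slate)
            score = relevance[i] - penalty
            if score > best_score:
                best, best_score = i, score
        slate.append(best)
        remaining.remove(best)
    return slate
```

Keeping signals as separate terms means each λ can be tuned (or monitored) independently, and a new signal like a Semantic ID prefix-overlap penalty can be added without retraining a combined kernel.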

Caveats

  • Signal quality is additive only if signals are complementary — two overlapping signals dilute rather than enhance.
  • Weight tuning is non-trivial — per-signal λ values need calibration; more signals = larger hyperparameter space.
  • Stale signals poison the mix — if one signal is poorly maintained, it degrades the whole substrate. Per-signal monitoring required.
  • Cost composability — computing multi-signal similarities for every pair in the slate can be expensive; precomputation and caching strategies matter.
  • Signal drift — over time, one signal may drift (classifier retrain, embedding version bump); the mix's behaviour changes silently. Monitoring per-signal penalty distribution is necessary.
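The per-signal monitoring the last two caveats call for can be as simple as tracking each signal's mean pairwise similarity per request and flagging outliers. The z-score check and data shapes below are a hypothetical sketch of one way to do this, not a prescribed scheme.

```python
import statistics

def signal_drift_report(history, current, z_threshold=3.0):
    """Flag signals whose similarity distribution has shifted.

    history : dict signal name -> list of past per-request mean similarities
    current : dict signal name -> latest observed mean similarity
    Returns the signals whose current mean sits more than z_threshold
    standard deviations from the historical mean.
    """
    drifted = []
    for name, past in history.items():
        mu = statistics.mean(past)
        sigma = statistics.stdev(past)
        if sigma > 0 and abs(current[name] - mu) / sigma > z_threshold:
            drifted.append(name)
    return drifted
```

A drifted signal (say, after an embedding version bump) then triggers a per-signal alert rather than silently shifting the whole substrate's behaviour.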
