
PATTERN

Batch embedding for index consistency

Intent

When building an ANN index for a two-tower retrieval / ranking system, favor batch inference with a single consistent model checkpoint over incremental streaming updates that mix embedding versions. Each rebuilt index is version-homogeneous; embedding version mismatches live only across sequential index builds, not within them.

Pinterest's 2026-02-27 L1 CVR retrospective (sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr) documents this as their deployment posture for large tiers, specifically as a mitigation for embedding version skew.

The structural choice

Two ways to populate an ANN index with item embeddings from a two-tower model:

Option A — Streaming / incremental enrichment (what causes skew)

  • Each item change triggers a real-time re-embedding by the current item-tower checkpoint.
  • Embeddings are written to the index as they arrive.
  • The index accumulates embeddings from many different checkpoints over time — whichever version was live when that item was last enriched.
  • Because items are re-enriched at different rates, the distribution of version distances within the index spreads wide.
  • Always has mixed versions within a single index.

Option B — Batch inference per rebuild (this pattern)

  • Periodically, a batch job runs the current item-tower checkpoint over the entire catalog (or a large homogeneous partition).
  • Batch output becomes the new ANN index; deploy swaps it in atomically.
  • Each deployed index is internally version-homogeneous.
  • Skew exists only between sequential index deploys, not within a single index.
  • Rebuild cost is amortized across the catalog.
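Option B can be sketched in a few lines. This is a minimal toy illustration of the batch posture, not Pinterest's pipeline: the `IndexBuild` type, `batch_rebuild` function, and stand-in "item tower" are all hypothetical names invented here. The point is structural: one checkpoint embeds the whole catalog, and the resulting index is immutable and version-homogeneous by construction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexBuild:
    """An immutable ANN index built from exactly one checkpoint version."""
    checkpoint_version: int
    embeddings: dict  # item_id -> embedding

def batch_rebuild(catalog, item_tower, checkpoint_version):
    """Embed the entire catalog with one checkpoint; the caller swaps the
    returned build in atomically as the new live index."""
    embeddings = {item_id: item_tower(item) for item_id, item in catalog.items()}
    return IndexBuild(checkpoint_version, embeddings)

# Toy item tower: the "embedding" is (version, item-dependent value), so we
# can inspect which checkpoint produced each vector.
def make_tower(version):
    return lambda item: (version, hash(item) % 1000)

catalog = {"pin1": "a", "pin2": "b", "pin3": "c"}
live_index = batch_rebuild(catalog, make_tower(7), checkpoint_version=7)

# Every embedding in the deployed index carries the same version.
versions = {vec[0] for vec in live_index.embeddings.values()}
assert versions == {7}  # version-homogeneous by construction
```

A streaming writer, by contrast, would mutate `embeddings` in place with whatever tower was live at write time, which is exactly how mixed versions accumulate.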

Why batch is preferred for consistency

From Pinterest:

"for large tiers we favor batch embedding inference so each ANN build uses a single, consistent embedding version, and we require every new model family to go through explicit version-skew sensitivity checks as part of model readiness."

Three reasons the batch posture reduces skew-induced online-offline discrepancy:

  1. Bounded skew distribution. With batch rebuilds on a cadence, the query-tower-vs-item-tower version distance is bounded by the rebuild cadence, not by the time since an item was last touched by streaming enrichment (which could be weeks).
  2. Simpler skew reasoning. The distribution becomes "all items at version X, then all items at version X+1" rather than a long-tail mix across dozens of versions.
  3. Failure containment. If a rebuild produces bad embeddings (bad checkpoint, bad features), the entire rebuild is one atomic rollback unit. Streaming enrichment makes rollback item-by-item.

Tradeoffs

Batch embedding is not free:

  • Higher compute cost per rebuild. Embedding the entire catalog periodically is expensive vs. embedding only changed items.
  • Lower freshness. New or updated items wait until the next rebuild to appear with up-to-date embeddings; streaming enrichment provides near-realtime freshness.
  • Build + deploy latency. Large-tier rebuilds "can span days" at Pinterest scale, which is precisely why skew is a problem in the first place — but batch doesn't make rebuilds faster, just version-homogeneous.
  • Warm-up / ramp considerations. Atomic deploys can cause sudden shifts in recall / relevance; gradual rollouts may be needed.

When to use batch vs streaming

Context                                                               | Preferred approach
Large tier, high-stakes ranking, skew-sensitive model family          | Batch for consistency
Small tier, low-stakes retrieval, skew-robust model family            | Streaming acceptable
New / fresh items must appear immediately                             | Streaming or hybrid
Model rollout cadence significantly faster than index rebuild cadence | Batch to prevent version sprawl
Hybrid: batch rebuild as baseline + streaming for fresh content       | Common production compromise

A hybrid pattern is common in practice: a batch-built base index covering most of the catalog, plus a streaming-enriched sidecar for newly-created or recently-modified items. This preserves freshness while keeping the bulk of the index version-homogeneous — and the streaming slice's skew impact is bounded to its size.
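The hybrid serving path can be sketched as a fan-out over two indexes with a score merge. This is a hypothetical sketch: `BruteForceIndex` stands in for a real ANN index (it does exact dot-product search), and the merge policy shown (take top-k by score across both) is just one option; production systems may boost or quota the fresh slice instead.

```python
import heapq

class BruteForceIndex:
    """Stand-in for an ANN index: exact dot-product search over stored vectors."""
    def __init__(self, embeddings):
        self.embeddings = embeddings  # item_id -> list[float]

    def search(self, query, k):
        scores = ((sum(q * v for q, v in zip(query, vec)), item)
                  for item, vec in self.embeddings.items())
        return heapq.nlargest(k, scores)

def hybrid_search(query, base_index, fresh_sidecar, k):
    """Fan out to the batch-built base index and the streaming sidecar
    (items created since the last rebuild), then merge by score."""
    merged = base_index.search(query, k) + fresh_sidecar.search(query, k)
    return heapq.nlargest(k, merged)

base = BruteForceIndex({"old1": [1.0, 0.0], "old2": [0.5, 0.5]})
sidecar = BruteForceIndex({"new1": [0.9, 0.1]})
top = hybrid_search([1.0, 0.0], base, sidecar, k=2)
assert [item for _, item in top] == ["old1", "new1"]
```

Because only the sidecar accepts streaming writes, any version mixing is confined to the sidecar's (small) item set, which is the bounded-skew property the paragraph above describes.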
