PATTERN
Batch embedding for index consistency¶
Intent¶
When building an ANN index for a two-tower retrieval / ranking system, favor batch inference from a single consistent model checkpoint over incremental streaming updates that mix embedding versions. Each rebuilt index is version-homogeneous; embedding version mismatches live only across sequential index builds, not within them.
Pinterest's 2026-02-27 L1 CVR retrospective (sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr) documents this as their deployment posture for large tiers, specifically as a mitigation for embedding version skew.
The structural choice¶
Two ways to populate an ANN index with item embeddings from a two-tower model:
Option A — Streaming / incremental enrichment (what causes skew)¶
- Each item change triggers a real-time re-embedding by the current item-tower checkpoint.
- Embeddings are written to the index as they arrive.
- The index accumulates embeddings from many different checkpoints over time — whichever version was live when that item was last enriched.
- When items are re-enriched at different rates, the index's version-distance distribution spreads wide.
- A single index therefore always contains embeddings from mixed versions.
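To make the skew mechanism concrete, here is a small simulation of the streaming posture. All names (`streaming_enrich`, `touch_prob`) are illustrative, not from the Pinterest writeup; the point is that items touched at different times end up embedded by different checkpoints within one index:

```python
import random

def streaming_enrich(num_items=1000, num_checkpoints=10, touch_prob=0.3, seed=0):
    """Simulate Option A: on each checkpoint release, only items that
    happened to change get re-embedded. The index keeps whatever
    embedding version each item last saw."""
    rng = random.Random(seed)
    item_version = [0] * num_items  # all items start embedded by checkpoint 0
    for ckpt in range(1, num_checkpoints):
        for i in range(num_items):
            if rng.random() < touch_prob:  # item changed -> re-embed with current checkpoint
                item_version[i] = ckpt
    return item_version

versions = streaming_enrich()
print(f"distinct embedding versions in one index: {len(set(versions))}")
```

Running this shows a single index spanning many embedding versions, with a long tail of rarely-touched items stuck on old checkpoints.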
Option B — Batch inference per rebuild (this pattern)¶
- Periodically, a batch job runs the current item-tower checkpoint over the entire catalog (or a large homogeneous partition).
- Batch output becomes the new ANN index; deploy swaps it in atomically.
- Each deployed index is internally version-homogeneous.
- Skew exists only between sequential index deploys, not within a single index.
- Rebuild cost is amortized across the catalog.
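The batch posture can be sketched as a rebuild-then-atomic-swap. The `AnnIndex` / `batch_rebuild` names and the toy `embed_fn` are hypothetical stand-ins for the real item tower and index builder, chosen only to show that every embedding in a deployed index carries the same version:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnIndex:
    """Hypothetical immutable index artifact: one checkpoint, whole catalog."""
    checkpoint: int
    embeddings: dict  # item_id -> embedding

def batch_rebuild(catalog, checkpoint, embed_fn):
    """Option B: embed the entire catalog with one checkpoint; the caller
    then swaps the returned index in atomically."""
    return AnnIndex(checkpoint, {item: embed_fn(item, checkpoint) for item in catalog})

# toy embed_fn: a stand-in for the real item tower
embed = lambda item, ckpt: (len(item), ckpt)

live_index = batch_rebuild(["a", "bb", "ccc"], checkpoint=7, embed_fn=embed)
# every embedding in the deployed index carries the same version
assert {v[1] for v in live_index.embeddings.values()} == {7}
```

Because the whole artifact is produced by one job, "deploy" reduces to replacing one pointer, which is also what makes rollback a single-step operation.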
Why batch is preferred for consistency¶
From Pinterest:
"for large tiers we favor batch embedding inference so each ANN build uses a single, consistent embedding version, and we require every new model family to go through explicit version-skew sensitivity checks as part of model readiness."
Three reasons the batch posture reduces skew-induced online-offline discrepancy:
- Bounded skew distribution. With batch rebuilds on a cadence, the query-tower-vs-item-tower version distance is bounded by the rebuild cadence, not by the time since an item was last touched by streaming enrichment (which could be weeks).
- Simpler skew reasoning. The distribution becomes "all items at version X, then all items at version X+1" rather than a long-tail mix across dozens of versions.
- Failure containment. If a rebuild produces bad embeddings (bad checkpoint, bad features), the entire rebuild is one atomic rollback unit. Streaming enrichment makes rollback item-by-item.
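The "bounded skew" point above is simple arithmetic worth writing down. Under batch rebuilds, the worst-case query-tower-vs-item-tower version distance is set by how many model releases fit inside one rebuild period (an illustrative formula, not one stated in the source):

```python
import math

def worst_case_version_skew(rebuild_period_days, release_period_days):
    """Under batch rebuilds, the query tower can be at most this many model
    releases ahead of the (version-homogeneous) item embeddings in the
    currently live index."""
    return math.ceil(rebuild_period_days / release_period_days)

# e.g. weekly index rebuilds, model releases every 2 days -> at most 4 releases of skew
print(worst_case_version_skew(7, 2))  # -> 4
```

Under streaming enrichment there is no such bound: an item untouched for weeks sits at whatever version was live weeks ago.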
Tradeoffs¶
Batch embedding is not free:
- Higher compute cost per rebuild. Embedding the entire catalog periodically is expensive vs. embedding only changed items.
- Lower freshness. New or updated items wait until the next rebuild to appear with up-to-date embeddings; streaming enrichment provides near-realtime freshness.
- Build + deploy latency. Large-tier rebuilds "can span days" at Pinterest scale, which is precisely why skew is a problem in the first place; batch does not make rebuilds faster, only version-homogeneous.
- Warm-up / ramp considerations. Atomic deploys can cause sudden shifts in recall / relevance; gradual rollouts may be needed.
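One common way to soften an atomic cutover is a hash-bucketed traffic ramp, so a fixed slice of requests hits the new index during warm-up. This is a generic sketch of that idea, not something the Pinterest writeup describes; `pick_index` and its parameters are hypothetical:

```python
import zlib

def pick_index(request_id: str, new_index_fraction: float) -> str:
    """Route a stable fraction of traffic to the new index during ramp-up.
    crc32 gives a deterministic bucket per request id, so the same request
    always sees the same index at a given ramp fraction."""
    bucket = zlib.crc32(request_id.encode()) % 100
    return "new" if bucket < new_index_fraction * 100 else "old"

# ramp from 0% to 100% over successive steps; 1.0 means full cutover
print(pick_index("req-42", 0.25))
```

Gradual ramps let you watch recall / relevance metrics before committing the whole fleet to the new embedding version.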
When to use batch vs streaming¶
| Context | Preferred approach |
|---|---|
| Large tier, high-stakes ranking, skew-sensitive model family | Batch for consistency |
| Small tier, low-stakes retrieval, skew-robust model family | Streaming acceptable |
| New / fresh items must appear immediately | Streaming or hybrid |
| Model rollout cadence significantly faster than index rebuild cadence | Batch to prevent version sprawl |
| Both freshness and version consistency required | Hybrid: batch rebuild as baseline + streaming for fresh content |
A hybrid pattern is common in practice: a batch-built base index covering most of the catalog, plus a streaming-enriched sidecar for newly-created or recently-modified items. This preserves freshness while keeping the bulk of the index version-homogeneous — and the streaming slice's skew impact is bounded to its size.
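The hybrid lookup can be sketched as querying both structures and merging by score. The `hybrid_retrieve` function and its dot-product scorer are illustrative assumptions (a real system would use an ANN library, not exhaustive scoring); the shape of the merge is what matters:

```python
def hybrid_retrieve(query_vec, base_index, fresh_sidecar, k=10):
    """Hypothetical hybrid lookup: score candidates from the batch-built
    base index and the streaming-enriched sidecar, then merge top-k.
    Only the sidecar can contain newer embedding versions, so its skew
    impact is bounded to its (small) size."""
    dot = lambda q, v: sum(a * b for a, b in zip(q, v))
    candidates = []
    for source in (base_index, fresh_sidecar):
        candidates.extend((dot(query_vec, vec), item) for item, vec in source.items())
    candidates.sort(reverse=True)
    return [item for _, item in candidates[:k]]

base = {"old1": [1.0, 0.0], "old2": [0.5, 0.5]}   # version-homogeneous bulk
side = {"new1": [0.9, 0.9]}                        # freshly enriched items
print(hybrid_retrieve([1.0, 1.0], base, side, k=2))
```

Capping the sidecar's size (and flushing it into the next batch rebuild) keeps the version-mixed portion of serving small and short-lived.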
Relationship to other patterns + concepts¶
- patterns/version-skew-sensitivity-check — the measurement that tells you whether your model family needs the batch posture.
- concepts/embedding-version-skew — the hazard this pattern mitigates.
- concepts/two-tower-architecture — the architectural context where this pattern applies.
- concepts/ann-index — the serving artifact whose build posture is being chosen.
Seen in¶
- sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr — canonical wiki instance. Pinterest's stated deployment posture for large-tier ANN indices: batch embedding inference for each ANN build to ensure a single consistent embedding version, explicitly as a mitigation for embedding version skew on DHEN-class and other skew-sensitive model families.