
CONCEPT Cited by 1 source

Precision-vs-discovery codebook flavor

Definition

Precision-vs-discovery codebook flavor is a design axis for Semantic ID codebooks: the same RQ-VAE quantizer + contrastive-loss machinery can be trained against different upstream embeddings to produce codebooks with different cluster character — precision (tight substitute clusters) or discovery (broader thematic clusters).

The choice is per surface, not universal. A single recsys platform can run two parallel codebooks and route different consumers to different flavors based on which serving surface needs which character.

Quote (Source: sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale):

"Neither is universally better. The key is matching the right flavor to the right surface."

The two flavors at Instacart

Instacart ships two codebook flavors:

| Flavor | Upstream embedding | Cluster character | Use cases |
|---|---|---|---|
| ESCI (precision) | Raw product text → in-house ESCI search-relevance model (trained on query-product matching: Exact / Substitute / Complementary / Irrelevant) | Tight substitute clusters; e.g. Whole Bean Coffee (0_8_55_72), where every item is a medium roast from a different brand, interchangeable for any customer who wants whole bean coffee | Substitution, search, reordering |
| ESCI+Gemma (discovery) | Gemini Flash extracts structured attributes (product type, key ingredients, dietary tags, format) and strips marketing copy → Gemma (off-the-shelf) embeds the cleaned representation | Broader clusters that capture lifestyle and usage patterns | Homepage feeds, cross-selling, exploration |

The architectural insight:

"The embedding is the decision. The RQ-VAE compresses whatever structure the embedding space gives it. Choose your embedding based on the business problem."

The downstream RQ-VAE + contrastive regularization machinery is identical between the two flavors. The flavor distinction lives entirely upstream — at the embedding-substrate choice.
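To make "the RQ-VAE compresses whatever structure the embedding space gives it" concrete, here is a minimal sketch of the greedy residual-quantization encode step only. It is not Instacart's model: there is no training and no contrastive loss, the codebooks and 2-D embeddings are hand-picked toy values, and both flavors share one set of codebooks purely to keep the example short (in practice each flavor would presumably train its own). All names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def residual_quantize(vector, codebooks):
    """Greedy residual quantization: emit one code index per codebook level."""
    residual = list(vector)
    codes = []
    for level in codebooks:
        # Pick the centroid closest to what earlier levels left unexplained.
        idx = min(range(len(level)),
                  key=lambda i: squared_distance(residual, level[i]))
        codes.append(idx)
        # Subtract the chosen centroid; the next level encodes the remainder.
        residual = [r - c for r, c in zip(residual, level[idx])]
    return tuple(codes)


# Toy two-level codebooks (assumption: hand-set, not learned).
CODEBOOKS = [
    [(0.0, 0.0), (10.0, 0.0)],   # coarse level
    [(-1.0, 0.0), (1.0, 0.0)],   # fine level
]

# The same three products under two hypothetical upstream embeddings.
PRECISION_SPACE = {   # olives sit far from the cheeses
    "parmesan": (0.9, 0.0), "pecorino": (1.1, 0.1), "olives": (10.8, 0.0),
}
DISCOVERY_SPACE = {   # olives sit next to the cheeses
    "parmesan": (0.9, 0.0), "pecorino": (1.1, 0.1), "olives": (1.2, -0.1),
}


def encode_catalog(embeddings, codebooks=CODEBOOKS):
    """Apply the identical quantizer to every product embedding."""
    return {name: residual_quantize(vec, codebooks)
            for name, vec in embeddings.items()}


precision_ids = encode_catalog(PRECISION_SPACE)
discovery_ids = encode_catalog(DISCOVERY_SPACE)
```

In this toy run the quantizer code is unchanged between the two calls, yet the olives land in a separate cluster under the precision-style embedding and in the same cluster as the cheeses under the discovery-style one. The difference comes entirely from the input vectors.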

Why one substrate isn't enough

Recsys surfaces have structurally different needs:

  • Substitution (cart replacement, out-of-stock fallback, reordering) needs tight similarity: a customer wants Pecorino Romano; the substitute pool should be other hard Italian cheeses, not Italian-style spreads. The match has to feel like "the same thing, different brand".
  • Discovery (homepage feeds, cross-sell, exploration) needs broader, more associative clusters: a customer who bought Parmigiano shouldn't only see other parmesan cheeses; they should see olives, tapenade, deli trays — products that "inhabit the same culinary universe" even if they're functionally different.

A single codebook would force a compromise: clusters tight enough for substitution lose discovery breadth, while clusters broad enough for discovery dilute substitution quality. The two-flavor design avoids the compromise by maintaining two parallel codebooks and routing each surface to the flavor it needs.
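Per-surface routing can be pictured as a small lookup from surface to flavor. The sketch below is an assumption, not Instacart's tooling (the post does not disclose any): the surface names, the `Flavor` enum, and the fail-loudly default are all hypothetical, and the discovery-side Semantic ID in the usage data is invented for illustration.

```python
from enum import Enum


class Flavor(Enum):
    PRECISION = "esci"
    DISCOVERY = "esci_gemma"


# Hypothetical surface names, grouped by the use cases listed for each flavor.
SURFACE_FLAVOR = {
    "substitution": Flavor.PRECISION,
    "search": Flavor.PRECISION,
    "reordering": Flavor.PRECISION,
    "homepage_feed": Flavor.DISCOVERY,
    "cross_sell": Flavor.DISCOVERY,
    "exploration": Flavor.DISCOVERY,
}


def flavor_for(surface):
    """Return the codebook flavor a surface is routed to."""
    try:
        return SURFACE_FLAVOR[surface]
    except KeyError:
        # Assumption: an undeclared surface is an error, not a silent default.
        raise ValueError(f"no codebook flavor declared for {surface!r}") from None


def semantic_id(product_id, surface, codebooks):
    """Look up a product's Semantic ID in the codebook its surface uses."""
    return codebooks[flavor_for(surface)][product_id]


# Two parallel codebooks keyed by flavor (toy data; second ID is made up).
codebooks = {
    Flavor.PRECISION: {"sku-coffee": (0, 8, 55, 72)},
    Flavor.DISCOVERY: {"sku-coffee": (3, 1, 40, 7)},
}
```

Calling `semantic_id("sku-coffee", "search", codebooks)` reads the precision codebook, while the same product requested for `"homepage_feed"` reads the discovery one. Raising on an unknown surface is one possible choice; a platform could equally pick a default flavor.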

How the two flavors get measured

The Instacart post evaluates both flavors via LLM-based cluster evaluation on three dimensions, of which two are tabulated here:

| Dimension | ESCI | ESCI+Gemma |
|---|---|---|
| Functional coherence (substitute axis) | Higher | Lower |
| Customer journey relevance (thematic) | Lower | Higher |

Quote: "ESCI scores higher on substitutability; ESCI+Gemma excels at thematic coherence, matching their intended use cases."

This is the load-bearing validation that the design axis works as intended — the two flavors aren't just different in implementation, they produce measurably different cluster character that maps to the intended use cases.
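One way such a comparison could be assembled is to collect per-cluster judge scores and average them per flavor and dimension. The sketch below assumes a numeric score per judged cluster; the record layout, the function names, and every score value are made-up placeholders, not figures or methods from the post.

```python
from collections import defaultdict
from statistics import mean


def summarize(judgments):
    """Average judge scores per (flavor, dimension) pair."""
    buckets = defaultdict(list)
    for flavor, dimension, score in judgments:
        buckets[(flavor, dimension)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def higher_scoring_flavor(summary, dimension):
    """Return the flavor with the highest mean score on one dimension."""
    candidates = {flavor: score
                  for (flavor, dim), score in summary.items()
                  if dim == dimension}
    return max(candidates, key=candidates.get)


# Placeholder judgments: (flavor, dimension, score). Values are invented.
judgments = [
    ("esci", "functional_coherence", 5),
    ("esci", "functional_coherence", 4),
    ("esci_gemma", "functional_coherence", 3),
    ("esci_gemma", "functional_coherence", 3),
    ("esci", "customer_journey_relevance", 2),
    ("esci", "customer_journey_relevance", 3),
    ("esci_gemma", "customer_journey_relevance", 5),
    ("esci_gemma", "customer_journey_relevance", 4),
]

summary = summarize(judgments)
```

With these placeholder inputs, `higher_scoring_flavor(summary, "functional_coherence")` returns `"esci"` and the thematic dimension returns `"esci_gemma"`, mirroring the Higher/Lower pattern in the table.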

The LLM-attribute-extraction step (discovery flavor)

The discovery flavor's distinctive upstream step is the LLM attribute-extraction preprocessing:

  1. Run the product through Gemini Flash (~10× faster, ~5× cheaper than full-size Gemini).
  2. Extract structured attributes — product type, key ingredients, dietary tags, format.
  3. Strip marketing copy and ESCI-style metadata.
  4. Embed the cleaned representation with Gemma (off-the-shelf).
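The four steps above can be sketched as a small pipeline. This is an illustration under stated assumptions: the `ProductAttributes` schema, the rendered text layout, and both function names are hypothetical, and the LLM and embedding calls are passed in as plain callables because the post does not show the actual Gemini Flash or Gemma invocations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductAttributes:
    """Structured attributes the extraction step is described as producing."""
    product_type: str
    key_ingredients: tuple = ()
    dietary_tags: tuple = ()
    product_format: str = ""


def render_for_embedding(attrs):
    """Serialize attributes into a compact, marketing-free string."""
    parts = [f"type: {attrs.product_type}"]
    if attrs.key_ingredients:
        parts.append("ingredients: " + ", ".join(attrs.key_ingredients))
    if attrs.dietary_tags:
        parts.append("dietary: " + ", ".join(attrs.dietary_tags))
    if attrs.product_format:
        parts.append(f"format: {attrs.product_format}")
    return " | ".join(parts)


def build_discovery_embedding(raw_text, extract, embed):
    """Steps 1-4: extract attributes, drop everything else, embed the rest.

    `extract` stands in for the LLM call (raw text -> ProductAttributes);
    `embed` stands in for the embedding model (text -> vector).
    """
    attrs = extract(raw_text)              # steps 1-2: LLM attribute extraction
    cleaned = render_for_embedding(attrs)  # step 3: only attributes survive
    return embed(cleaned)                  # step 4: embed the cleaned text


# Stand-ins so the sketch runs without any model access.
def fake_extract(raw_text):
    return ProductAttributes(
        product_type="whole bean coffee",
        key_ingredients=("arabica beans",),
        dietary_tags=("organic",),
        product_format="12 oz bag",
    )


def fake_embed(text):
    return [float(len(text))]


cleaned_text = render_for_embedding(fake_extract("AMAZING bold roast! Best ever!"))
vector = build_discovery_embedding("AMAZING bold roast! Best ever!",
                                   fake_extract, fake_embed)
```

Because only the structured fields are serialized, marketing copy in the raw text never reaches the embedding model; swapping the stand-ins for real clients would leave the pipeline shape unchanged.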

The hypothesis the post tests:

"The goal is to test whether a general-purpose model, given cleaner inputs, can capture nuances that a domain-specific model misses."

The LLM-attribute step is what is meant to make the off-the-shelf embedding model competitive: instead of asking Gemma to make sense of noisy raw catalog text, Gemini Flash does that work upstream and hands Gemma clean inputs. Using the Flash tier keeps the pre-processing cost down (the post cites ~10× faster and ~5× cheaper than full-size Gemini).

Generalization

The two-flavor design generalizes to any recsys / retrieval substrate with surfaces that need different similarity character:

  • Music — substitute (similar artist) vs discovery (mood / occasion / playlist context).
  • Video — substitute (next episode / similar film) vs discovery (themed collection / cross-genre).
  • Text (Q&A retrieval) — exact-answer vs related-question.

What stays constant: the codebook training algorithm, the contrastive loss, the catalog-structure supervision. What changes: the upstream embedding, meaning the model, its training objective, and any preprocessing of its inputs.

Caveats

  • Running two codebooks doubles the codebook-maintenance cost. Training cadence, eval cadence, and version-stability discipline all have to be duplicated.
  • Per-surface flavor routing is a config decision with no disclosed tooling — how Instacart's surfaces declare flavor preference, default behavior, and migration between flavors is not stated.
  • Hybrid use-cases (precision + discovery) need explicit reconciliation. A homepage feed that occasionally wants substitution-quality recommendations would need both codebooks loaded; the post doesn't address mixed routing.
  • Beyond two flavors? The post stops at two. Whether more flavors (e.g. occasion-aware, dietary-constrained, brand-tier specific) would compose or require a different design isn't addressed.
  • Is cold-start coverage equivalent across flavors? Both flavors inherit codebook coverage for new items, but the post does not say whether they handle sparse-text products equally well. The Riesling and t-shirt failure cases were reported in general, not broken down by flavor.
  • Production routing strategy not specified. Does Instacart run two RQ-VAEs in parallel and switch per surface, or run one pipeline that produces both codebooks side by side?

