CONCEPT Cited by 1 source
Precision-vs-discovery codebook flavor
Definition
Precision-vs-discovery codebook flavor is a design axis for Semantic ID codebooks: the same RQ-VAE quantizer + contrastive-loss machinery can be trained against different upstream embeddings to produce codebooks with different cluster character — precision (tight substitute clusters) or discovery (broader thematic clusters).
The choice is per surface, not universal. A single recsys platform can run two parallel codebooks and route different consumers to different flavors based on which serving surface needs which character.
Quote (Source: sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale):
"Neither is universally better. The key is matching the right flavor to the right surface."
The two flavors at Instacart
Instacart ships two codebook flavors:
| Flavor | Upstream embedding | Cluster character | Use cases |
|---|---|---|---|
| ESCI (precision) | Raw product text → in-house ESCI search-relevance model (trained on query-product matching, Exact / Substitute / Complementary / Irrelevant) | Tight substitute clusters; e.g. Whole Bean Coffee (0_8_55_72) where every item is a medium roast from a different brand, interchangeable for any customer who wants whole bean coffee | Substitution, search, reordering |
| ESCI+Gemma (discovery) | Gemini Flash extracts structured attributes (product type, key ingredients, dietary tags, format) and strips marketing copy → Gemma (off-the-shelf) embeds the cleaned representation | Broader clusters that capture lifestyle and usage patterns | Homepage feeds, cross-selling, exploration |
The architectural insight:
"The embedding is the decision. The RQ-VAE compresses whatever structure the embedding space gives it. Choose your embedding based on the business problem."
The downstream RQ-VAE + contrastive regularization machinery is identical between the two flavors. The flavor distinction lives entirely upstream — at the embedding-substrate choice.
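To make "the RQ-VAE compresses whatever structure the embedding space gives it" concrete, here is a minimal sketch of the residual-quantization encode step only. It assumes already-trained codebooks and leaves out the encoder, decoder, training loop and contrastive loss. The function names and the toy codebooks are illustrative and not taken from the Instacart post.

```python
import math

def residual_quantize(embedding, codebooks):
    """Greedy residual quantization.

    codebooks: one list of code vectors per level (assumed already trained).
    At each level, pick the code nearest to the current residual,
    record its index, then subtract it before moving to the next level.
    """
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        index = min(range(len(codebook)),
                    key=lambda i: math.dist(codebook[i], residual))
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes

def semantic_id(codes):
    """Join per-level code indices in the underscore style used above."""
    return "_".join(str(c) for c in codes)

# Toy two-level codebooks in 2-D (made-up values, for illustration only).
toy_codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],            # level 1: coarse
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # level 2: refinement
]
print(semantic_id(residual_quantize([10.0, 11.0], toy_codebooks)))  # 1_2
```

Under this reading, each flavor would have its own trained set of codebooks, and the same encode function would run over ESCI embeddings for one and Gemma embeddings for the other. Nothing in the quantizer itself knows which flavor it is producing.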
Why one substrate isn't enough
Recsys surfaces have structurally different needs:
- Substitution (cart replacement, out-of-stock fallback, reordering) needs tight similarity: a customer wants Pecorino Romano; the substitute pool should be other hard Italian cheeses, not Italian-style spreads. The match has to feel like "the same thing, different brand".
- Discovery (homepage feeds, cross-sell, exploration) needs broader, more associative clusters: a customer who bought Parmigiano shouldn't only see other parmesan cheeses; they should see olives, tapenade, deli trays — products that "inhabit the same culinary universe" even if they're functionally different.
A single codebook would force a compromise: one tight enough for substitution would lose discovery breadth, and one broad enough for discovery would loosen substitution. The two-flavor design avoids the compromise by maintaining two parallel codebooks and per-surface flavor routing.
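Per-surface flavor routing can be pictured as a static lookup from surface to flavor. The sketch below is an assumption about shape only: the post does not disclose how Instacart's surfaces declare a flavor, and the names `Flavor`, `SURFACE_FLAVOR` and `flavor_for` are hypothetical. The surface keys mirror the use cases listed in the table above.

```python
from enum import Enum

class Flavor(Enum):
    PRECISION = "esci"        # tight substitute clusters
    DISCOVERY = "esci_gemma"  # broader thematic clusters

# Hypothetical routing table; keys follow the use cases named in the table above.
SURFACE_FLAVOR = {
    "substitution": Flavor.PRECISION,
    "search": Flavor.PRECISION,
    "reordering": Flavor.PRECISION,
    "homepage_feed": Flavor.DISCOVERY,
    "cross_sell": Flavor.DISCOVERY,
    "exploration": Flavor.DISCOVERY,
}

def flavor_for(surface: str) -> Flavor:
    """Return the codebook flavor configured for a serving surface."""
    try:
        return SURFACE_FLAVOR[surface]
    except KeyError:
        raise ValueError(
            f"no codebook flavor configured for surface {surface!r}"
        ) from None

print(flavor_for("search"))         # Flavor.PRECISION
print(flavor_for("homepage_feed"))  # Flavor.DISCOVERY
```

The sketch raises on an unknown surface instead of falling back to a default, because the source does not say what the default behavior should be.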
How the two flavors get measured
The Instacart post evaluates both flavors via LLM-based cluster evaluation on three dimensions, two of which are summarized here:
| Dimension | ESCI | ESCI+Gemma |
|---|---|---|
| Functional coherence (substitute-axis) | Higher | Lower |
| Customer journey relevance (thematic) | Lower | Higher |
Quote: "ESCI scores higher on substitutability; ESCI+Gemma excels at thematic coherence, matching their intended use cases."
This is the load-bearing validation that the design axis works as intended — the two flavors aren't just different in implementation, they produce measurably different cluster character that maps to the intended use cases.
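One way to arrive at a comparison like the table above is to average per-cluster judge ratings by flavor and dimension. The sketch below assumes the LLM judge emits a numeric score per cluster per dimension; the post does not disclose the scoring scale or aggregation, so the record shape and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_scores(ratings):
    """Average judge scores per dimension and flavor.

    ratings: iterable of (flavor, dimension, score) tuples,
             one per judged cluster (assumed record shape).
    Returns {dimension: {flavor: mean score}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for flavor, dimension, score in ratings:
        buckets[dimension][flavor].append(score)
    return {
        dimension: {flavor: mean(scores) for flavor, scores in by_flavor.items()}
        for dimension, by_flavor in buckets.items()
    }

def higher_flavor(summary, dimension):
    """Name the flavor with the higher mean score on one dimension."""
    by_flavor = summary[dimension]
    return max(by_flavor, key=by_flavor.get)
```

Feeding in real judge output and calling `higher_flavor` for each dimension would reproduce the Higher / Lower pattern in the table, provided the ratings behave as the post reports.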
The LLM-attribute-extraction step (discovery flavor)
The discovery flavor's distinctive upstream step is the LLM attribute-extraction preprocessing:
- Run the product through Gemini Flash (~10× faster, ~5× cheaper than full-size Gemini).
- Extract structured attributes — product type, key ingredients, dietary tags, format.
- Strip marketing copy and ESCI-style metadata.
- Embed the cleaned representation with Gemma (off-the-shelf).
The hypothesis the post tests:
"The goal is to test whether a general-purpose model, given cleaner inputs, can capture nuances that a domain-specific model misses."
The LLM-attribute step is what makes the off-the-shelf embedding model competitive: rather than asking Gemma to make sense of noisy raw catalog text, the LLM does that work upstream and hands Gemma clean inputs. The preprocessing cost is kept down by running extraction on Gemini Flash rather than a full-size model.
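The extract-then-embed flow can be sketched as below. The two model calls are passed in as plain callables so that no real Gemini or Gemma API is implied. `ProductAttributes`, `render_for_embedding` and `discovery_embedding` are hypothetical names, and the rendered text format is an assumption; only the four attribute fields come from the post.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

@dataclass
class ProductAttributes:
    """The four structured attributes the post says are extracted."""
    product_type: str
    key_ingredients: List[str] = field(default_factory=list)
    dietary_tags: List[str] = field(default_factory=list)
    format: str = ""

def render_for_embedding(attrs: ProductAttributes) -> str:
    """Serialize attributes into a compact string with no marketing copy.

    The exact layout is an assumption; empty fields are skipped.
    """
    parts = [f"product type: {attrs.product_type}"]
    if attrs.key_ingredients:
        parts.append("key ingredients: " + ", ".join(attrs.key_ingredients))
    if attrs.dietary_tags:
        parts.append("dietary tags: " + ", ".join(attrs.dietary_tags))
    if attrs.format:
        parts.append(f"format: {attrs.format}")
    return "; ".join(parts)

def discovery_embedding(
    raw_text: str,
    extract: Callable[[str], ProductAttributes],  # LLM step (Gemini Flash in the post)
    embed: Callable[[str], Sequence[float]],      # embedding step (Gemma in the post)
) -> Sequence[float]:
    """Extract attributes from raw catalog text, then embed the cleaned form."""
    attrs = extract(raw_text)
    return embed(render_for_embedding(attrs))
```

Keeping `extract` and `embed` as injected callables also makes the pipeline easy to exercise with stubs before wiring in real model clients.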
Generalization
The two-flavor design generalizes to any recsys / retrieval substrate with surfaces that need different similarity character:
- Music — substitute (similar artist) vs discovery (mood / occasion / playlist context).
- Video — substitute (next episode / similar film) vs discovery (themed collection / cross-genre).
- Text (Q&A retrieval) — exact-answer vs related-question.
What stays constant: the codebook training algorithm, the contrastive loss, the catalog-structure supervision. What changes: the upstream embedding's training objective.
Caveats
- Running two codebooks doubles the codebook-maintenance cost: training cadence, eval cadence, and version-stability discipline must all be duplicated.
- Per-surface flavor routing is a config decision with no disclosed tooling — how Instacart's surfaces declare flavor preference, default behavior, and migration between flavors is not stated.
- Hybrid use-cases (precision + discovery) need explicit reconciliation. A homepage feed that occasionally wants substitution-quality recommendations would need both codebooks loaded; the post doesn't address mixed routing.
- Beyond two flavors? The post stops at two. Whether more flavors (e.g. occasion-aware, dietary-constrained, brand-tier specific) would compose or require a different design isn't addressed.
- Is cold-start coverage equivalent across flavors? Both flavors inherit codebook coverage for new items, but the post does not say whether they handle sparse-text products equally well. The Riesling and t-shirt failure cases were noted in general, not stratified by flavor.
- Production routing strategy not specified. Does Instacart run two RQ-VAEs in parallel and switch per surface, or run one pipeline that produces both codebooks side by side?
Seen in
- sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale — first canonical wiki disclosure: ESCI (precision) and ESCI+Gemma (discovery) as Instacart's two-flavor codebook design. Same RQ-VAE + contrastive loss; different upstream embedding. Validated by LLM-cluster-evaluation showing flavor character matches intended use cases.
Related
- concepts/semantic-id — the substrate the design axis applies to.
- concepts/llm-based-cluster-evaluation — the metric that validates the flavor distinction.
- concepts/contrastive-regularization-with-catalog-structure — the training-time mechanism shared across flavors.
- systems/instacart-semantic-ids — production instance running both flavors.
- systems/instacart-esci-model — the precision-flavor upstream embedding.
- systems/gemma — the discovery-flavor embedding model.
- systems/gemini — Gemini Flash for attribute extraction.
- patterns/two-flavor-codebook-precision-vs-discovery — canonical pattern.
- patterns/llm-attribute-extraction-before-embedding — the preprocessing the discovery flavor depends on.
- patterns/rq-vae-codebook-as-product-vocabulary — the broader pattern this design axis fits within.