PATTERN Cited by 1 source

Two-flavor codebook: precision vs discovery

Pattern

Run two parallel codebooks that share the same RQ-VAE quantization machinery + contrastive-loss training but differ in their upstream embedding substrate, producing two distinct cluster characters:

Precision flavor — tight substitute clusters; surfaces that need "the same thing, different brand" (substitution, search, reordering).
Discovery flavor — broader thematic clusters; surfaces that need cross-category exploration (homepage feeds, cross-selling, exploration).

Route each downstream surface to the flavor that matches its needs.

Quote (Source: sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale):

"Neither is universally better. The key is matching the right flavor to the right surface."

Why one codebook isn't enough

Recsys surfaces have structurally different similarity needs:

Surface	Needs	Why
Substitution / cart replacement	Tight substitute clusters	A customer wants Pecorino Romano; suggesting tapenade is wrong
Search / reordering	Tight substitute clusters	Search queries map to substitute pools
Homepage feeds	Broad thematic clusters	A customer browsing should see lifestyle-coherent options
Cross-selling	Broad thematic clusters	The point is to suggest complementary products, not substitutes
Exploration	Broad thematic clusters	Suggesting more substitutes defeats the surface's purpose

A single codebook would force a compromise. Tight-substitute codebook → bad homepage feeds. Broad-thematic codebook → bad substitutions. The two-flavor pattern avoids the compromise by maintaining two parallel codebooks and per-surface flavor routing.

The structural pieces

1. Two upstream embedding substrates

The flavor distinction lives entirely in the embedding that feeds the RQ-VAE. The quantizer + contrastive loss + catalog supervision are shared.

Precision substrate (Instacart: ESCI):

Train a domain-specific embedding on query-product matching (search relevance) — Instacart's ESCI model uses Exact / Substitute / Complementary / Irrelevant labels from search data.
The embedding's training objective directly encodes substitution semantics.
Resulting clusters: tight substitute pools.
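The section does not say exactly how the ESCI labels become a training signal. One plausible reading, shown here as a hypothetical sketch: Exact and Substitute pairs act as positives, while Complementary and Irrelevant pairs act as negatives, so a Pecorino Romano query is never pulled toward tapenade. The `Esci` enum, the binary targets, and `to_training_pairs` are assumptions for illustration, not Instacart's recipe.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Esci(Enum):
    EXACT = "E"
    SUBSTITUTE = "S"
    COMPLEMENT = "C"
    IRRELEVANT = "I"


# Hypothetical targets: a substitution-oriented embedding treats E and S as
# "same thing" (positive) and C and I as "not a replacement" (negative).
TARGET = {
    Esci.EXACT: 1.0,
    Esci.SUBSTITUTE: 1.0,
    Esci.COMPLEMENT: 0.0,
    Esci.IRRELEVANT: 0.0,
}


def to_training_pairs(
    rows: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str, float]]:
    """Turn (query, product_id, ESCI label code) rows into (query, product_id, target)."""
    return [(query, pid, TARGET[Esci(code)]) for query, pid, code in rows]
```

Treating complements as negatives is the design choice that makes this substrate "precision": complementary products end up far from the query even though they co-occur in baskets.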

Discovery substrate (Instacart: ESCI+Gemma):

Run the product through an LLM (Instacart: Gemini Flash, ~10× faster, ~5× cheaper than full-size Gemini) to extract structured attributes (product type, key ingredients, dietary tags, format) and strip marketing copy + ESCI-style metadata.
Embed the cleaned representation with an off-the-shelf general-purpose embedding model (Instacart: Gemma).
Resulting clusters: broader thematic pools that capture lifestyle / usage patterns. (The LLM attribute-extraction preprocessing pattern is a load-bearing ingredient.)
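The step between the LLM and the embedder can be sketched as rendering the extracted attributes into a compact, copy-free text. The `ProductAttributes` schema below mirrors the attribute types listed above (product type, key ingredients, dietary tags, format), but the field names and the `canonical_text` output format are hypothetical; the LLM call and the Gemma embedding call are not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProductAttributes:
    # Hypothetical schema for the LLM's structured output.
    product_type: str
    key_ingredients: List[str] = field(default_factory=list)
    dietary_tags: List[str] = field(default_factory=list)
    format: str = ""


def canonical_text(attrs: ProductAttributes) -> str:
    """Render extracted attributes as compact text for a general-purpose embedder.

    Only structured fields reach this function, so marketing copy never
    influences the embedding.
    """
    parts = [f"type: {attrs.product_type}"]
    if attrs.key_ingredients:
        parts.append("ingredients: " + ", ".join(attrs.key_ingredients))
    if attrs.dietary_tags:
        # Sort tags so two listings with the same tags in different order embed identically.
        parts.append("dietary: " + ", ".join(sorted(attrs.dietary_tags)))
    if attrs.format:
        parts.append(f"format: {attrs.format}")
    return "; ".join(parts)
```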

2. Identical downstream RQ-VAE training

Both substrates feed the same RQ-VAE training pipeline:

Same residual-quantization architecture.
Same contrastive-loss term (L_total = L_reconstruction + L_rq + λ · L_contrastive, λ = 0.01).
Same hierarchical batch sampling.
Same coarse-level-weighted loss.
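A minimal numpy sketch of the shared machinery, under two simplifying assumptions: quantization runs greedily on a fixed vector with fixed codebooks (a real RQ-VAE quantizes the encoder's latent and learns the codebooks), and the three loss terms are already computed. `rq_encode` and `total_loss` are hypothetical names; only λ = 0.01 comes from the source. Hierarchical batch sampling and coarse-level weighting are not shown.

```python
from typing import List, Tuple

import numpy as np


def rq_encode(x: np.ndarray, codebooks: List[np.ndarray]) -> Tuple[List[int], np.ndarray]:
    """Greedy residual quantization.

    At each level, pick the code nearest to the current residual, add it to the
    reconstruction, and subtract it from the residual. The list of chosen
    indices, coarse to fine, is the item's semantic ID.
    """
    residual = x.astype(float).copy()
    recon = np.zeros_like(residual)
    codes: List[int] = []
    for cb in codebooks:  # each cb has shape (num_codes, dim)
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        codes.append(idx)
        recon += cb[idx]
        residual -= cb[idx]
    return codes, recon


def total_loss(l_recon: float, l_rq: float, l_contrastive: float, lam: float = 0.01) -> float:
    """L_total = L_reconstruction + L_rq + λ · L_contrastive (λ = 0.01 per the source)."""
    return l_recon + l_rq + lam * l_contrastive
```

Because both flavors call the same functions, the only way their semantic IDs can differ is through the input vector `x`, which is exactly the upstream embedding.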

The shared downstream is what makes the pattern an axis, not a pair of separate systems: the design dial is the embedding choice upstream.
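The "one dial" framing can be made concrete as configuration: both flavors share one downstream config object and differ only in the embedder plugged in upstream. `RQConfig`, `Flavor`, and `make_flavors` are hypothetical names; `num_levels` and `codebook_size` are placeholders, not disclosed values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence

# An embedder maps product text to a vector; both substrates expose this shape.
Embedder = Callable[[str], Sequence[float]]


@dataclass(frozen=True)
class RQConfig:
    """Downstream RQ-VAE settings shared by both flavors."""
    contrastive_weight: float = 0.01  # λ from the source
    num_levels: int = 3               # placeholder, not a disclosed value
    codebook_size: int = 256          # placeholder, not a disclosed value


@dataclass(frozen=True)
class Flavor:
    name: str
    embed: Embedder  # the only thing that differs between flavors
    rq: RQConfig = field(default_factory=RQConfig)


SHARED_RQ = RQConfig()


def make_flavors(esci_embed: Embedder, cleaned_text_embed: Embedder) -> Dict[str, Flavor]:
    """Build both flavors around one shared downstream config."""
    return {
        "precision": Flavor("precision", esci_embed, SHARED_RQ),
        "discovery": Flavor("discovery", cleaned_text_embed, SHARED_RQ),
    }
```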

3. Per-surface flavor routing

Each consuming surface declares which flavor it wants. Concrete mapping (Source: same):

Surface	Flavor
Substitution	ESCI (precision)
Search	ESCI (precision)
Reordering	ESCI (precision)
Homepage feeds	ESCI+Gemma (discovery)
Cross-selling	ESCI+Gemma (discovery)
Exploration	ESCI+Gemma (discovery)

The routing is a config decision; both codebooks are kept current in parallel.
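As a config decision, the routing reduces to a lookup table plus one codebook per flavor. The sketch below encodes the mapping above; the surface keys, `CodebookFlavor`, and `semantic_id_for` are hypothetical, and raising on an undeclared surface is an assumption, since the source does not disclose the default behavior.

```python
from enum import Enum
from typing import Dict, Tuple


class CodebookFlavor(Enum):
    PRECISION = "esci"
    DISCOVERY = "esci_gemma"


# Surface -> flavor, following the mapping reported by the source.
SURFACE_FLAVOR: Dict[str, CodebookFlavor] = {
    "substitution": CodebookFlavor.PRECISION,
    "search": CodebookFlavor.PRECISION,
    "reordering": CodebookFlavor.PRECISION,
    "homepage_feed": CodebookFlavor.DISCOVERY,
    "cross_selling": CodebookFlavor.DISCOVERY,
    "exploration": CodebookFlavor.DISCOVERY,
}


def semantic_id_for(
    surface: str,
    product_id: str,
    codebooks: Dict[CodebookFlavor, Dict[str, Tuple[int, ...]]],
) -> Tuple[int, ...]:
    """Return the product's semantic ID from the codebook this surface declared."""
    flavor = SURFACE_FLAVOR.get(surface)
    if flavor is None:
        # Assumed policy: fail loudly rather than silently picking a flavor.
        raise KeyError(f"surface {surface!r} has no declared flavor")
    return codebooks[flavor][product_id]
```

Keeping both codebooks current in parallel is what lets a surface switch flavors by editing one table entry.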

Validation: LLM-cluster-eval discriminates the flavors

The pattern's correctness depends on the two flavors actually producing different cluster character. Instacart validates this with LLM-based cluster evaluation; the dimensions that separate the flavors are:

Dimension	ESCI (precision)	ESCI+Gemma (discovery)
Functional coherence	Higher	Lower
Customer journey relevance	Lower	Higher

Quote: "ESCI scores higher on substitutability; ESCI+Gemma excels at thematic coherence, matching their intended use cases."

This measurement is the load-bearing evidence that the pattern works: the two flavors differ not just in implementation but in measurable cluster character, and that character maps to the intended use cases.
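The pass/fail logic of this validation can be sketched as: average each flavor's per-cluster judge scores on each dimension, then check that each flavor wins the dimension it was built for. The input shape, the dimension keys, and `winner_per_dimension` are hypothetical; the LLM judging step itself is not shown.

```python
from statistics import mean
from typing import Dict, List

# scores[flavor][dimension] = per-cluster scores from an LLM judge (hypothetical shape)
Scores = Dict[str, Dict[str, List[float]]]


def winner_per_dimension(scores: Scores) -> Dict[str, str]:
    """For each dimension every flavor was scored on, return the flavor with the higher mean."""
    dims = set.intersection(*(set(by_dim) for by_dim in scores.values()))
    return {
        dim: max(scores, key=lambda flavor: mean(scores[flavor][dim]))
        for dim in sorted(dims)
    }


def flavors_match_intent(scores: Scores) -> bool:
    """True when precision wins functional coherence and discovery wins journey relevance."""
    winners = winner_per_dimension(scores)
    return (
        winners.get("functional_coherence") == "precision"
        and winners.get("customer_journey_relevance") == "discovery"
    )
```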

Generalization beyond Instacart

The pattern generalizes wherever recsys surfaces have divergent similarity needs:

Domain	Precision flavor	Discovery flavor
Music	"More like this artist" / "alternative artists"	Mood / occasion / playlist context
Video	"Similar films" / next episode	Themed collections / cross-genre
Books	Same-author / same-series	Topical recommendations
Knowledge bases	Exact-match retrieval	Related-question retrieval
Code search	Same-API / same-method	Solution-pattern discovery

The domain-agnostic insight: the choice of embedding substrate is the load-bearing dial for cluster character, and different surfaces need different cluster character.

When the pattern doesn't fit

Single-surface recommender systems — if you only have one downstream consumer, the dual-codebook overhead isn't justified.
Tightly constrained training compute — running two RQ-VAE pipelines doubles the codebook-maintenance cost.
Tightly constrained inference compute — both codebooks must be kept loaded at serving time if both flavors are ever needed in the same request path.
Surfaces with hybrid needs — a single surface needing both precision and discovery requires a more complex routing strategy (mix outputs from both codebooks, or use a meta-model to pick per-request).
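For the "mix outputs from both codebooks" option, one simple sketch is a quota-based interleave of the two ranked candidate lists with deduplication. The quota scheme, the `precision_share` knob, and `blend` are assumptions for illustration, not a documented Instacart technique; a per-request meta-model could set `precision_share` instead of fixing it.

```python
from typing import Dict, Iterator, List, Sequence


def blend(
    precision: Sequence[str],
    discovery: Sequence[str],
    k: int,
    precision_share: float = 0.5,
) -> List[str]:
    """Interleave two ranked candidate lists into at most k unique items.

    Roughly `precision_share` of the accepted items come from the precision
    list; when one list runs out, the other fills the remaining slots.
    """
    if not 0.0 <= precision_share <= 1.0:
        raise ValueError("precision_share must be in [0, 1]")
    sources: Dict[str, Iterator[str]] = {"p": iter(precision), "d": iter(discovery)}
    out: List[str] = []
    seen = set()
    n_precision = 0
    while len(out) < k and sources:
        # Prefer precision while it is below its quota for the next slot.
        prefer = "p" if n_precision < precision_share * (len(out) + 1) else "d"
        key = prefer if prefer in sources else next(iter(sources))
        item = next(sources[key], None)
        if item is None:
            del sources[key]  # this list is exhausted
            continue
        if item in seen:
            continue  # already taken from the other list
        seen.add(item)
        out.append(item)
        if key == "p":
            n_precision += 1
    return out
```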

Caveats

Two codebooks double the codebook-maintenance work — training cadence, eval cadence, and version-stability discipline must all be duplicated.
Per-surface flavor routing is a config decision with no disclosed tooling — how Instacart's surfaces declare their flavor, what the default is, and how a surface migrates between flavors aren't specified.
No "third flavor" framework — Instacart stops at two. Whether more flavors (occasion-aware, dietary-constrained, brand-tier specific) compose or require a different design isn't addressed.
Pipeline topology not specified — does Instacart train the two RQ-VAEs in separate parallel pipelines, or run one pipeline that produces both codebooks side by side?
Flavor-specific cold-start coverage — both flavors inherit codebook coverage for new items, but sparse-text products may hit different failure rates per flavor (the post documents divergent codes in general, not broken down by flavor).

Seen in

sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale — canonical wiki instance: Instacart's ESCI (precision) + ESCI+Gemma (discovery) two-flavor codebook design. Same RQ-VAE + contrastive loss; different upstream embedding. Validated by LLM-cluster-evaluation showing flavor character matches intended use cases. Per-surface routing.

concepts/precision-vs-discovery-codebook-flavor — the design axis this pattern instantiates.
concepts/semantic-id — the substrate the flavors produce.
concepts/llm-based-cluster-evaluation — the validation metric.
concepts/contrastive-regularization-with-catalog-structure — the shared downstream training mechanism.
systems/instacart-semantic-ids — production instance.
systems/instacart-esci-model — precision-flavor upstream.
systems/gemma / systems/gemini — discovery-flavor upstreams.
patterns/llm-attribute-extraction-before-embedding — preprocessing the discovery flavor depends on.
patterns/rq-vae-codebook-as-product-vocabulary — broader vocabulary-substrate pattern.