
RQ-VAE¶

Definition¶

RQ-VAE (Residual-Quantized Variational Autoencoder) is a generative-model architecture that learns a hierarchy of codebooks for discretising continuous embeddings into short sequences of codeword indices. Each input is encoded as K codeword indices, one from each of K learned codebooks, where each successive codebook quantizes the residual (the part not yet captured by the previous codebooks).

input embedding e ∈ R^d
    │
    ▼
codebook_1: argmin_c ||e - c||²       → token_1
    │
    ▼
residual_1 = e - codebook_1[token_1]
    │
    ▼
codebook_2: argmin_c ||residual_1 - c||² → token_2
    │
    ▼
residual_2 = residual_1 - codebook_2[token_2]
    │
    ▼
... continue for K levels ...
    │
    ▼
output: (token_1, token_2, ..., token_K) — the "Semantic ID"
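
The greedy loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration of the quantization step only, assuming the codebooks are already trained; the function names (`nearest`, `rq_encode`) and the toy codebook values are hypothetical and not taken from any source.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword with the smallest squared L2 distance to x."""
    def sq_dist(c: Vector) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rq_encode(e: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[List[int], Vector]:
    """Greedy residual quantization: emit one token per codebook level."""
    residual = list(e)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen codeword; the next level quantizes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


# Toy hand-written codebooks (hypothetical values; K = 2 levels, d = 2).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]],        # coarse level
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],  # fine level
]
tokens, residual = rq_encode([1.2, 0.8], codebooks)
```

Here `tokens` is the K-token code for the input and `residual` is the part of the embedding that no level captured; summing the selected codewords gives the reconstruction.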

Where it differs from plain Vector Quantization (VQ-VAE): plain VQ-VAE uses a single codebook and emits one token per quantized vector; RQ-VAE stacks K codebooks where each captures the residual of the previous. K codebooks of V entries each can address up to V^K distinct codes while storing only K·V codewords, whereas a single flat codebook of equivalent representational capacity would need on the order of V^K entries. Critically for generative recsys, products with similar embeddings tend to share early tokens, giving the prefix-sharing semantic similarity property that Semantic IDs depend on.

Why it shows up on the wiki¶

Disclosed as the algorithm behind Instacart Semantic IDs (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

"Instacart Semantic IDs, SIDs, replace atomic product IDs with short sequences of codewords generated by an RQ-VAE. A product's SID looks like 35_7_120_184: four tokens from learned codebooks at different granularity levels."

RQ-VAE is the load-bearing piece of the TIGER paper that makes generative retrieval over the catalog vocabulary economical.

Hierarchical-codebook property — the key recsys win¶

The post discloses that the four codebooks are "at different granularity levels". The recsys consequence: when the generative-retrieval decoder produces the first codeword, it is choosing a coarse semantic neighbourhood (e.g. bakery); the second narrows it (e.g. bread); the third narrows further (e.g. Italian bread); the fourth distinguishes individual SKUs. Beam search at each step is therefore choosing among semantically meaningful options rather than across the full catalog vocabulary.
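
To make the step-by-step narrowing concrete, here is a minimal beam-search sketch over SID prefixes. It assumes decoding is restricted to prefixes that exist in the catalog (a prefix trie), which the source does not state explicitly. `toy_log_prob` is a hypothetical stand-in for the decoder's per-step scores, and the catalog holds only the three SIDs quoted on this page.

```python
from typing import Callable, Dict, List, Set, Tuple

Prefix = Tuple[int, ...]


def build_children(sids: List[Prefix]) -> Dict[Prefix, Set[int]]:
    """Map each SID prefix to the tokens that can legally follow it."""
    children: Dict[Prefix, Set[int]] = {}
    for sid in sids:
        for i in range(len(sid)):
            children.setdefault(sid[:i], set()).add(sid[i])
    return children


def beam_search(
    children: Dict[Prefix, Set[int]],
    log_prob: Callable[[Prefix, int], float],
    depth: int,
    beam_width: int,
) -> List[Tuple[Prefix, float]]:
    """Keep the beam_width best prefixes at each codebook level."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(depth):
        candidates = [
            (prefix + (tok,), score + log_prob(prefix, tok))
            for prefix, score in beams
            for tok in sorted(children.get(prefix, ()))
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams


def toy_log_prob(prefix: Prefix, tok: int) -> float:
    """Hypothetical scorer: mildly prefers tokens 120 and 184."""
    return {120: -0.1, 184: -0.2}.get(tok, -1.0)


catalog = [(35, 7, 119, 493), (35, 7, 120, 184), (35, 7, 120, 185)]
results = beam_search(build_children(catalog), toy_log_prob, depth=4, beam_width=2)
```

At every level the candidate set is the handful of children under the surviving prefixes, not the whole catalog, which is the property the paragraph above describes.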

The 2026-06-02 source's three-product prefix example demonstrates this in production:

SID	Product
`35_7_119_493`	Organic Good Seed Thin Sliced
`35_7_120_184`	Artisanal Italian Bread
`35_7_120_185`	Classic Italian Bread

Shared 35_7_… prefix = bread / bakery semantic neighbourhood. Shared 35_7_120_… prefix = Italian-bread sub-category.
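
The prefix relationship in the table can be checked mechanically. The sketch below is illustrative only: `parse_sid` and `shared_prefix_len` are hypothetical helper names, and the underscore-separated string format is taken from the SIDs as printed on this page.

```python
from typing import Tuple


def parse_sid(sid: str) -> Tuple[int, ...]:
    """Split an underscore-separated SID such as '35_7_120_184' into tokens."""
    return tuple(int(tok) for tok in sid.split("_"))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading codewords two SIDs have in common."""
    n = 0
    for x, y in zip(parse_sid(a), parse_sid(b)):
        if x != y:
            break
        n += 1
    return n


seed_vs_italian = shared_prefix_len("35_7_119_493", "35_7_120_184")     # 2
italian_vs_italian = shared_prefix_len("35_7_120_184", "35_7_120_185")  # 3
```

A longer shared prefix means the two products were assigned to the same cluster at more levels of the hierarchy.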

Why this is a non-trivial alternative to embeddings¶

Plain item embeddings (the conventional recsys vocabulary substrate) are continuous and require Approximate Nearest Neighbour search (concepts/ann-index) to retrieve. Item-as-discrete-token substrates (the GRU4Rec / SASRec lineage) require a vocabulary the size of the catalog, hitting the vocabulary bottleneck.

RQ-VAE-derived Semantic IDs thread the needle: discrete (so they can be the output of an autoregressive decoder) but with shared substructure (so the embedding parameter space scales with the codebook size, not the catalog size — Instacart reports a 125× reduction in embedding parameter space).

Caveats¶

This page captures RQ-VAE as the algorithmic substrate behind SIDs. Original-paper-level architectural detail (encoder/decoder shape, training objective, codebook update scheme, dead-codeword handling) is not reproduced here. The Instacart companion post Semantic IDs: Product Understanding at Scale, summarised in the training-methodology section below, covers Instacart's extensions to vanilla RQ-VAE rather than the base architecture.
The TIGER paper (Rajput et al., NeurIPS 2023) is the load-bearing reference for the recsys application of RQ-VAE.

Instacart's training methodology (2026-06-02 deep-companion)¶

The 2026-06-02 Semantic IDs: Product Understanding at Scale post discloses Instacart's RQ-VAE training methodology with two key extensions over vanilla RQ-VAE:

Catalog-tree contrastive regularization¶

Vanilla RQ-VAE optimizes only reconstruction fidelity. Without structural guidance, two failure modes appear (Source: sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale):

Fragmentation — "two marinara sauces that any customer would consider substitutes end up in different branches".
Error propagation — "a product with product details, category and descriptions gets embedded poorly and placed among irrelevant items."

The fix: add a contrastive loss term using the catalog taxonomy as graded supervision (see concepts/contrastive-regularization-with-catalog-structure + patterns/contrastive-loss-via-taxonomy-tree). Loss formula:

L_total = L_reconstruction + L_rq + λ · L_contrastive

with λ = 0.01 — "a gentle regularizer: strong enough to improve coherence, weak enough not to destabilize reconstruction" — and coarser codebook levels (L1, L2) weighted more heavily within the contrastive term so broad groupings take priority. (See concepts/reconstruction-vs-semantic-loss-tradeoff.)
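
As a rough sketch of how the terms combine, the snippet below folds per-level contrastive losses into a single term and then applies λ. Only the overall formula and λ = 0.01 come from the source; the specific level weights and the input loss values are assumptions chosen for illustration.

```python
from typing import Sequence


def total_loss(
    l_reconstruction: float,
    l_rq: float,
    contrastive_by_level: Sequence[float],
    level_weights: Sequence[float],
    lam: float = 0.01,
) -> float:
    """L_total = L_reconstruction + L_rq + lam * L_contrastive."""
    # The contrastive term is a weighted sum over codebook levels, with
    # coarser levels (L1, L2) given larger weights.
    l_contrastive = sum(w * l for w, l in zip(level_weights, contrastive_by_level))
    return l_reconstruction + l_rq + lam * l_contrastive


# Hypothetical weights: the source says only that L1 and L2 weigh more.
LEVEL_WEIGHTS = [0.4, 0.3, 0.2, 0.1]

loss = total_loss(
    l_reconstruction=1.0,
    l_rq=0.5,
    contrastive_by_level=[2.0, 2.0, 1.0, 1.0],
    level_weights=LEVEL_WEIGHTS,
)
```

With these made-up inputs the contrastive term contributes 0.017 to a total of about 1.517, which shows how small a nudge λ = 0.01 is relative to the reconstruction terms.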

Hierarchical batch sampling¶

The contrastive loss requires each batch to contain same-leaf, sibling-leaf, and unrelated pairs. Random sampling over millions of items would almost never place two items from the same leaf or from sibling leaves in one batch, so the loss would receive essentially no positive signal. The fix is deliberate batch construction: pick a parent category → fill roughly half the batch with its children → fill the rest with unrelated categories → multi-sample within each category slot. (See concepts/hierarchical-batch-sampling-for-contrastive-loss.)
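
The batch-construction recipe can be sketched as follows. This is an illustration under stated assumptions, not Instacart's sampler: the two-level toy taxonomy, the item ids, the `per_leaf` parameter, and the function names are all hypothetical, and the sketch does not deduplicate items drawn in different slots.

```python
import random
from typing import Dict, List, Tuple

Item = Tuple[str, str, str]  # (parent category, leaf category, item id)
Taxonomy = Dict[str, Dict[str, List[str]]]

# Hypothetical toy taxonomy: parent -> leaf -> item ids.
TAXONOMY: Taxonomy = {
    "pasta_sauce": {"marinara": ["m1", "m2", "m3"], "alfredo": ["a1", "a2", "a3"]},
    "cheese": {"parmesan": ["p1", "p2", "p3"], "brie": ["b1", "b2", "b3"]},
    "cleaning": {"detergent": ["d1", "d2", "d3"], "sponges": ["s1", "s2", "s3"]},
}


def draw(taxonomy: Taxonomy, parent: str, n_items: int, per_leaf: int,
         rng: random.Random) -> List[Item]:
    """Fill n_items slots from one parent, multi-sampling within each leaf."""
    leaves = sorted(taxonomy[parent])
    rng.shuffle(leaves)
    out: List[Item] = []
    slot = 0
    while len(out) < n_items:
        leaf = leaves[slot % len(leaves)]  # cycle so sibling leaves co-occur
        items = taxonomy[parent][leaf]
        k = min(per_leaf, n_items - len(out), len(items))
        out.extend((parent, leaf, it) for it in rng.sample(items, k))
        slot += 1
    return out


def sample_batch(taxonomy: Taxonomy, batch_size: int, per_leaf: int,
                 rng: random.Random) -> Tuple[str, List[Item]]:
    """Half the batch from one parent's children, the rest from other parents."""
    parent = rng.choice(sorted(taxonomy))
    batch = draw(taxonomy, parent, batch_size // 2, per_leaf, rng)
    others = [p for p in sorted(taxonomy) if p != parent]
    while len(batch) < batch_size:
        n = min(per_leaf, batch_size - len(batch))
        batch.extend(draw(taxonomy, rng.choice(others), n, per_leaf, rng))
    return parent, batch


def relation(a: Item, b: Item) -> str:
    """Classify a pair by how much of the taxonomy path it shares."""
    if a[0] == b[0]:
        return "same_leaf" if a[1] == b[1] else "sibling_leaf"
    return "unrelated"


rng = random.Random(0)
parent, batch = sample_batch(TAXONOMY, batch_size=8, per_leaf=2, rng=rng)
relations = {relation(a, b) for i, a in enumerate(batch) for b in batch[i + 1:]}
```

Cycling through the focus parent's leaves, rather than picking leaves at random, is a design choice made here so that every batch is guaranteed to contain all three pair types.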

Two flavors via different upstream embeddings¶

Instacart trains the same RQ-VAE + contrastive loss + catalog supervision against two different upstream embedding substrates (see patterns/two-flavor-codebook-precision-vs-discovery):

ESCI (precision) — domain-specific search-relevance embedding → tight substitute clusters.
ESCI+Gemma (discovery) — Gemini-Flash-cleaned attributes → off-the-shelf Gemma embedding → broader thematic clusters.

Architectural insight: "The embedding is the decision. The RQ-VAE compresses whatever structure the embedding space gives it. Choose your embedding based on the business problem."

Disclosed cardinality¶

~2,000 codeword tokens represent Instacart's entire catalog across 4 hierarchical codebooks. This is the concrete vocabulary size that escapes the catalog-bounded vocabulary of atomic product IDs — small enough to make autoregressive generative decoding economical.
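
A back-of-the-envelope calculation shows why a vocabulary this small is enough. The split of the ~2,000 tokens across the four codebooks is not disclosed, so the equal split of 500 per level below is purely an assumption for illustration.

```python
from math import prod

# ASSUMPTION: ~2,000 tokens split evenly over 4 levels (not disclosed).
codebook_sizes = [500, 500, 500, 500]

vocab_size = sum(codebook_sizes)    # tokens the decoder must embed and predict
addressable = prod(codebook_sizes)  # distinct 4-token SIDs that can be spelled
```

Under that assumption a 2,000-token vocabulary can spell 62.5 billion distinct SIDs, so the decoder's output layer grows with the sum of the codebook sizes while the number of addressable products grows with their product.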

Cluster character (production examples)¶

Under SID prefix 6_19_:

6_19_32 — Italian cheeses (Parmigiano, Pecorino, Mozzarella, Ricotta).
6_19_24 — Specialty cheeses (Brie, Manchego, Halloumi, Goat cheese).
6_19_12 — Olives (Castelvetrano, Kalamata, olive medleys).
6_19_7 — Tapenades (olive tapenade, spreads).
6_19_9 — Deli trays and dips.
6_19_14 — Croutons.

Quote: "No one wrote a rule connecting Pecorino Romano to Kalamata olives to olive tapenade. The model learned that these products inhabit the same culinary universe… by compressing their embeddings into codes that share a prefix."

Seen in¶

sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale — deep-companion training-methodology disclosure: RQ-VAE trained with L_total = L_reconstruction + L_rq + λ · L_contrastive at λ = 0.01; catalog-tree-graded contrastive supervision (same-leaf strong+ / sibling-leaf moderate+ / no-shared-ancestor −); hierarchical batch sampling; two-flavor application (ESCI precision + ESCI+Gemma discovery); ~2,000 codeword tokens for Instacart's entire catalog; production cluster examples (6_19_* Italian-cheese-and-accompaniments prefix family); intrinsic-evaluation methodology.
sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — algorithmic substrate behind Instacart Semantic IDs (the consumer side).

systems/instacart-semantic-ids — first wiki-canonicalised production application.
systems/tiger-generative-retrieval — the reference paper that introduced RQ-VAE-based semantic IDs for generative recsys.
concepts/semantic-id — canonical concept page.
concepts/atomic-product-id-vs-semantic-id — the substrate trade-off RQ-VAE enables.
patterns/rq-vae-codebook-as-product-vocabulary — canonical pattern.