Instacart ESCI Model¶
Definition¶
The Instacart ESCI model is Instacart's in-house embedding model trained on search relevance data, where the training labels follow the four-class ESCI taxonomy:
- Exact (the product exactly matches the query)
- Substitute (the product is a substitute for the query)
- Complementary (the product is complementary to the query)
- Irrelevant (the product is unrelated to the query)
The model learns a representation tuned for "is this the same thing the customer asked for?" — query-product semantic matching. Its canonical use on the wiki is as the precision-flavor upstream embedding for Instacart's Semantic IDs.
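As an illustration of the taxonomy above, a labeled training record can be pictured as a (query, product text, label) triple. This is a minimal sketch only: the class names `EsciLabel` and `LabeledPair`, the single-letter codes, and the sample records are all hypothetical, and the source does not disclose Instacart's actual training data format.

```python
from dataclasses import dataclass
from enum import Enum


class EsciLabel(Enum):
    """The four ESCI relevance classes (letter codes are an assumption)."""
    EXACT = "E"
    SUBSTITUTE = "S"
    COMPLEMENTARY = "C"
    IRRELEVANT = "I"


@dataclass(frozen=True)
class LabeledPair:
    """One hypothetical training record: a query, a product's text, a label."""
    query: str
    product_text: str
    label: EsciLabel


# Illustrative records only; these are not Instacart data.
examples = [
    LabeledPair("whole bean coffee", "Brand A Medium Roast Whole Bean Coffee 12 oz", EsciLabel.EXACT),
    LabeledPair("whole bean coffee", "Brand B Medium Roast Ground Coffee 12 oz", EsciLabel.SUBSTITUTE),
    LabeledPair("whole bean coffee", "Paper Coffee Filters 100 ct", EsciLabel.COMPLEMENTARY),
    LabeledPair("whole bean coffee", "Dish Soap 16 oz", EsciLabel.IRRELEVANT),
]

# Count how many records carry each label.
label_counts = {label: sum(1 for ex in examples if ex.label is label) for label in EsciLabel}
```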
What it produces¶
ESCI consumes a product's raw text features — name, brand, description, size, attributes, category path — and outputs a high-dimensional embedding. Quote (Source: sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale):
"ESCI (precision) embeds raw product text (name, brand, description, size, some attributes and categories) through our search relevance model, which was trained on query-product matching. The result: embeddings tuned for 'is this the same thing the customer asked for?'"
These embeddings have a substitution-axis cluster character: products with similar embeddings are likely substitutes.
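To make "similar embeddings are likely substitutes" concrete, the sketch below ranks products by cosine similarity to a target product. Cosine similarity is an assumption here, since the source does not say which distance Instacart uses, and the three-dimensional vectors and product names are toy values, not real ESCI outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_products(target, catalog, k=2):
    """Return the k products most similar to `target`, best first."""
    scored = [
        (name, cosine_similarity(catalog[target], vec))
        for name, vec in catalog.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real ESCI embeddings are high-dimensional.
toy_catalog = {
    "whole_bean_coffee_brand_a": [0.9, 0.1, 0.0],
    "whole_bean_coffee_brand_b": [0.8, 0.2, 0.1],
    "coffee_filters": [0.3, 0.9, 0.1],
    "dish_soap": [0.0, 0.1, 0.9],
}

neighbors = nearest_products("whole_bean_coffee_brand_a", toy_catalog, k=1)
```

In this toy catalog the nearest neighbor of one whole bean coffee is the other brand's whole bean coffee, which mirrors the substitute-cluster behavior described above.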
Where ESCI sits in the SID pipeline¶
ESCI is the embedding stage for the precision flavor of Instacart's two-flavor codebook design. The pipeline:
product text features
│
▼
ESCI search-relevance model ← this system
(query-product matching)
│
▼
high-dimensional embedding
│
▼
RQ-VAE quantizer + contrastive loss
│
▼
ESCI Semantic IDs (precision)
│
▼
substitution / search / reordering surfaces
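The quantizer stage of the pipeline can be sketched as plain residual quantization: at each level, pick the nearest codeword, subtract it, and pass the remainder to the next level. The code below shows only this lookup step under stated assumptions. It omits the RQ-VAE encoder, decoder, and contrastive loss entirely, the codebooks are hand-written toy values with two levels rather than the four seen in the example SID, and the underscore-joined output format is inferred from that example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding, codebooks):
    """Greedy residual quantization: one codeword index per level."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        # Pick the codeword closest to what is still unexplained.
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        codes.append(index)
        # Subtract the chosen codeword; the next level quantizes the rest.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes, residual


def format_sid(codes):
    """Join per-level indices with underscores (format inferred from the example SID)."""
    return "_".join(str(c) for c in codes)


# Hypothetical two-level codebooks over 2-D vectors.
toy_codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
]

codes, leftover = residual_quantize([0.8, 0.05], toy_codebooks)
sid = format_sid(codes)  # "0_2"
```

Because each level encodes what the previous levels missed, earlier indices capture coarse structure and later indices refine it, which is why products sharing a full code end up in a tight cluster.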
Cluster character it produces¶
The post discloses one concrete example of ESCI-flavor SID clusters:
Whole Bean Coffee at SID 0_8_55_72:
"This produces tight clusters where every item is a direct substitute, like Whole Bean Coffee (0_8_55_72), where each product is a medium roast from a different brand, interchangeable for any customer who wants whole bean coffee."
The cluster contains direct substitutes — different brands of the same product format. This is what makes ESCI suitable for substitution, search, and reordering: surfaces where the customer wants the same thing.
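One way a substitution surface could use these clusters is a plain lookup of other products that share the same SID. The sketch below assumes exact SID equality defines a cluster. The product names, the function names, and the second SID `0_3_10_4` are hypothetical, and the source does not describe how Instacart's surfaces actually query Semantic IDs.

```python
from collections import defaultdict


def group_by_sid(product_sids):
    """Map each SID to the list of products assigned to it."""
    clusters = defaultdict(list)
    for product, sid in product_sids.items():
        clusters[sid].append(product)
    return dict(clusters)


def substitutes_for(product, product_sids):
    """Other products sharing this product's SID, in sorted order."""
    sid = product_sids[product]
    return sorted(p for p, s in product_sids.items() if s == sid and p != product)


# Toy assignments; only 0_8_55_72 comes from the source.
toy_assignments = {
    "brand_a_whole_bean_medium_roast": "0_8_55_72",
    "brand_b_whole_bean_medium_roast": "0_8_55_72",
    "brand_c_whole_bean_medium_roast": "0_8_55_72",
    "brand_a_coffee_filters": "0_3_10_4",
}

candidates = substitutes_for("brand_a_whole_bean_medium_roast", toy_assignments)
```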
Trade-off vs general-purpose embeddings¶
ESCI is domain-specific: its training data (Instacart search queries and product matches) and training objective (ESCI relevance classification) are tuned to grocery substitution. It outperforms general-purpose embeddings on substitution tasks but produces narrower, less thematic clusters than off-the-shelf models like Gemma. Instacart uses Gemma for the discovery flavor, with LLM attribute extraction as a preprocessing step.
This is the load-bearing argument for the two-flavor codebook design: domain-specific embedding for precision, cleaned general-purpose embedding for discovery. Each codebook is best for its intended use case; neither is universally better.
Caveats¶
- ESCI is mentioned but not described in depth in the 2026-06-02 post, which points to a separate Instacart blog post, How Instacart Uses Embeddings to Improve Search Relevance, for the full architecture. This wiki page is a stub at that level.
- Architecture details not disclosed: model size, training data volume, loss formulation beyond the four-class ESCI relevance taxonomy.
- Multi-language support, query coverage, refresh cadence all undisclosed.
- ESCI labels are search-derived: they encode similarity as revealed by users at search time, not direct ground truth on substitution preferences. Two products labeled "Substitute" for the same query are treated as similar, while two products that never appear under a shared query may be just as similar without ever being labeled so.
Seen in¶
- sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale
— first canonical wiki disclosure: ESCI as the embedding upstream
for the precision-flavor Instacart Semantic IDs codebook. Trained
on query-product matching with the Exact / Substitute /
Complementary / Irrelevant labels, producing tight substitute
clusters (e.g. Whole Bean Coffee SID 0_8_55_72).
Related¶
- companies/instacart — owning organization.
- systems/instacart-semantic-ids — the consumer of ESCI embeddings.
- systems/rq-vae — the algorithm that compresses ESCI embeddings into precision-flavor codes.
- concepts/precision-vs-discovery-codebook-flavor — the design axis ESCI sits on the precision side of.
- concepts/semantic-id — the substrate ESCI feeds into.
- patterns/two-flavor-codebook-precision-vs-discovery — the design pattern.
- patterns/rq-vae-codebook-as-product-vocabulary — broader vocabulary-substrate pattern.