
Instacart ESCI Model¶

Definition¶

The Instacart ESCI model is Instacart's in-house embedding model trained on search relevance data, where the training labels follow the four-class ESCI taxonomy:

Exact (the product exactly matches the query)
Substitute (the product is a substitute for the query)
Complementary (the product is complementary to the query)
Irrelevant (the product is unrelated to the query)

The model learns a representation tuned for query-product semantic matching, that is, for the question "is this the same thing the customer asked for?". On this wiki, its canonical role is the upstream embedding for the precision flavor of Instacart's Semantic IDs.
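As a minimal sketch of how the four-class taxonomy might be represented in code, the snippet below encodes the labels as an enum and attaches them to a few query-product pairs. The product names, the example labels, and the `label_counts` helper are hypothetical illustrations, not Instacart data or code.

```python
from enum import Enum


class ESCILabel(Enum):
    """The four relevance classes of the ESCI taxonomy."""
    EXACT = "E"
    SUBSTITUTE = "S"
    COMPLEMENTARY = "C"
    IRRELEVANT = "I"


# Hypothetical labeled rows: (query, product text, label).
# These are invented for illustration only.
examples = [
    ("whole bean coffee", "Brand A Medium Roast Whole Bean Coffee 12 oz", ESCILabel.EXACT),
    ("whole bean coffee", "Brand B Medium Roast Ground Coffee 12 oz", ESCILabel.SUBSTITUTE),
    ("whole bean coffee", "Paper Coffee Filters 100 ct", ESCILabel.COMPLEMENTARY),
    ("whole bean coffee", "Dish Soap 16 oz", ESCILabel.IRRELEVANT),
]


def label_counts(rows):
    """Count how many rows carry each ESCI label."""
    counts = {label: 0 for label in ESCILabel}
    for _query, _product, label in rows:
        counts[label] += 1
    return counts
```

A relevance model trained on rows of this shape learns to separate the four classes for a given query, which is what gives the resulting embeddings their query-product matching character.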

What it produces¶

ESCI consumes a product's raw text features (name, brand, description, size, some attributes, and categories) and outputs a high-dimensional embedding. The source post describes it as follows (Source: sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale):

"ESCI (precision) embeds raw product text (name, brand, description, size, some attributes and categories) through our search relevance model, which was trained on query-product matching. The result: embeddings tuned for 'is this the same thing the customer asked for?'"

These embeddings cluster along a substitution axis: products whose embeddings lie close together are likely substitutes for one another.
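To make "similar embeddings" concrete, here is a small sketch that ranks products by cosine similarity. Cosine similarity is an assumption here, since the source does not say which distance Instacart uses. The three-dimensional vectors and product names are toy stand-ins for real ESCI embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (hypothetical); real embeddings are high-dimensional.
toy_embeddings = {
    "coffee_brand_a": [0.90, 0.10, 0.05],
    "coffee_brand_b": [0.88, 0.12, 0.07],
    "dish_soap": [0.05, 0.20, 0.95],
}


def nearest_neighbors(product_id, embeddings, k=1):
    """Return the k products whose embeddings are most similar to product_id."""
    query = embeddings[product_id]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in embeddings.items()
        if other != product_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _score, other in scored[:k]]
```

With these toy vectors, the nearest neighbor of `coffee_brand_a` is `coffee_brand_b`, mirroring the idea that close embeddings indicate likely substitutes.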

Where ESCI sits in the SID pipeline¶

ESCI is the embedding stage for the precision flavor of Instacart's two-flavor codebook design. The pipeline:

product text features
    │
    ▼
ESCI search-relevance model     ← this system
(query-product matching)
    │
    ▼
high-dimensional embedding
    │
    ▼
RQ-VAE quantizer + contrastive loss
    │
    ▼
ESCI Semantic IDs (precision)
    │
    ▼
substitution / search / reordering surfaces
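
The quantization step in the diagram can be sketched as greedy residual quantization: at each level, pick the nearest codebook entry, subtract it, and quantize what remains. This shows only the code-lookup step with fixed, hand-written codebooks. The RQ-VAE encoder, decoder, and contrastive loss are not shown, and the codebook values, the two-level depth, and the function names are assumptions for illustration. The underscore-joined output mirrors how the post writes SIDs such as 0_8_55_72.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(embedding, codebooks):
    """Greedy residual quantization.

    At each level, choose the index of the nearest code vector,
    subtract that vector from the residual, and move to the next level.
    Returns the chosen indices and the final residual.
    """
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes, residual


def to_semantic_id(codes):
    """Join per-level code indices with underscores, e.g. [0, 2] -> '0_2'."""
    return "_".join(str(c) for c in codes)


# Hypothetical two-level codebooks over 2-dimensional vectors.
toy_codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],                 # level 0: coarse
    [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]],    # level 1: refinement
]

codes, residual = residual_quantize([0.9, 0.05], toy_codebooks)
semantic_id = to_semantic_id(codes)
```

Because each level refines the previous one, products that share a longer SID prefix fall into progressively tighter groups.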

Cluster character it produces¶

The post discloses one concrete example of an ESCI-flavor SID cluster, Whole Bean Coffee at SID 0_8_55_72:

"This produces tight clusters where every item is a direct substitute, like Whole Bean Coffee (0_8_55_72), where each product is a medium roast from a different brand, interchangeable for any customer who wants whole bean coffee."

The cluster contains direct substitutes — different brands of the same product format. This is what makes ESCI suitable for substitution, search, and reordering: surfaces where the customer wants the same thing.
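One way a substitution surface could use such clusters is to treat every product sharing a full SID as a candidate substitute. The sketch below assumes exactly that lookup rule. The product IDs, the second SID, and the helper names are hypothetical, and only 0_8_55_72 comes from the post.

```python
from collections import defaultdict

# Hypothetical catalog rows: (product_id, precision-flavor SID).
catalog = [
    ("p1", "0_8_55_72"),
    ("p2", "0_8_55_72"),
    ("p3", "0_8_55_72"),
    ("p4", "3_1_20_5"),
]


def build_sid_index(rows):
    """Map each SID to the list of product IDs assigned to it."""
    index = defaultdict(list)
    for product_id, sid in rows:
        index[sid].append(product_id)
    return index


def substitutes_for(product_id, rows):
    """Return the other products that share product_id's SID."""
    sid_of = dict(rows)
    index = build_sid_index(rows)
    return [p for p in index[sid_of[product_id]] if p != product_id]
```

Here `substitutes_for("p1", catalog)` yields the other two products in the 0_8_55_72 cluster, while `p4` has no cluster mates and returns an empty list.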

Trade-off vs general-purpose embeddings¶

ESCI is domain-specific: both its training data (Instacart search queries paired with product matches) and its training objective (ESCI relevance classification) are tuned to grocery substitution. It outperforms general-purpose embeddings on substitution tasks but produces narrower, less thematic clusters than off-the-shelf models such as Gemma, which Instacart uses for the discovery flavor after an LLM attribute-extraction preprocessing step.

This is the load-bearing argument for the two-flavor codebook design: domain-specific embedding for precision, cleaned general-purpose embedding for discovery. Each codebook is best for its intended use case; neither is universally better.

Caveats¶

The 2026-06-02 post references ESCI without describing it in depth. It points to a separate Instacart blog post, How Instacart Uses Embeddings to Improve Search Relevance, for the full architecture; this wiki page is a stub at that level of detail.
Architecture details are not disclosed: model size, training data volume, and loss formulation beyond the four-class ESCI relevance taxonomy.
Multi-language support, query coverage, and refresh cadence are all undisclosed.
ESCI labels are search-derived: they encode user-revealed similarity at search time, not direct ground truth on substitution preferences. Two products labeled "Substitute" for the same query are known to be similar; two products that always serve different queries may be just as similar yet never receive a label saying so.
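
The coverage gap in the last caveat can be illustrated with a toy derivation of substitute pairs from query-level labels. This is not a description of how Instacart builds training data. The judgments, product names, and the `substitute_pairs` helper are hypothetical and exist only to show why products that never share a query end up unpaired.

```python
from itertools import combinations

# Hypothetical judgments: (query, product_id, ESCI label letter).
judgments = [
    ("whole bean coffee", "coffee_a", "S"),
    ("whole bean coffee", "coffee_b", "S"),
    ("espresso beans", "coffee_c", "S"),
]


def substitute_pairs(rows):
    """Pair up products labeled 'S' under the same query.

    Products that never share a query are never paired, even if they
    would be equally good substitutes in practice.
    """
    by_query = {}
    for query, product, label in rows:
        if label == "S":
            by_query.setdefault(query, []).append(product)
    pairs = set()
    for products in by_query.values():
        for a, b in combinations(sorted(products), 2):
            pairs.add((a, b))
    return pairs
```

In this toy data, `coffee_a` and `coffee_b` are paired because they share a query, while `coffee_c` is paired with neither, regardless of how interchangeable the three products might actually be.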

Seen in¶

sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale — first canonical wiki disclosure: ESCI as the embedding upstream for the precision-flavor Instacart Semantic IDs codebook. Trained on query-product matching with the Exact / Substitute / Complementary / Irrelevant labels, producing tight substitute clusters (e.g. Whole Bean Coffee SID 0_8_55_72).

companies/instacart — owning organization.
systems/instacart-semantic-ids — the consumer of ESCI embeddings.
systems/rq-vae — the algorithm that compresses ESCI embeddings into precision-flavor codes.
concepts/precision-vs-discovery-codebook-flavor — the design axis ESCI sits on the precision side of.
concepts/semantic-id — the substrate ESCI feeds into.
patterns/two-flavor-codebook-precision-vs-discovery — the design pattern.
patterns/rq-vae-codebook-as-product-vocabulary — broader vocabulary-substrate pattern.