
Instacart Intent Engine

The Intent Engine is Instacart's LLM-backed Query Understanding (QU) system, replacing a bespoke multi-model ML stack with a three-lever hierarchy — prompting → context-engineering (RAG) → fine-tuning — applied per QU sub-task. (Source: sources/2025-11-13-instacart-building-the-intent-engine)

Positioning

QU sits upstream of search retrieval + ranking. Its job: turn messy user queries ("bread no gluten", "x large zip lock", "2% reduced-fat ultra-pasteurized chocolate milk") into structured intent signals (categories, rewrites, tags) that downstream retrieval + ranking can use. Legacy QU at Instacart was "notoriously difficult" for long-tail queries and suffered from:

  • Broad queries ("healthy food", "frozen snacks") spanning dozens of categories.
  • No direct feedback. QU is upstream of clicks/conversions; pseudo-labels derived from user behaviour are noisy.
  • Tail queries with data sparsity and no click history.
  • System complexity. Separate FastText classifier + session-mined rewrites + SRL tagger, each with its own data pipeline, training, and serving infra.

The Intent Engine replaces this patchwork with a single LLM substrate applied differently per sub-task, and consolidates the "feature engineering" posture of the legacy stack into a "productionize the backbone" posture. (Source: sources/2025-11-13-instacart-building-the-intent-engine)

The three-lever hierarchy

Instacart's stated ordering, least to most invasive:

  1. Prompting — fast to iterate, the model only sees what's in the prompt.
  2. Context-engineering (RAG) — retrieve Instacart-specific signals (conversion history, catalog, brand-similarity embeddings) and inject them into the prompt.
  3. Fine-tuning — embed domain expertise into the weights (LoRA adapters + adapter merge).

The three-lever ordering is also a cost-control ordering: "To manage costs and prove value, we began with an offline LLM pipeline on high-frequency 'head' queries. This cost-effective approach handled the bulk of traffic and generated the data needed to later train a 'student' model for the long tail." Start offline, graduate to real-time strategically. (Source: sources/2025-11-13-instacart-building-the-intent-engine)
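Lever 2 (context-engineering) amounts to assembling a prompt from retrieved Instacart-specific signals. A minimal sketch of that assembly step, with hypothetical signal names and shapes (the post does not specify the prompt format):

```python
from dataclasses import dataclass

@dataclass
class QueryContext:
    """Retrieved Instacart-specific signals for one query (hypothetical shapes)."""
    top_converting_categories: list[str]   # from conversion history
    catalog_matches: list[str]             # from catalog lookup
    similar_brands: list[str]              # from brand-similarity embeddings

def build_prompt(query: str, ctx: QueryContext) -> str:
    """Lever 2: inject retrieved context into the prompt before calling the LLM."""
    return (
        "You are a grocery query-understanding assistant.\n"
        f"Query: {query}\n"
        f"Historically converting categories: {', '.join(ctx.top_converting_categories)}\n"
        f"Catalog matches: {', '.join(ctx.catalog_matches)}\n"
        f"Similar brands: {', '.join(ctx.similar_brands)}\n"
        "Return the most relevant categories for this query."
    )

prompt = build_prompt(
    "bread no gluten",
    QueryContext(["Gluten-Free Bread", "Bakery"], ["Canyon Bakehouse 7-Grain"], ["Schar"]),
)
```

The point of the sketch is only the escalation logic: if a plain prompt (lever 1) underperforms, richer retrieved context is injected before resorting to fine-tuning (lever 3).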

Three QU sub-tasks rebuilt

1. Query category classification

Legacy: flat multi-class FastText model on noisy conversion labels. Pitfalls: emits taxonomically inconsistent pairs ("Dairy", "Milk" as peers not parent/child), can't reason about novel compositions ("vegan roast").

New, three-step: (a) retrieve top-K historically converted categories as candidates → (b) LLM re-ranks with injected Instacart context → (c) semantic-similarity guardrail discards (query, category) pairs below a relevance threshold. The LLM is a re-ranker over a pre-filtered candidate set, not an open-universe classifier — keeps recall bounded and precision high.
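The three-step flow above can be sketched as a pipeline; the retrieval, re-ranking, and similarity callables are hypothetical stand-ins, not Instacart's actual interfaces:

```python
def classify_categories(query, retrieve_top_k, llm_rerank, similarity,
                        k=20, threshold=0.6):
    """Three-step category classification sketch:
    (a) retrieve top-K historically converted candidate categories,
    (b) LLM re-ranks within that closed candidate set (not open-universe),
    (c) semantic-similarity guardrail drops (query, category) pairs below threshold.
    k and threshold are illustrative values, not disclosed figures."""
    candidates = retrieve_top_k(query, k)                                # (a)
    ranked = llm_rerank(query, candidates)                               # (b)
    return [c for c in ranked if similarity(query, c) >= threshold]      # (c)

# Deterministic stubs standing in for retrieval, the LLM, and an embedding model:
cats = classify_categories(
    "vegan roast",
    retrieve_top_k=lambda q, k: ["Plant-Based Proteins", "Deli Meats", "Dairy"],
    llm_rerank=lambda q, cands: cands,          # identity re-rank for the sketch
    similarity=lambda q, c: 0.9 if "Plant" in c else 0.3,
)
# cats == ["Plant-Based Proteins"]
```

Bounding the LLM to a pre-filtered candidate set is what keeps recall bounded while the guardrail keeps precision high.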

2. Query rewrites

Legacy: session-behavior mining. Coverage only ~50% of search traffic; often emitted synonyms that weren't useful for recall expansion (e.g. "1% milk" → "one percent milk").

New: three specialised prompts per rewrite type — Substitutes, Broader queries, Synonyms — each with chain-of-thought + few-shot exemplars + a post-processing semantic-relevance guardrail. Outcome: >95% coverage at 90%+ precision across all three types. Building on this, Instacart adds session-level context engineering (top-converting product categories from the user's subsequent in-session searches) to make rewrites personalized. (Source: sources/2025-11-13-instacart-building-the-intent-engine)
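The per-type prompt dispatch plus shared guardrail can be sketched as follows; the prompt templates and helper callables are hypothetical, and the real prompts also carry chain-of-thought instructions and few-shot exemplars:

```python
# One specialised prompt per rewrite type (illustrative templates only).
REWRITE_PROMPTS = {
    "substitute": "List substitute products for the grocery query: {q}",
    "broader": "List broader queries for the grocery query: {q}",
    "synonym": "List synonyms for the grocery query: {q}",
}

def generate_rewrites(query, llm, similarity, threshold=0.6):
    """Run each rewrite type through its own prompt, then apply a shared
    post-processing semantic-relevance guardrail. threshold is illustrative."""
    rewrites = {}
    for rtype, template in REWRITE_PROMPTS.items():
        candidates = llm(template.format(q=query))
        rewrites[rtype] = [r for r in candidates
                           if similarity(query, r) >= threshold]
    return rewrites

# Deterministic stubs standing in for the LLM and an embedding model:
out = generate_rewrites(
    "1% milk",
    llm=lambda prompt: ["one percent milk"] if "synonym" in prompt else ["milk"],
    similarity=lambda q, r: 1.0,
)
```

Splitting by rewrite type is what lets each prompt be tuned (and evaluated) independently, rather than one general prompt covering all three.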

3. Semantic Role Labeling (SRL) — the hybrid system

SRL extracts structured concepts from a query (product, brand, attributes) used downstream for retrieval, ranking, ad targeting, and filters. Traffic is power-law — head queries can be precomputed; the tail cannot because it's "effectively infinite." Architectural decomposition:

  ┌─────────────────────────┐       ┌─────────────────────┐
  │ Offline RAG "teacher"   │──────▶│  Head-query cache   │──▶ live traffic (98%)
  │ pipeline                │       └─────────────────────┘
  │ • conversion data       │
  │ • catalog               │       ┌─────────────────────┐
  │ • brand embedding       │──────▶│ Training dataset    │──▶ trains 8B student
  │ • frontier LLM          │       └─────────────────────┘           │
  │ • post-proc guardrail   │                                          ▼
  └─────────────────────────┘                        ┌─────────────────────┐
                                                     │ Real-time Llama-3-8B│──▶ cache-miss traffic (2%)
                                                     │ + LoRA + adapter    │     "~300 ms on H100"
                                                     │ merge, on H100      │
                                                     └─────────────────────┘

Key properties:

  • The offline teacher pipeline is dual-purposed: its output populates both the live cache AND the student's supervised training set. Without dual purpose, you pay twice or ship a student trained on lower-quality labels. See patterns/offline-teacher-online-student-distillation.
  • The student is a LoRA fine-tune of Llama-3-8B. Reported "precision 96.4% vs 95.4% baseline, recall 95.0% vs 96.2%, F1 95.7% vs 95.8%" — parity F1, precision-biased. Deployment posture: precision over recall (a precise tag is more useful than a noisy one for downstream retrieval).
  • Latency path: ~700 ms out of the box on A100 → 300 ms target after LoRA adapter merge + an H100 upgrade. FP8 quantization yielded a further ~10% gain but was not shipped due to a "slight drop in recall." Off-peak GPU autoscaling manages cost.
  • Cache-miss fraction: ~2% of queries hit the real-time student; ~98% served from cache. The 2% is the load-bearing number — the real-time 8B only pays its serving cost for 2% of traffic.

(Source: sources/2025-11-13-instacart-building-the-intent-engine)
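The serving split in the diagram reduces to a cache-first lookup with a student fallback. A minimal sketch, with hypothetical interfaces (the post does not describe the cache or model APIs):

```python
def srl_lookup(query, head_cache, student):
    """Cache-first SRL serving: precomputed head-query tags from the offline
    teacher pipeline cover ~98% of traffic; cache misses (~2%) fall through
    to the real-time LoRA-fine-tuned 8B student."""
    tags = head_cache.get(query)
    if tags is not None:
        return tags, "cache"
    return student(query), "student"

# Deterministic stubs: a dict as the head-query cache, a lambda as the student.
cache = {"2% milk": {"product": "milk", "attribute": "2%"}}
student = lambda q: {"product": q.split()[-1]}   # hypothetical student call
hit = srl_lookup("2% milk", cache, student)
miss = srl_lookup("artisanal oat nog", cache, student)
# hit  -> ({"product": "milk", "attribute": "2%"}, "cache")
# miss -> ({"product": "nog"}, "student")
```

The economics follow directly: the 8B student's serving cost is paid only on the miss path, which is why the ~2% cache-miss fraction is the load-bearing number.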

Outcomes

  • 6% reduction in average scroll depth on tail queries (users find items faster).
  • 50% reduction in user complaints about poor tail-query search results.
  • >95% query-rewrite coverage at 90%+ precision (vs. 50% legacy coverage).
  • 300 ms target hit for real-time SRL on H100.
  • Millions of cold-start queries served weekly through the real-time SRL model.

Relationship to sibling Instacart platforms

  • PIXEL (image generation, 2025-07-17) — Intent Engine shares the prompt-template library + model-agnostic architectural posture (one prompt per use case, beats one general prompt).
  • PARSE (attribute extraction, 2025-08-01) — Intent Engine shares the offline+online hybrid + HITL-style post-processing guardrail stance applied to different data surfaces.
  • Maple (batch LLM processing, 2025-08-27) — Intent Engine's offline teacher pipeline is the kind of batch workload Maple is optimised for (CSV/Parquet in, CSV/Parquet out); PARSE + PIXEL + Intent Engine are three ML-platform-consolidation plays on different data/modality axes.

Caveats

  • No QPS / scale figures for the cache or the real-time 8B. "Millions of cold-start queries weekly" is the only scale disclosure.
  • Frontier teacher LLM is unnamed. The post compares the 8B student against "a much larger frontier model it learned from" without specifying which frontier model — matters for reproducibility.
  • LoRA hyperparameters unspecified (rank, target modules, dataset size, epochs).
  • Distillation here is response distillation — supervised fine-tuning on teacher-generated labels — not soft-label / logit matching in the strict academic (Hinton) sense; the "distillation" terminology is used colloquially.
  • Context-aware QU is future work — not yet shipped. Post pitches distinguishing item-search / content-discovery / restaurant-search intents based on session context.