Query understanding

Definition

Query understanding (QU) is the upstream stage of a search pipeline that turns a raw user query string into structured intent signals that downstream retrieval and ranking can consume. Canonical QU sub-tasks:

  • Query classification — assign the query to one or more taxonomy categories ("butter milk" → Dairy > Milk).
  • Query rewrites — produce alternative query strings that expand recall: synonyms, substitutes, broader queries.
  • Semantic role labeling (SRL) — extract structured slots from the query (product, brand, attribute, size, quantity).
  • Typo/spelling correction, segmentation, language detection, normalization — all upstream of the above.
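
The output of these sub-tasks can be pictured as one structured record handed to retrieval. A minimal sketch; the `QueryUnderstanding` class and its field names are hypothetical, not any production schema:

```python
from dataclasses import dataclass, field

@dataclass
class QueryUnderstanding:
    """Hypothetical container for the intent signals QU emits downstream."""
    normalized_query: str                                  # after typo fixes, segmentation
    categories: list[str] = field(default_factory=list)    # query classification
    rewrites: list[str] = field(default_factory=list)      # recall-expanding alternatives
    slots: dict[str, str] = field(default_factory=dict)    # SRL: product, brand, size, ...

# Example: the "butter milk" query from the definition above.
qu = QueryUnderstanding(
    normalized_query="buttermilk",
    categories=["Dairy > Milk"],
    rewrites=["butter milk", "cultured milk"],
    slots={"product": "buttermilk"},
)
```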

QU sits between the raw query and the retrieval index. Its quality bounds the quality of everything downstream: a mis-classified query retrieves from the wrong category, a missing rewrite forfeits recall, and a wrong SRL tag breaks filter semantics.

Why QU is hard

Instacart's 2025-11-13 Intent Engine post enumerates the structural difficulties (Source: sources/2025-11-13-instacart-building-the-intent-engine):

  1. Broad queries. "healthy food", "frozen snacks" — span dozens of categories; hard to act on.
  2. No direct feedback. QU is upstream of clicks/conversions. The nearest labelled signal (user searched X, purchased Y) is noisy — a user can search "bread" and buy "bananas".
  3. Tail queries. "red hot chili pepper spice" or "2% reduced-fat ultra-pasteurized chocolate milk" appear rarely or never in history; engagement-driven models have no data to learn from.
  4. System complexity. Separate models for each sub-task — each with its own data pipeline, training infra, serving infra — amplify maintenance cost and prevent shared improvements.

Why LLMs help

From the same post: LLMs bring world knowledge and linguistic inference to QU ("an LLM already understands that 'Italian parsley' is a synonym for 'flat parsley', while 'curly parsley' is a common substitute"). This reduces the specialised-dataset burden and lets one model serve multiple QU sub-tasks, collapsing the system-complexity problem.
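
At the cheapest adaptation level this is just a prompt. A sketch of the rewrite task; the template and the `rewrite_prompt` helper are illustrative, not Instacart's actual prompt:

```python
def rewrite_prompt(query: str) -> str:
    # Hypothetical prompt template: it leans on the LLM's built-in
    # world knowledge (e.g. Italian parsley ~ flat parsley) instead
    # of a specialised synonym dataset.
    return (
        "You expand grocery search queries to improve recall.\n"
        f"Query: {query}\n"
        "List up to 3 alternatives: exact synonyms first, "
        "then common substitutes, one per line."
    )
```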

Adaptation levers for LLM-based QU

Instacart's explicit hierarchy, least to most invasive (Source: sources/2025-11-13-instacart-building-the-intent-engine):

  1. Prompting — cheap; the LLM sees only the prompt.
  2. Context-engineering (RAG) — inject domain signals into the prompt at inference time (conversion history, catalog, brand embeddings, session context).
  3. Fine-tuning — bake domain expertise into weights (e.g., LoRA on top of an open-weights base).

The three also form a cost ladder: prompting has no training cost, RAG has offline-pipeline cost, fine-tuning has training + serving-hardware cost.
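
The context-engineering lever amounts to prompt assembly at inference time. A sketch under assumptions: the `build_rag_prompt` helper, its parameters, and the signal shapes are all hypothetical:

```python
def build_rag_prompt(query, conversion_pairs, categories):
    # Hypothetical context-engineering step: domain signals
    # (conversion history, catalog taxonomy) are injected into the
    # prompt at inference time rather than baked into model weights.
    history = "\n".join(f"- '{q}' -> {p}" for q, p in conversion_pairs)
    return (
        f"Catalog categories: {', '.join(categories)}\n"
        f"Recent query->purchase pairs:\n{history}\n"
        f"Classify this query into one category: {query}"
    )

prompt = build_rag_prompt(
    "bread",
    [("bread", "sourdough loaf")],   # noisy conversion signal
    ["Bakery", "Dairy"],             # catalog slice
)
```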

Serving architecture pattern

Search traffic is power-law distributed over queries, so production QU systems typically adopt a hybrid head/tail architecture:

  • Head (common queries) — pre-compute with an expensive offline pipeline; serve from cache.
  • Tail (rare/new queries) — serve with a real-time fast model (often a distilled student of the offline pipeline's teacher).

Canonical wiki instance: Instacart Intent Engine SRL routing. Cache serves ~98% of queries; real-time model handles ~2%. See patterns/head-cache-plus-tail-finetuned-model.
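
The head/tail routing above fits in a few lines. A minimal sketch; `classify_query`, the dict-backed cache, and the tail model are assumptions for illustration:

```python
def classify_query(query: str, head_cache: dict, tail_model):
    # Head (~98% of traffic): offline-precomputed QU results served
    # from cache.
    cached = head_cache.get(query)
    if cached is not None:
        return cached
    # Tail (~2%): rare/new queries fall through to a fast real-time
    # model, e.g. a distilled student of the offline pipeline.
    return tail_model(query)
```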

Caveats

  • QU label quality is bounded by the proxy signal. Conversion-based labels carry user-behaviour noise; click-based labels carry position bias. Neither is ground truth for intent.
  • QU is deeply category-taxonomy-bound. Rebuilding QU often requires rebuilding the taxonomy too — a hidden dependency that can eat more time than the model changes.
  • QU failure modes compound downstream. A wrong category → wrong retrieval scope → bad ranking even with a perfect ranker. QU regressions are often diagnosed as ranking regressions.
