Query understanding

Definition

Query understanding (QU) is the upstream stage of a search pipeline that turns a raw user query string into structured intent signals that downstream retrieval and ranking can consume. Canonical QU sub-tasks:

  • Query classification — assign the query to one or more taxonomy categories ("butter milk" → Dairy > Milk).
  • Query rewrites — produce alternative query strings that expand recall: synonyms, substitutes, broader queries.
  • Semantic role labeling (SRL) — extract structured slots from the query (product, brand, attribute, size, quantity).
  • Typo/spelling correction, segmentation, language detection, normalization — all upstream of the above.
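
The output of these sub-tasks can be pictured as one structured record handed to retrieval. A minimal sketch; the `QueryUnderstanding` class and its field names are hypothetical, not any production schema:

```python
from dataclasses import dataclass, field

@dataclass
class QueryUnderstanding:
    """Hypothetical container for the intent signals QU emits downstream."""
    normalized_query: str                                  # after typo fixes, segmentation
    categories: list[str] = field(default_factory=list)    # query classification
    rewrites: list[str] = field(default_factory=list)      # recall-expanding alternatives
    slots: dict[str, str] = field(default_factory=dict)    # SRL: product, brand, size, ...

# Example: the "butter milk" query from the definition above.
qu = QueryUnderstanding(
    normalized_query="buttermilk",
    categories=["Dairy > Milk"],
    rewrites=["butter milk", "cultured milk"],
    slots={"product": "buttermilk"},
)
```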

QU sits between the raw query and the retrieval index. Its quality bounds the quality of everything downstream: a mis-classified query retrieves from the wrong category, a missing rewrite forfeits recall, and a wrong SRL tag breaks filter semantics.

Why QU is hard

Instacart's 2025-11-13 Intent Engine post enumerates the structural difficulties (Source: sources/2025-11-13-instacart-building-the-intent-engine):

  1. Broad queries. "healthy food", "frozen snacks" — span dozens of categories; hard to act on.
  2. No direct feedback. QU is upstream of clicks/conversions. The nearest labelled signal (user searched X, purchased Y) is noisy — a user can search "bread" and buy "bananas".
  3. Tail queries. "red hot chili pepper spice" or "2% reduced-fat ultra-pasteurized chocolate milk" appear rarely or never in history; engagement-driven models have no data to learn from.
  4. System complexity. Separate models for each sub-task — each with its own data pipeline, training infra, serving infra — amplify maintenance cost and prevent shared improvements.

Why LLMs help

From the same post: LLMs bring world knowledge and linguistic inference to QU ("an LLM already understands that 'Italian parsley' is a synonym for 'flat parsley', while 'curly parsley' is a common substitute"). This reduces the specialised-dataset burden and lets one model serve multiple QU sub-tasks, collapsing the system-complexity problem.
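
At the cheapest adaptation level this is just a prompt. A sketch of the rewrite task; the template and the `rewrite_prompt` helper are illustrative, not Instacart's actual prompt:

```python
def rewrite_prompt(query: str) -> str:
    # Hypothetical prompt template: it leans on the LLM's built-in
    # world knowledge (e.g. Italian parsley ~ flat parsley) instead
    # of a specialised synonym dataset.
    return (
        "You expand grocery search queries to improve recall.\n"
        f"Query: {query}\n"
        "List up to 3 alternatives: exact synonyms first, "
        "then common substitutes, one per line."
    )
```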

Adaptation levers for LLM-based QU

Instacart's explicit hierarchy, least to most invasive (Source: sources/2025-11-13-instacart-building-the-intent-engine):

  1. Prompting — cheap; the LLM sees only the prompt.
  2. Context-engineering (RAG) — inject domain signals into the prompt at inference time (conversion history, catalog, brand embeddings, session context).
  3. Fine-tuning — bake domain expertise into weights (e.g., LoRA on top of an open-weights base).

The three also form a cost ladder: prompting has no training cost, RAG has offline-pipeline cost, fine-tuning has training + serving-hardware cost.
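
The context-engineering lever amounts to prompt assembly at inference time. A sketch under assumptions: the `build_rag_prompt` helper, its parameters, and the signal shapes are all hypothetical:

```python
def build_rag_prompt(query, conversion_pairs, categories):
    # Hypothetical context-engineering step: domain signals
    # (conversion history, catalog taxonomy) are injected into the
    # prompt at inference time rather than baked into model weights.
    history = "\n".join(f"- '{q}' -> {p}" for q, p in conversion_pairs)
    return (
        f"Catalog categories: {', '.join(categories)}\n"
        f"Recent query->purchase pairs:\n{history}\n"
        f"Classify this query into one category: {query}"
    )

prompt = build_rag_prompt(
    "bread",
    [("bread", "sourdough loaf")],   # noisy conversion signal
    ["Bakery", "Dairy"],             # catalog slice
)
```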

Serving architecture pattern

Search traffic is power-law distributed over queries, so production QU systems typically adopt a hybrid head/tail architecture:

  • Head (common queries) — pre-compute with an expensive offline pipeline; serve from cache.
  • Tail (rare/new queries) — serve with a real-time fast model (often a distilled student of the offline pipeline's teacher).

Canonical wiki instance: Instacart Intent Engine SRL routing. Cache serves ~98% of queries; real-time model handles ~2%. See patterns/head-cache-plus-tail-finetuned-model.
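
The head/tail routing above fits in a few lines. A minimal sketch; `classify_query`, the dict-backed cache, and the tail model are assumptions for illustration:

```python
def classify_query(query: str, head_cache: dict, tail_model):
    # Head (~98% of traffic): offline-precomputed QU results served
    # from cache.
    cached = head_cache.get(query)
    if cached is not None:
        return cached
    # Tail (~2%): rare/new queries fall through to a fast real-time
    # model, e.g. a distilled student of the offline pipeline.
    return tail_model(query)
```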

Caveats

  • QU label quality is bounded by the proxy signal. Conversion-based labels carry user-behaviour noise; click-based labels carry position bias. Neither is ground truth for intent.
  • QU is deeply category-taxonomy-bound. Rebuilding QU often requires rebuilding the taxonomy too — a hidden dependency that can eat more time than the model changes.
  • QU failure modes compound downstream. A wrong category → wrong retrieval scope → bad ranking even with a perfect ranker. QU regressions are often diagnosed as ranking regressions.
