PATTERN Cited by 2 sources
RAG side-input for structured extraction¶
Intent¶
When an LLM has to perform a structured extraction / tagging / segmentation task on a short input (a search query, a product name), give it an auxiliary structured signal alongside the input — typically piped from an existing ML system or catalogue — to disambiguate extraction decisions the input text alone can't resolve.
This is not "retrieve documents → augment prompt" RAG in the classical sense: the retrieved artefact is a structured signal (a list of business names, a list of predicted categories), chosen because it makes a specific downstream extraction ambiguity tractable.
Shape¶
input (query / product name)
│
▼
┌─────────────────────────┐
│ existing ML signal or │
│ catalog lookup │
│ (businesses viewed, │
│ predicted categories, │
│ brand embeddings, │
│ related SKUs, …) │
└───────────┬─────────────┘
│
│ inject into prompt
▼
┌─────────────────────────┐
│ LLM extraction prompt │
│ (input + side-input) │
└───────────┬─────────────┘
│
▼
structured output
The side-input shrinks the LLM's effective hypothesis space before the extraction runs.
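The shape above can be sketched in a few lines. All names here are illustrative, not from the sources: `extract_with_side_input` is a hypothetical wrapper, and the stubs stand in for a real retrieval system and a real LLM call.

```python
from typing import Callable

def extract_with_side_input(
    primary_input: str,
    fetch_side_input: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> str:
    """Run a structured extraction with a retrieved side-input in the prompt."""
    # 1. Retrieve the structured signal (e.g. viewed businesses, predicted categories).
    side_input = fetch_side_input(primary_input)
    # 2. Inject it next to the primary input so the LLM can disambiguate.
    prompt = (
        "Segment the query into tagged parts.\n"
        f"query: {primary_input}\n"
        f"context: [{', '.join(side_input)}]\n"
    )
    # 3. One LLM call does extraction + disambiguation jointly over the combination.
    return llm(prompt)

# Usage with stub lookups (stand-ins for real retrieval and a real model):
viewed = {"barber open sunday": ["Fade Masters", "Doug's Barber Shop"]}
result = extract_with_side_input(
    "barber open sunday",
    lambda q: viewed.get(q, []),
    lambda p: p,  # echo stub in place of a real LLM call
)
```

The point of the wrapper is that the retrieval step runs before the extraction prompt is assembled, so the hypothesis space is narrowed before the model sees the input.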
Canonical wiki instances¶
Yelp query understanding (2025-02-04)¶
Two side-input instances in the same system (sources/2025-02-04-yelp-search-query-understanding-with-llms):
Segmentation prompt side-input: business names viewed for the query:
1) barber open sunday [Fade Masters, Doug's Barber Shop]
=> {topic} barber {time} open sunday
2) buon cuon [Banh Cuon Tay Ho, Phuong Nga Banh Cuon]
=> {topic} banh cuon [spell corrected - high]
The business-name context tells the LLM whether "Banh Cuon Tay Ho" or "Fade Masters" is a real brand (so the query is a topic / spell-correction) vs. a direct business-name match.
Yelp's explicit framing: "we augment the input query text with the names of businesses that have been viewed for that query. This helps the model learn and distinguish the many facets of business names from common topics, locations, and misspellings. This is highly useful for both segmentation and spell correction (so was another reason for combining the two tasks)."
Review-highlight side-input: top business categories for query:
search: healthy food, categories: [healthmarkets, vegan,
vegetarian, organicstores]
-> healthy food, healthy options, healthy | nutritious,
organic, low calorie, low carb, low fat, high fiber |
fresh, plant-based, superfood
The category context tells the LLM what sort of businesses the expansion universe should target — narrows "healthy food" expansions away from e.g. "healthy relationship" (which shouldn't match anything in Yelp's review corpus).
Yelp: "we enhanced the input raw query text with the most relevant business categories with respect to that query (from our in-house predictive model). This helps the LLM to generate more relevant phrases for our needs, especially for searches with a non-obvious topic (like the name of a specific restaurant) or ambiguous searches (like pool - swimming vs billiards)."
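The input serialisation in the excerpt above can be reproduced with a small formatter. The `search: ..., categories: [...]` field names follow the excerpt; the function itself is an illustrative sketch, not Yelp's code.

```python
def format_highlight_input(query: str, categories: list[str]) -> str:
    """Serialise a query plus its predicted categories into one prompt line,
    mirroring the `search: ..., categories: [...]` shape shown above."""
    return f"search: {query}, categories: [{', '.join(categories)}]"

line = format_highlight_input(
    "healthy food", ["healthmarkets", "vegan", "vegetarian", "organicstores"]
)
# line == "search: healthy food, categories: [healthmarkets, vegan, vegetarian, organicstores]"
```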
Instacart PARSE (2025-08-01)¶
Instacart's PARSE extraction platform exposes the RAG side-input as a configuration field on each attribute: the side-input varies per attribute and is chosen at prompt-design time. Same pattern at a different altitude (attribute extraction over product catalog, not search-query segmentation).
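One way to model a per-attribute side-input configuration of the kind PARSE exposes — the source only says the side-input is a configuration field on each attribute, so the field names and structure here are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttributeConfig:
    """Extraction config for one catalog attribute, carrying its own side-input."""
    name: str
    prompt_template: str                    # must contain {product} and {side_input}
    fetch_side_input: Callable[[str], str]  # per-attribute retrieval, set at design time

    def build_prompt(self, product: str) -> str:
        return self.prompt_template.format(
            product=product, side_input=self.fetch_side_input(product)
        )

# "brand" might ground on similar products; "size" might ground on image OCR instead.
brand_cfg = AttributeConfig(
    name="brand",
    prompt_template="Extract the brand.\nproduct: {product}\nsimilar: {side_input}",
    fetch_side_input=lambda p: "Acme Cola 6-pack",  # stub for a similar-items lookup
)
prompt = brand_cfg.build_prompt("Acme Cola 12oz")
```

Keeping the side-input source as data on the config, rather than hard-coding it into the pipeline, is what lets each attribute pick a different disambiguating signal.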
Why it works¶
Classical NLP handles ambiguity via model capacity (bigger model, more training data) and pipeline architecture (separate NER / disambiguation / entity-linking models). LLMs handle ambiguity via prompt grounding: inject the signal that would have been a separate model's output into the prompt, and let the LLM reason over the combination.
This wins three things:
- Re-use existing ML signals. You likely already have predicted categories, similar items, related businesses — these are high-quality signals specific to your domain. Piping them into the prompt gets LLM-era benefits without training new models.
- Disambiguation without a dedicated stage. One LLM call does segmentation + disambiguation jointly (task fusion — see concepts/llm-segmentation-over-ner) instead of two cascaded stages.
- Cheaper than more capacity. Scaling up to a bigger LLM is expensive per call; adding a side-input is free at inference time once the upstream signal exists.
Implementation notes¶
- Pick the side-input that disambiguates the task's known failure modes. For business-name vs. topic ambiguity, business names viewed for the query are load-bearing. For concept-vs-concept ambiguity ("pool" swimming vs. billiards), predicted business categories are load-bearing. The right side-input is task-specific.
- Side-inputs are typically a short list. Feeding the entire catalog isn't useful — the LLM has a context window and the ranker-prioritised top-N is almost always sufficient. Yelp's business-name context shows two names per query; the review-highlight category context shows ~4 categories.
- Stability matters. If the side-input source is noisy or drifts, the LLM extraction becomes unstable. Whatever ranks the side-input should be stable under small input perturbations.
- Side-input can differ per extraction step. In PARSE, each attribute configuration names its own side-input — extracting "brand" might use similar products; extracting "size" might use product-image OCR.
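The "short list" note above can be made concrete: keep only the ranker's top-N items, bounded by a rough token budget. This is a sketch under stated assumptions — the budget heuristic (whitespace-split word count) is illustrative; a real system would count tokens with the model's tokenizer.

```python
def truncate_side_input(
    ranked_items: list[str], max_items: int = 5, max_tokens: int = 40
) -> list[str]:
    """Keep the highest-ranked side-input items that fit both a count cap
    and a crude token budget (whitespace split as a tokenizer stand-in)."""
    kept, used = [], 0
    for item in ranked_items[:max_items]:
        cost = len(item.split())
        if used + cost > max_tokens:
            break
        kept.append(item)
        used += cost
    return kept

names = ["Banh Cuon Tay Ho", "Phuong Nga Banh Cuon", "Unrelated Bar"]
top = truncate_side_input(names, max_items=2)
# top == ["Banh Cuon Tay Ho", "Phuong Nga Banh Cuon"]
```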
Tradeoffs / gotchas¶
- Side-input leakage. If the side-input is noisy (wrong categories predicted; unrelated business names in the viewed-businesses history), the LLM can be misled by the grounding — the side-input's failure modes compound into extraction failure modes.
- Circular dependency on other ML systems. The side-input source (predicted categories, viewed-business ranking) is itself an ML system that needs training / evaluation. If that system has a regression, extraction quality regresses transitively.
- Side-input availability at pre-compute time. The {query → viewed businesses} mapping requires log-join infrastructure; the batch pre-computation pipeline must include this join step.
- Cold-start. New queries have no viewed-businesses history. The extraction either degrades gracefully (falls back to no side-input) or requires a fallback side-input source (e.g. substring-match suggestions).
- Prompt-length budget. Side-inputs eat context-window tokens. For short LLM inputs (queries), this is usually fine; for longer inputs, side-input length has to be traded against primary-input length.
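The cold-start gotcha suggests degrading gracefully: omit the side-input line entirely when retrieval returns nothing, rather than injecting an empty or fabricated context. A minimal sketch, with hypothetical names:

```python
def build_prompt(query: str, side_input: list[str]) -> str:
    """Build the extraction prompt, dropping the context line when there is
    no side-input (cold-start query) so the model falls back to text-only cues."""
    lines = ["Segment the query into tagged parts.", f"query: {query}"]
    if side_input:  # graceful degradation: never emit an empty "context: []" line
        lines.append(f"context: [{', '.join(side_input)}]")
    return "\n".join(lines)

warm = build_prompt("buon cuon", ["Banh Cuon Tay Ho", "Phuong Nga Banh Cuon"])
cold = build_prompt("brand new never-seen query", [])
```

Leaving the line out (instead of emitting `context: []`) avoids teaching the model that an empty list is itself a signal.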
Seen in¶
- sources/2025-02-04-yelp-search-query-understanding-with-llms — canonical wiki reference (two instances: business-names for segmentation, categories for review highlights).
- sources/2025-08-01-instacart-scaling-catalog-attribute-extraction-with-multi-modal-llms — PARSE attribute-extraction instance; side-input configurable per attribute.
Related¶
- concepts/retrieval-augmented-generation — parent concept
- concepts/context-engineering — the generalisation into which this pattern fits as a specific mechanism
- concepts/query-understanding — the canonical task family
- patterns/dynamic-knowledge-injection-prompt — cousin pattern at different altitude (v0's intent-classified-context-injection)
- patterns/composite-model-pipeline — the upstream-ML-signal + LLM composition, generalised
- systems/yelp-query-understanding / systems/instacart-parse — canonical production instances
- companies/yelp / companies/instacart