
CONCEPT

Review-highlight phrase expansion

Definition

Review-highlight phrase expansion is the LLM-driven task of generating a creatively expanded list of phrases that are semantically adjacent to a user's search query, so that matching review snippets can be highlighted in search-result cards.

The mechanism: when a user searches "healthy food", a set of display-time candidate phrases ("healthy options", "nutritious", "organic", "low calorie", "low carb", "low fat", "high fiber", "fresh", "plant-based", "superfood"…) is generated offline by the LLM, then used at render time to find actual user-written review passages that match one or more of the expanded phrases. The matched passage is bolded in the rendered snippet.

It's a generative task, not an extraction task — the output phrases are new text the LLM invents, constrained by semantic adjacency to the query.
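The render-time half of the mechanism can be sketched in a few lines. This is a minimal illustration, not Yelp's implementation: it assumes case-insensitive whole-word matching and markdown-style bolding, both of which are choices of this sketch.

```python
import re

def highlight_snippet(review_text, expanded_phrases):
    """Bold the first review passage matching any expanded phrase.

    Hypothetical sketch: production matching likely involves
    tokenization or stemming rather than raw regex search.
    """
    for phrase in expanded_phrases:
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        match = pattern.search(review_text)
        if match:
            start, end = match.span()
            return (review_text[:start] + "**" + review_text[start:end]
                    + "**" + review_text[end:])
    return None  # no expanded phrase matched; nothing to highlight

# Usage with part of the "healthy food" expansion from the post:
phrases = ["healthy options", "nutritious", "organic", "low calorie"]
print(highlight_snippet("Great spot with lots of organic produce.", phrases))
# Great spot with lots of **organic** produce.
```

Note that the expansion itself happens offline; only this cheap string-matching step runs at render time.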

Canonical wiki instance: Yelp

Canonical reference: Yelp's 2025-02-04 post (sources/2025-02-04-yelp-search-query-understanding-with-llms). The post provides the task's prompt-iteration timeline on the query "healthy food":

| Date | Output |
| --- | --- |
| May 2022 | healthy food, healthy, organic (3 phrases) |
| March 2023 | healthy food, healthy, organic, low calorie, low carb (5 phrases) |
| September 2023 | healthy food, healthy options, healthy \| nutritious, organic, low calorie, low carb, low fat, high fiber \| fresh, plant-based, superfood (11 phrases, structured tiers) |
| December 2023 (with RAG) | same 11 phrases, grounded by RAG business-category context [healthmarkets, vegan, vegetarian, organicstores] |

This evolution illustrates three things about the task: (1) expansion scope grows as the prompt is iterated; (2) output gains structure (tiers of aggressiveness) as the engineering team learns to care about phrase relevance; (3) RAG grounding is added late, without changing the output schema, because RAG disambiguates but doesn't change the task.
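One plausible reading of the September 2023 output, where the "|" separators mark tier boundaries, is a structured record like the following. The exact schema is an assumption of this sketch; the post shows only the flat, pipe-delimited string.

```python
# Hypothetical structured form of the tiered September 2023 output.
# Tier boundaries follow the "|" separators shown in the table above.
expansion = {
    "query": "healthy food",
    "tiers": [
        # Tier 1: closest to the literal query
        ["healthy food", "healthy options", "healthy"],
        # Tier 2: strongly adjacent attributes
        ["nutritious", "organic", "low calorie", "low carb",
         "low fat", "high fiber"],
        # Tier 3: most aggressive expansions
        ["fresh", "plant-based", "superfood"],
    ],
}

# Downstream matching can flatten the tiers, or weight matches by tier.
all_phrases = [p for tier in expansion["tiers"] for p in tier]
```

A tiered schema lets the renderer prefer tier-1 matches and fall back to more aggressive tiers only when nothing closer to the query hits.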

Why expansion is non-trivial

From the Yelp post's worked list (paraphrased):

  1. Brand-relative semantics. What does the query mean in the context of Yelp (reservations, pickups, Yelp-guaranteed service searches, etc.) — not all generic dictionaries apply.
  2. Go wider than the literal query. "Seafood" shouldn't only highlight "seafood" — also "fresh fish", "fresh catch", "salmon roe", "shrimp". Wider matching increases the chance that a real review passage matches something.
  3. Go up the semantic tree when appropriate. "Vegan burritos" expands to "vegan", "vegan options", etc. — broader concepts when the literal query produces few matches.
  4. Multi-word / casual phrases. "Watch the game" is a valid expansion for "best bar to watch Lions games". Single-word dictionaries can't generate these.
  5. Spurious-match avoidance. The expansion should bias away from producing phrases that would spuriously match many reviews (stop-words, generic phrases).

Relationship to query expansion in classical IR

Query expansion in classical IR adds synonyms to the retrieval-side query to improve recall. Review-highlight phrase expansion is a cousin at a different altitude: the expansion is not used for retrieval (the results set is already determined), but for display-time matching of review passages that get highlighted in each result card. The phrases don't change what businesses show up; they change what text from each business's reviews gets bolded.

A secondary Yelp use of the expansion output described in the post: CTR of expanded-phrase matches is fed back into the ranking model — a match on "plant-based" being clicked more often than a match on "organic" is a signal the ranker can exploit.
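The CTR feedback signal amounts to per-phrase click aggregation. A minimal sketch, assuming a hypothetical event-log format (the post does not describe the log schema):

```python
from collections import defaultdict

def phrase_ctr(events):
    """Aggregate click-through rate per expanded phrase.

    Each event records which expanded phrase produced the highlight
    and whether the result was clicked (illustrative schema).
    """
    shows = defaultdict(int)
    clicks = defaultdict(int)
    for e in events:
        shows[e["phrase"]] += 1
        clicks[e["phrase"]] += e["clicked"]
    return {p: clicks[p] / shows[p] for p in shows}

events = [
    {"phrase": "plant-based", "clicked": 1},
    {"phrase": "plant-based", "clicked": 1},
    {"phrase": "organic", "clicked": 0},
    {"phrase": "organic", "clicked": 1},
]
print(phrase_ctr(events))  # {'plant-based': 1.0, 'organic': 0.5}
```

The resulting per-phrase CTRs are what the ranker can consume as features.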

Serving architecture

Yelp's production serving for review-highlight phrase expansion is a canonical instance of head-cache-plus-tail: 95% of traffic is served from pre-computed expansions generated via the OpenAI batch API; the remaining 5% falls back to a heuristic that averages expanded-phrase CTR over business categories.
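The head-cache-plus-tail split reduces to a dictionary lookup with a category-level fallback. A minimal sketch, with the cache contents and category phrases as stand-ins for the batch-computed store and CTR heuristic:

```python
# Head: pre-computed expansions for high-traffic queries (illustrative).
PRECOMPUTED = {
    "healthy food": ["healthy options", "nutritious", "organic"],
}

# Tail fallback: top phrases by averaged CTR per business category
# (illustrative values).
CATEGORY_TOP_PHRASES = {
    "restaurants": ["fresh", "delicious", "great service"],
}

def serve_expansion(query, category):
    """Serve cached expansions for head queries, heuristic for the tail."""
    if query in PRECOMPUTED:          # ~95% of traffic
        return PRECOMPUTED[query]
    return CATEGORY_TOP_PHRASES.get(category, [])  # ~5% of traffic

print(serve_expansion("healthy food", "restaurants"))       # cache hit
print(serve_expansion("gluten free pasta", "restaurants"))  # fallback
```

The key property is that no LLM call happens on the serving path in either branch.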

Tradeoffs / gotchas

  • Offline evaluation is subjective. "Offline evaluation of the quality of generated phrases is subjective and requires very strong human annotators with good product, qualitative, and engineering understanding." Unlike extraction tasks, phrase-expansion quality can't be measured via F1 or accuracy on a ground-truth set — there's no single "correct" expansion.
  • Expansion breadth vs. spurious-match risk. A very aggressive expansion is likely to hit many reviews spuriously (highlighting "fresh" on every restaurant review); a very conservative expansion is likely to hit few reviews. The tuning knob is implicit in the prompt's example set and requires A/B testing to find.
  • Generated phrases shouldn't be LLM hallucinations. The LLM can generate phrases that look reasonable but don't actually appear in any review — in which case no snippet gets highlighted, and the expansion is wasted effort. The canonical mitigation: curate examples whose output phrases are empirically common in real reviews.
  • Freshness and seasonality. Query meanings drift with product catalog changes (new menu items, new services); the cached expansions need a refresh cadence aligned to drift.
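The hallucination gotcha above also admits a mechanical backstop: before caching an expansion, prune phrases that never appear in the target reviews, since they can never produce a highlight. A hedged sketch with an illustrative corpus (the post's stated mitigation is example curation, not this filter):

```python
def prune_unmatched(phrases, reviews):
    """Drop expanded phrases that appear in none of the reviews."""
    corpus = " ".join(reviews).lower()
    return [p for p in phrases if p.lower() in corpus]

reviews = ["Loved the plant-based bowls.", "Very fresh salads."]
print(prune_unmatched(["plant-based", "macrobiotic", "fresh"], reviews))
# ['plant-based', 'fresh']
```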
