Yelp Query Understanding¶
Definition¶
Yelp Query Understanding is the LLM-powered query-processing pipeline that sits between the raw user query and Yelp Search's retrieval + ranking backend. It is the system canonicalised by the 2025-02-04 Yelp Engineering post "Search query understanding with LLMs: from ideation to production" (sources/2025-02-04-yelp-search-query-understanding-with-llms).
Tasks covered (as of 2025-02-04)¶
- Query segmentation — assign labels from {topic, name, location, time, question, none} to each token-run of the query. Used downstream for name-matching, location rewrite, and filter auto-enablement.
- Spell correction — fused with segmentation into a single LLM prompt; spell-corrected segments are meta-tagged with [spell corrected - high].
- Review-highlight phrase expansion — creative generation of semantically-adjacent phrases so interesting review-snippets can be surfaced alongside each business.
- Canonicalisation — mentioned in passing in the post's preamble; not detailed.
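The segmentation task above can be pictured as assigning one label from the closed set to each token-run, with spell corrections meta-tagged inline. A minimal sketch of that output shape (label names come from the post; the exact record structure is an assumption, not Yelp's schema):

```python
# Hypothetical output shape for the fused segmentation + spell-correction
# task. Label names are from the post; the record fields are assumptions.
query = "epct restaurants open now"

segments = [
    {"text": "epcot", "label": "location",
     "meta": "[spell corrected - high]"},        # "epct" -> "epcot"
    {"text": "restaurants", "label": "topic", "meta": None},
    {"text": "open now", "label": "time", "meta": None},
]

# Every token-run gets exactly one label from the closed set.
LABELS = {"topic", "name", "location", "time", "question", "none"}
assert all(s["label"] in LABELS for s in segments)
```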
Architectural components (2025-02-04 snapshot)¶
raw query text
│
▼
┌──────────────────────────────┐
│ RAG side-input assembly │
│ (seg: businesses viewed) │
│ (rh: business categories)│
└──────────┬───────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ Three-tier cascade (cache → batch → realtime) │
│ │
│ ┌───────────────┐ │
│ │ head cache │ ◀── pre-computed by GPT-4 │
│ │ (expensive │ or fine-tuned GPT-4o-mini │
│ │ LLM output) │ via OpenAI batch API │
│ └──────┬────────┘ │
│ │ hit │
│ miss │ │
│ ▼ │
│ ┌───────────────┐ │
│ │ fine-tuned │ ◀── offline batch coverage │
│ │ GPT-4o-mini │ (95%+ for review highl.) │
│ └──────┬────────┘ │
│ │ hit │
│ miss │ │
│ ▼ │
│ ┌───────────────┐ │
│ │ BERT / T5 │ ◀── realtime long-tail │
│ │ (realtime) │ serving │
│ └──────┬────────┘ │
└──────────┼────────────────────────────────────────┘
│
▼
segmentation phrase expansion …
│ │
▼ ▼
┌───────────┐ ┌───────────────┐
│ Yelp │ │ Review │
│ Search │ │ highlighting │
│ backend │ │ sub-system │
└───────────┘ └───────────────┘
Notes per the post:
- Cache layer: "caching (pre-computing) high-end LLM responses for only head queries above a certain frequency threshold". Generalised as concepts/query-frequency-power-law-caching.
- Batch layer: fine-tuned GPT-4o-mini runs offline via OpenAI batch API calls; review-highlight expansion scaled to 95% of traffic via this path.
- Realtime layer: BERT + T5 models serve the remaining ~5% of long-tail queries that hit neither the cache nor the batch path.
- RAG side-inputs differ per task: segmentation uses "names of businesses that have been viewed for that query"; review-highlight uses "the most relevant business categories with respect to that query (from our in-house predictive model)".
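The cascade routing logic implied by these notes is simple: check the head cache, fall back to pre-computed batch output, and only invoke the realtime model on a double miss. A minimal sketch under those assumptions (function and store names are illustrative, not Yelp's API):

```python
# Illustrative three-tier cascade: cache -> batch -> realtime.
# Store contents and function names are assumptions for the sketch.

head_cache = {"restaurants near me": {"source": "cache"}}    # pre-computed GPT-4 output
batch_results = {"vegan tacos": {"source": "batch"}}         # offline fine-tuned GPT-4o-mini

def realtime_model(query: str) -> dict:
    """Stand-in for the realtime BERT/T5 models serving the long tail."""
    return {"source": "realtime", "query": query}

def understand(query: str) -> dict:
    if query in head_cache:        # tier 1: head queries above frequency threshold
        return head_cache[query]
    if query in batch_results:     # tier 2: offline batch coverage (95%+ for review highlights)
        return batch_results[query]
    return realtime_model(query)   # tier 3: realtime long-tail serving
```

The cost-routing consequence is that the expensive model's output is only ever computed offline; realtime latency and spend are bounded by the smallest model.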
Downstream consumers¶
- Implicit location rewrite — when segmentation tags a {location} with high confidence, the search-backend geobox is rewritten to the refined location, provided it lies within 30 miles of the user's search. Canonical example: "epcot restaurants" → rewrite geobox from "Orlando, FL" to "Epcot, Bay Lake, FL". See concepts/implicit-query-location-rewrite.
- Business-name matching — the token probability of the {name} tag is used as a continuous feature in Yelp's query-to-business-name matching + ranking system.
- Auto-enabled filters — not fully detailed in the post, but listed as a downstream benefit of "more intelligent labeling of these tags."
- Review-snippet selection — phrase expansion output feeds the review-highlight sub-system, which picks review passages matching the expanded phrases for display alongside search results.
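The name-matching consumer reuses the {name} tag's token probability as a continuous feature rather than a hard label. A hypothetical sketch of that feature extraction (the feature name and aggregation are assumptions; the post only says the probability is used as a continuous feature):

```python
# Sketch: reuse the {name} tag token probability as a continuous
# ranking feature. Feature naming and max-aggregation are assumptions.

def name_match_features(segments: list) -> dict:
    """Collect the strongest name-tag probability for the
    query-to-business-name matching + ranking system."""
    name_probs = [s["prob"] for s in segments if s["label"] == "name"]
    return {"name_tag_prob": max(name_probs, default=0.0)}

segments = [
    {"text": "starbucks", "label": "name", "prob": 0.93},
    {"text": "near me", "label": "location", "prob": 0.88},
]
features = name_match_features(segments)   # {"name_tag_prob": 0.93}
```

This is the pattern generalised as concepts/token-probability-as-ranking-signal: the soft score survives into downstream ranking instead of being thresholded away.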
Build history (per the post's three-phase process)¶
Yelp's canonical three-phase lifecycle — see patterns/three-phase-llm-productionization — applied to both running examples:
- Formulation (GPT-4): decide output schema, merge tasks where possible (segmentation + spell correction fused), decide RAG side-inputs.
- Proof of Concept (head-cache with expensive-LLM output): pre-compute for head queries, wire up cache, offline + online evals. Review-highlight A/B "increased Session / Search CTR across our platforms"; location-rewrite "achieved online metric wins".
- Scaling Up: fine-tune GPT-4o-mini on the GPT-4-generated, curated golden dataset ("up to a 100x savings in cost"); pre-compute at tens-of-millions scale; deploy BERT/T5 for the realtime tail.
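The curation step in Scaling Up ("isolate sets of inputs that are likely to have been mislabeled and target these for human re-labeling or removal") can be sketched as a simple split of teacher outputs. The confidence heuristic below is an assumption; the post does not say how Yelp identifies likely mislabels:

```python
# Sketch of golden-dataset curation before fine-tuning the student model.
# The confidence-threshold heuristic is an assumption, not Yelp's method.

def curate(teacher_examples, min_confidence=0.8):
    golden, needs_review = [], []
    for ex in teacher_examples:
        if ex["confidence"] >= min_confidence:
            golden.append(ex)          # keep for the fine-tuning set
        else:
            needs_review.append(ex)    # route to human re-labeling or removal
    return golden, needs_review

examples = [
    {"query": "epcot restaurants", "label": "location", "confidence": 0.95},
    {"query": "joes", "label": "name", "confidence": 0.55},
]
golden, review = curate(examples)
```

However the split is done, the student still inherits whatever teacher bias survives curation, which is the gotcha noted below.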
Tradeoffs / gotchas¶
- Cache freshness is tied to query-distribution drift. Head queries can change seasonally ("Christmas tree", "superbowl party"); the cache needs a refresh cadence appropriate to the drift rate. The post doesn't disclose Yelp's exact cadence.
- Tail-query quality gap. BERT/T5 realtime output is implicitly lower quality than the offline GPT-4o-mini output; the gap is asserted but not measured in the post.
- Fine-tuned student inherits teacher's biases. The curation step ("isolate sets of inputs that are likely to have been mislabeled and target these for human re-labeling or removal") mitigates but doesn't eliminate this.
- RAG side-input pipeline is a separate dependency. The "businesses viewed for that query" and "predicted categories" are Yelp's own in-house signals; a user without a comparable signal infrastructure cannot trivially replicate the pattern.
- Prompt-caching not mentioned. Yelp caches output, not prompt prefixes — a potential future cost-reduction axis.
Seen in¶
- sources/2025-02-04-yelp-search-query-understanding-with-llms — canonical reference; first and only public disclosure on the wiki.
Related¶
- systems/yelp-search — parent production context
- systems/gpt-4 / systems/gpt-4o-mini / systems/bert / systems/t5 — the model zoo
- concepts/query-understanding — upstream concept
- concepts/query-frequency-power-law-caching — the infrastructure primitive
- concepts/implicit-query-location-rewrite — downstream consumer
- concepts/review-highlight-phrase-expansion — the generative-task family
- concepts/token-probability-as-ranking-signal — continuous-feature reuse
- concepts/retrieval-augmented-generation — with two side-input instances
- concepts/llm-cascade — related cost-routing pattern
- patterns/three-phase-llm-productionization — Yelp's playbook
- patterns/head-cache-plus-tail-finetuned-model — the shape
- patterns/offline-teacher-online-student-distillation — the training half of the shape
- patterns/rag-side-input-for-structured-extraction — the RAG role
- companies/yelp