CONCEPT Cited by 1 source
Whole-article retrieval¶
Definition¶
Whole-article retrieval is the RAG design choice where the retrieval unit is the entire article, not a paragraph chunk or a sentence. The unit returned to the LLM as context is a complete article (title + summary + body), even though the embedding signal that found it may have been a narrow segment (a header, a title, a summary).
Canonical wiki disclosure: Yelp's 2026-05-27 LLM-Assisted CS Chatbot post.
"The foundation of our RAG system rests on approximately 370 Support Center articles. Since each article is relatively concise and addresses a specific question, we made a crucial architectural decision: we use whole articles for RAG, rather than splitting them into smaller chunks. Our focus became efficiently finding the right article for the query." (Source: sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot)
Why it matters¶
The retrieval-unit decision is orthogonal to the embedding-strategy decision but typically conflated in RAG literature. Most RAG prescriptions assume "chunk = retrieval unit" — embed paragraphs, retrieve paragraphs, pass paragraphs to LLM. Whole-article retrieval breaks the conflation:
| Strategy | Embedding unit | Retrieval unit | Used by |
|---|---|---|---|
| Paragraph-chunked | Paragraph | Paragraph | Default RAG prescription |
| Whole-article embedded | Whole article | Whole article | Naive RAG over short docs |
| Metadata-only-embedded, whole-article retrieval | Metadata segments (title/summary/headers) | Whole article | Yelp CS Chatbot 2026-05-27 |
Yelp's choice decouples the two axes. The embedding signal is metadata-only (avoids signal dilution); the retrieval unit is whole-article (avoids fragmentation in the LLM context).
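The decoupling can be made concrete with a small data model. The following is a minimal sketch, not Yelp's implementation: the `Article` and `Segment` types, their field names, and the `metadata_segments` and `llm_context` helpers are all hypothetical names chosen for illustration. The point it shows is that only metadata text is a candidate for embedding, while each segment carries an `article_id` pointing back to the unit that is actually returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Retrieval unit: the whole article is what the LLM eventually sees."""
    article_id: str
    title: str
    summary: str
    body: str
    headers: tuple[str, ...] = ()


@dataclass(frozen=True)
class Segment:
    """Embedding unit: a narrow piece of metadata that points back to its article."""
    article_id: str  # link from embedding unit to retrieval unit
    kind: str        # "title", "summary", or "header" (hypothetical labels)
    text: str        # the only text that would be embedded


def metadata_segments(article: Article) -> list[Segment]:
    """Build the segments to embed; the article body is deliberately excluded."""
    segments = [
        Segment(article.article_id, "title", article.title),
        Segment(article.article_id, "summary", article.summary),
    ]
    segments += [Segment(article.article_id, "header", h) for h in article.headers]
    return segments


def llm_context(article: Article) -> str:
    """Render the whole article (title + summary + body) as LLM context."""
    return f"{article.title}\n\n{article.summary}\n\n{article.body}"
```

An article with two headers yields four segments under this sketch, so one article can occupy several slots in a vector search result. That is why the deduplication step described under Mechanics is needed.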
When whole-article retrieval is preferable to chunk retrieval¶
- Short, self-contained documents. Yelp's Support Center articles are "relatively concise and address a specific question" — the whole article is the natural answer unit; splitting it would fragment a coherent answer.
- LLM context budget supports whole-document inclusion. Top-5 articles × ~500 words ≈ 2,500 words; at ~4 chars/word that is ~10K chars, or ~2,500 tokens at roughly 4 chars/token. Comfortably within modern LLM context windows.
- Article identity is meaningful. The user benefit of "here's article X" is greater than the benefit of "here's the most relevant paragraph of article X" — the LLM can cite, link, and reason over the whole article.
- Hyperlink consistency. When the LLM must cite hyperlinks from the source articles (see concepts/llm-hyperlink-hallucination), whole-article retrieval makes the link allowlist trivially the union of hyperlinks in the retrieved articles. Chunk retrieval fragments the allowlist.
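The hyperlink-consistency point can be sketched in a few lines. This is an illustrative sketch only: the `link_allowlist` and `disallowed_links` helpers and the simplified URL regex are assumptions, not something the Yelp post describes. It shows the allowlist being computed as the union of links across the retrieved whole articles, and then used to flag any link in a generated answer that did not come from those articles.

```python
import re

# Simplified URL pattern (an assumption for illustration; real link
# extraction would more likely parse the article's HTML or markdown).
URL_RE = re.compile(r"https?://[^\s<>()]+")


def _extract_links(text: str) -> list[str]:
    """Find URLs in text, trimming trailing sentence punctuation."""
    return [url.rstrip(".,;") for url in URL_RE.findall(text)]


def link_allowlist(retrieved_articles: list[str]) -> set[str]:
    """Union of all hyperlinks appearing in the retrieved whole articles."""
    return {url for text in retrieved_articles for url in _extract_links(text)}


def disallowed_links(answer: str, allowlist: set[str]) -> list[str]:
    """Links in the LLM answer that are absent from every retrieved article."""
    return [url for url in _extract_links(answer) if url not in allowlist]
```

With chunk retrieval, the same check would have to decide whether a link from a non-retrieved chunk of a retrieved article is allowed. With whole articles that question does not arise.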
Mechanics — over-fetch + threshold + dedupe¶
Whole-article retrieval over multi-segment metadata embeddings requires a deduplication step:
- Embed query.
- Over-fetch vectors (Yelp: k = max_items_per_article × 5) so that the top-K unique articles can be assembled even when several segments from the same article rank highly.
- Apply a similarity threshold to filter low-confidence matches.
- Dedupe by article ID — each article appears at most once.
- Cap at top-K articles.
The over-fetch factor is necessary specifically because metadata-rich articles (many headers) contribute many segments each; without over-fetch, the top-K vectors might all come from the same article.
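The steps after the vector search can be sketched as follows. This is a minimal sketch under stated assumptions, not Yelp's code: the function names, the `(article_id, similarity)` hit shape, the default `threshold=0.5`, and the convention that a higher score means a closer match are all hypothetical. The query embedding and the vector search itself are left out, and the hits are assumed to have been fetched with the over-fetched `k`.

```python
def overfetch_k(max_items_per_article: int, factor: int = 5) -> int:
    """Number of vectors to request, following k = max_items_per_article x 5."""
    return max_items_per_article * factor


def top_k_articles(
    hits: list[tuple[str, float]],
    top_k: int = 5,
    threshold: float = 0.5,  # hypothetical cutoff; tune per embedding model
) -> list[tuple[str, float]]:
    """Collapse over-fetched segment hits into at most top_k unique articles.

    hits: (article_id, similarity) pairs, one per matching metadata segment.
    Returns (article_id, best_similarity) pairs, best match first.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    best: dict[str, float] = {}
    for article_id, score in ranked:
        if score < threshold:
            break  # ranked is sorted, so every remaining hit is also below threshold
        if article_id not in best:
            best[article_id] = score  # first occurrence is that article's best segment
            if len(best) == top_k:
                break  # cap at top-K unique articles
    return list(best.items())
```

If three of the five strongest segment hits belong to the same article, this sketch keeps that article once, scored by its best segment, and fills the remaining slots from lower-ranked hits. Those lower-ranked hits are only available because of the over-fetch.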
Caveats¶
- Coverage gap when the answer is mid-paragraph. With metadata-only embeddings, whole-article retrieval surfaces an article whose answer sits in body text only if some metadata signal (title, summary, header) matches the query. Queries about body-only details are likely to miss.
- Context-budget headroom is finite. Whole-article retrieval at top-5 already fills ~2,500 tokens for short articles; longer articles or higher top-K would push against context limits.
- Single-source canonical on the wiki. Yelp's 2026-05-27 post is the wiki's first explicit canonicalisation of whole-article retrieval as a deliberate choice; chunk-based RAG remains the modal prescription in surveyed literature.
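The context-budget caveat above can be checked with back-of-envelope arithmetic. The sketch below is a rough estimator, not a tokenizer: the function name is hypothetical, and the defaults of 4 characters per word and 4 characters per token are the approximations implied by the page's own ~10K chars and ~2,500 tokens figures. Real token counts depend on the model's tokenizer.

```python
def estimate_context_tokens(
    top_k: int,
    words_per_article: int,
    chars_per_word: int = 4,   # approximation used in the estimate on this page
    chars_per_token: int = 4,  # rough rule of thumb; tokenizer-dependent
) -> int:
    """Approximate tokens consumed by top_k whole articles in the LLM context."""
    total_chars = top_k * words_per_article * chars_per_word
    return total_chars // chars_per_token


# Short articles at top-5: 5 * 500 * 4 = 10,000 chars, about 2,500 tokens.
short_docs = estimate_context_tokens(top_k=5, words_per_article=500)

# Longer articles and a higher top-K grow the budget multiplicatively:
# 10 * 1,500 * 4 = 60,000 chars, about 15,000 tokens.
long_docs = estimate_context_tokens(top_k=10, words_per_article=1500)
```

Because the estimate scales with the product of top-K and article length, doubling both quadruples the context cost.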
Seen in¶
- sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot — canonical: ~370 Support Center articles, top-5 unique articles per query, ~94% recall@5.
Related¶
- concepts/metadata-only-embedding — the embedding-side pair.
- concepts/embedding-signal-dilution — the failure mode the paired choice avoids.
- concepts/retrieval-augmented-generation — the parent RAG shape.
- concepts/vector-similarity-search — the underlying mechanism.
- patterns/whole-article-retrieval-via-metadata-segments — the canonical wiki pattern.