CONCEPT Cited by 1 source

Whole-article retrieval¶

Definition¶

Whole-article retrieval is the RAG design choice where the retrieval unit is the entire article, not a paragraph chunk or a sentence. The unit returned to the LLM as context is a complete article (title + summary + body), even though the embedding signal that found it may have been a narrow segment (a header, a title, a summary).

Canonical wiki disclosure: Yelp's 2026-05-27 LLM-Assisted CS Chatbot post.

"The foundation of our RAG system rests on approximately 370 Support Center articles. Since each article is relatively concise and addresses a specific question, we made a crucial architectural decision: we use whole articles for RAG, rather than splitting them into smaller chunks. Our focus became efficiently finding the right article for the query." (Source: sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot)

Why it matters¶

The retrieval-unit decision is orthogonal to the embedding-strategy decision, but the two are typically conflated in RAG literature. Most RAG prescriptions assume "chunk = retrieval unit": embed paragraphs, retrieve paragraphs, pass paragraphs to the LLM. Whole-article retrieval breaks the conflation:

Strategy	Embedding unit	Retrieval unit	Used by
Paragraph-chunked	Paragraph	Paragraph	Default RAG prescription
Whole-article embedded	Whole article	Whole article	Naive RAG over short docs
Metadata-only-embedded, whole-article retrieval	Metadata segments (title/summary/headers)	Whole article	Yelp CS Chatbot 2026-05-27

Yelp's choice decouples the two axes. The embedding signal is metadata-only (avoids signal dilution); the retrieval unit is whole-article (avoids fragmentation in the LLM context).
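The decoupling can be illustrated with a minimal data-model sketch. It is not Yelp's implementation: the class names, field names, and helper functions below are hypothetical, chosen only to show that the text being embedded (metadata segments) and the text handed to the LLM (the full article) are separate things joined by an article ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Retrieval unit: the complete article (hypothetical schema)."""
    article_id: str
    title: str
    summary: str
    body: str
    headers: tuple[str, ...] = ()


@dataclass(frozen=True)
class Segment:
    """Embedding unit: one narrow metadata segment of an article."""
    article_id: str  # pointer back to the retrieval unit
    kind: str        # "title", "summary", or "header"
    text: str        # the only text that gets embedded


def metadata_segments(article: Article) -> list[Segment]:
    """Build the embedding units; the body text is never embedded."""
    segments = [
        Segment(article.article_id, "title", article.title),
        Segment(article.article_id, "summary", article.summary),
    ]
    segments += [Segment(article.article_id, "header", h) for h in article.headers]
    return segments


def render_context(article: Article) -> str:
    """Build the LLM context for one hit: title + summary + body."""
    return f"{article.title}\n\n{article.summary}\n\n{article.body}"
```

A vector hit on any `Segment` resolves through `article_id` to the `Article`, and `render_context` is what reaches the prompt.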

When whole-article retrieval is preferable to chunk retrieval¶

Short, self-contained documents. Yelp's Support Center articles are "relatively concise and address a specific question" — the whole article is the natural answer unit; splitting it would fragment a coherent answer.
LLM context budget supports whole-document inclusion. Top-5 articles × ~500 words × ~4 chars/word ≈ 10K chars, or ~2,500 tokens at roughly 4 chars per token. That sits comfortably within modern LLM context windows.
Article identity is meaningful. The user benefit of "here's article X" is greater than the benefit of "here's the most relevant paragraph of article X" — the LLM can cite, link, and reason over the whole article.
Hyperlink consistency. When the LLM must cite hyperlinks from the source articles (see concepts/llm-hyperlink-hallucination), whole-article retrieval makes the link allowlist trivially the union of hyperlinks in the retrieved articles. Chunk retrieval fragments the allowlist.
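The hyperlink-consistency point can be sketched as a small check. This is an illustration under assumptions, not the source's code: the URL regex is deliberately simple, the function names are hypothetical, and the `example.com` URLs are placeholders.

```python
import re

# Simplistic URL matcher (an assumption; real articles may need an HTML/Markdown parser).
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls(text: str) -> list[str]:
    """Find URLs and strip trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def link_allowlist(retrieved_articles: list[str]) -> set[str]:
    """Allowlist = union of hyperlinks across the retrieved whole articles."""
    allowed: set[str] = set()
    for article_text in retrieved_articles:
        allowed.update(extract_urls(article_text))
    return allowed


def disallowed_links(answer: str, allowed: set[str]) -> list[str]:
    """Links in the LLM answer that appear in none of the retrieved articles."""
    return [url for url in extract_urls(answer) if url not in allowed]
```

With chunk retrieval, the same check would have to track which links fell inside the retrieved chunks; with whole articles, every link in the source article is in scope by construction.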

Mechanics — over-fetch + threshold + dedupe¶

Whole-article retrieval over multi-segment metadata embeddings requires a deduplication step:

Embed query.
Over-fetch vectors (Yelp: k = max_items_per_article × 5) so that the top-K unique articles can be assembled even when several segments from the same article rank highly.
Apply a similarity threshold to filter low-confidence matches.
Dedupe by article ID — each article appears at most once.
Cap at top-K articles.

The over-fetch factor is necessary because metadata-rich articles (those with many headers) contribute many segments each. Without over-fetch, the top-K vectors might all come from the same article, leaving fewer than K unique articles after deduplication.
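The five steps above can be sketched in plain Python. This is a toy, in-memory version under stated assumptions: a brute-force cosine scan stands in for a real vector index, the `min_similarity` default is an arbitrary placeholder, and tying the over-fetch multiplier to `top_k` is one reading of the "× 5" figure, not something the source confirms.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_articles(query_vec, segment_index, *, top_k=5,
                      max_items_per_article=4, min_similarity=0.5):
    """Return up to top_k unique (article_id, best_score) pairs.

    segment_index is a list of (article_id, segment_vector) pairs, one per
    embedded metadata segment. query_vec is the already-embedded query (step 1).
    """
    # Step 2: over-fetch. With top_k = 5 this equals max_items_per_article x 5.
    fetch_k = max_items_per_article * top_k
    scored = sorted(
        ((cosine(query_vec, vec), article_id) for article_id, vec in segment_index),
        key=lambda pair: pair[0],
        reverse=True,
    )[:fetch_k]

    results, seen = [], set()
    for score, article_id in scored:
        # Step 3: similarity threshold. Scores are descending, so stop early.
        if score < min_similarity:
            break
        # Step 4: dedupe by article ID, keeping the best-scoring segment.
        if article_id in seen:
            continue
        seen.add(article_id)
        results.append((article_id, score))
        # Step 5: cap at top-K unique articles.
        if len(results) == top_k:
            break
    return results
```

If three segments of one article outrank everything else, fetching only two vectors returns a single unique article, while an over-fetched candidate list still reaches the second article.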

Caveats¶

Coverage gap when the answer is mid-paragraph. Whole-article retrieval finds an article whose answer sits in the body text only if some metadata signal (title, summary, header) matches the query. Queries about body-only details will miss.
Context-budget headroom is finite. Whole-article retrieval at top-5 already fills ~2,500 tokens for short articles; longer articles or higher top-K would push against context limits.
Single-source canonical on the wiki. Yelp's 2026-05-27 post is the wiki's first explicit canonicalisation of whole-article retrieval as a deliberate choice; chunk-based RAG remains the modal prescription in surveyed literature.
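The context-budget caveat can be made concrete with a back-of-the-envelope estimator. It reuses this page's own figures (top-5, ~500 words, ~4 chars/word) and adds one assumption: roughly 4 characters per token, a common rule of thumb for English text. Real counts depend on the tokenizer, and the function names and the 8,000-token budget in the usage lines are hypothetical.

```python
def estimate_context_tokens(top_k: int, words_per_article: int,
                            chars_per_word: float = 4.0,
                            chars_per_token: float = 4.0) -> int:
    """Rough token estimate for top_k whole articles placed in the prompt."""
    total_chars = top_k * words_per_article * chars_per_word
    return round(total_chars / chars_per_token)


def fits_budget(top_k: int, words_per_article: int, budget_tokens: int) -> bool:
    """True if the estimated article context stays within budget_tokens."""
    return estimate_context_tokens(top_k, words_per_article) <= budget_tokens


print(estimate_context_tokens(5, 500))     # 2500
print(estimate_context_tokens(20, 2000))   # 40000
print(fits_budget(20, 2000, budget_tokens=8000))  # False
```

The estimate grows linearly in both top-K and article length, which is why the approach suits short articles and small K.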

Seen in¶

sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot — canonical: ~370 Support Center articles, top-5 unique articles per query, ~94% recall@5.

Related¶

concepts/metadata-only-embedding — the embedding-side pair.
concepts/embedding-signal-dilution — the failure mode the paired choice avoids.
concepts/retrieval-augmented-generation — the parent RAG shape.
concepts/vector-similarity-search — the underlying mechanism.
patterns/whole-article-retrieval-via-metadata-segments — the canonical wiki pattern.