
CONCEPT Cited by 1 source

Whole-article retrieval

Definition

Whole-article retrieval is the RAG design choice where the retrieval unit is the entire article, not a paragraph chunk or a sentence. The unit returned to the LLM as context is a complete article (title + summary + body), even though the embedding signal that found it may have been a narrow segment (a header, a title, a summary).

Canonical wiki disclosure: Yelp's 2026-05-27 LLM-Assisted CS Chatbot post.

"The foundation of our RAG system rests on approximately 370 Support Center articles. Since each article is relatively concise and addresses a specific question, we made a crucial architectural decision: we use whole articles for RAG, rather than splitting them into smaller chunks. Our focus became efficiently finding the right article for the query." (Source: sources/2026-05-27-yelp-beyond-the-menu-tree-how-yelp-built-a-smarter-customer-success-chatbot)

Why it matters

The retrieval-unit decision is orthogonal to the embedding-strategy decision, but RAG literature typically conflates the two. Most RAG prescriptions assume "chunk = retrieval unit": embed paragraphs, retrieve paragraphs, pass paragraphs to the LLM. Whole-article retrieval breaks the conflation:

| Strategy | Embedding unit | Retrieval unit | Used by |
|---|---|---|---|
| Paragraph-chunked | Paragraph | Paragraph | Default RAG prescription |
| Whole-article embedded | Whole article | Whole article | Naive RAG over short docs |
| Metadata-only-embedded, whole-article retrieval | Metadata segments (title/summary/headers) | Whole article | Yelp CS Chatbot 2026-05-27 |

Yelp's choice decouples the two axes. The embedding signal is metadata-only (avoids signal dilution); the retrieval unit is whole-article (avoids fragmentation in the LLM context).
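The decoupling can be made concrete with a small data model. This is a minimal sketch, not Yelp's schema: the `Article` and `Segment` types, their field names, and the two helper functions are hypothetical, introduced only to show that what gets embedded and what gets returned are different objects linked by an article ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Retrieval unit: the complete article handed to the LLM (hypothetical schema)."""
    article_id: str
    title: str
    summary: str
    headers: tuple[str, ...]
    body: str


@dataclass(frozen=True)
class Segment:
    """Embedding unit: one narrow metadata string, pointing back to its article."""
    article_id: str
    kind: str  # "title", "summary", or "header"
    text: str  # the only text that would be embedded


def metadata_segments(article: Article) -> list[Segment]:
    """Build the embedding units for one article; the body is deliberately left out."""
    segments = [
        Segment(article.article_id, "title", article.title),
        Segment(article.article_id, "summary", article.summary),
    ]
    segments += [Segment(article.article_id, "header", h) for h in article.headers]
    return segments


def context_for(article: Article) -> str:
    """Render the retrieval unit: title + summary + body, unsplit."""
    return f"{article.title}\n\n{article.summary}\n\n{article.body}"
```

An article with two headers yields four segments to embed, but a match on any of them resolves to the same single `context_for(article)` string.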

When whole-article retrieval is preferable to chunk retrieval

  • Short, self-contained documents. Yelp's Support Center articles are "relatively concise and address a specific question" — the whole article is the natural answer unit; splitting it would fragment a coherent answer.
  • LLM context budget supports whole-document inclusion. Top-5 articles × ~500 words × ~4 chars/word ≈ 10K chars, or roughly 2,500 tokens at ~4 chars/token. That is comfortably within modern LLM context windows.
  • Article identity is meaningful. The user benefit of "here's article X" is greater than the benefit of "here's the most relevant paragraph of article X" — the LLM can cite, link, and reason over the whole article.
  • Hyperlink consistency. When the LLM must cite hyperlinks from the source articles (see concepts/llm-hyperlink-hallucination), whole-article retrieval makes the link allowlist trivially the union of hyperlinks in the retrieved articles. Chunk retrieval fragments the allowlist.
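The hyperlink-consistency point above can be sketched as a set union followed by a set difference. This is an illustrative sketch only: the source does not describe how Yelp builds or enforces its allowlist, and the URL regex, the function names, and the idea of checking the answer after generation are all assumptions.

```python
import re

# Simplified URL matcher (an assumption; real link extraction would parse the article markup).
URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def extract_links(text: str) -> set[str]:
    """Return the URLs found in text, with trailing sentence punctuation stripped."""
    return {match.rstrip(".,;:") for match in URL_PATTERN.findall(text)}


def link_allowlist(retrieved_articles: list[str]) -> set[str]:
    """Allowlist = union of the links that appear in the retrieved whole articles."""
    allowed: set[str] = set()
    for article_text in retrieved_articles:
        allowed |= extract_links(article_text)
    return allowed


def disallowed_links(answer: str, allowlist: set[str]) -> set[str]:
    """Links the LLM emitted that do not appear in any retrieved article."""
    return extract_links(answer) - allowlist
```

Because each retrieved unit is a complete article, every link the article contains is in the allowlist. With chunk retrieval, a link sitting in a chunk that was not retrieved would be flagged even though it belongs to the right article.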

Mechanics — over-fetch + threshold + dedupe

Whole-article retrieval over multi-segment metadata embeddings requires a deduplication step:

  1. Embed query.
  2. Over-fetch vectors (Yelp: k = max_items_per_article × 5) so that the top-K unique articles can be assembled even when several segments from the same article rank highly.
  3. Apply a similarity threshold to filter low-confidence matches.
  4. Dedupe by article ID — each article appears at most once.
  5. Cap at top-K articles.
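Steps 2 to 5 above can be sketched as a post-processing pass over the hits a vector index returns. This is a minimal sketch under stated assumptions: the `Hit` type, the function names, and the example threshold are hypothetical; step 1 (embedding the query) and the vector search itself are assumed to happen elsewhere. The `overfetch_k` helper simply encodes the `max_items_per_article × 5` formula quoted above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    article_id: str  # article the matched metadata segment belongs to
    score: float     # similarity score, higher is better


def overfetch_k(max_items_per_article: int, factor: int = 5) -> int:
    """Number of vectors to request from the index (step 2)."""
    return max_items_per_article * factor


def select_articles(hits: list[Hit], top_k: int, threshold: float) -> list[str]:
    """Threshold, dedupe by article ID, and cap at top_k (steps 3 to 5)."""
    seen: set[str] = set()
    selected: list[str] = []
    # Sort defensively so the best-scoring segment decides each article's rank.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < threshold:
            continue  # step 3: drop low-confidence matches
        if hit.article_id in seen:
            continue  # step 4: each article appears at most once
        seen.add(hit.article_id)
        selected.append(hit.article_id)
        if len(selected) == top_k:
            break     # step 5: cap at top-K articles
    return selected
```

If article `a` contributes the three best-scoring segments, fetching only three vectors would return a single unique article; fetching more lets the lower-ranked articles `b` and `d` surface after deduplication.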

The over-fetch factor is necessary specifically because metadata-rich articles (many headers) contribute many segments each; without over-fetch, the top-K vectors might all come from the same article.

Caveats

  • Coverage gap when the answer is mid-paragraph. Whole-article retrieval finds an article whose answer lies in the body text only if some metadata signal (title, summary, header) matches the query. Queries about body-only details will miss.
  • Context-budget headroom is finite. Whole-article retrieval at top-5 already fills ~2,500 tokens for short articles; longer articles or higher top-K would push against context limits.
  • Single-source canonical on the wiki. Yelp's 2026-05-27 post is the wiki's first explicit canonicalisation of whole-article retrieval as a deliberate choice; chunk-based RAG remains the modal prescription in surveyed literature.
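The context-budget caveat can be checked with back-of-the-envelope arithmetic. The sketch below reproduces the page's own estimate; the function name is hypothetical, and the ~4 chars/token ratio is an assumption implied by "10K chars = ~2,500 tokens". Real tokenizers vary, so treat the result as an order-of-magnitude figure.

```python
def estimate_context_tokens(
    top_k: int,
    words_per_article: int,
    chars_per_word: float = 4.0,   # figure used in this page's estimate
    chars_per_token: float = 4.0,  # assumption; depends on the tokenizer
) -> int:
    """Rough token count for top_k whole articles placed in the prompt."""
    total_chars = top_k * words_per_article * chars_per_word
    return round(total_chars / chars_per_token)


# The page's scenario: top-5 articles of ~500 words each.
baseline = estimate_context_tokens(top_k=5, words_per_article=500)

# Either longer articles or a higher top-K multiplies the cost linearly.
longer_articles = estimate_context_tokens(top_k=5, words_per_article=2000)
higher_top_k = estimate_context_tokens(top_k=20, words_per_article=500)
```

Under these assumptions the baseline comes to 2,500 tokens, while quadrupling either article length or top-K brings it to 10,000, which is why the approach suits short articles and a small K.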

Seen in
