
CONCEPT Cited by 2 sources

Web-search telephone game

Definition

The web-search telephone game is a failure mode in LLM pipelines that use web-search RAG, arising when a smaller summariser model sits between the raw web-search results and the parent model consuming them. The smaller model introduces errors at every hop — hallucinated facts, misquotes, silent omissions — and the parent model then treats the summariser's output as ground truth, compounding the error with its own generation.

Canonical Vercel framing

"Many agents rely on web search tools for ingesting new information. Web search is great (v0 uses it too), but it has its faults. You may get back old search results, like outdated blog posts and documentation. Further, many agents have a smaller model summarize the results of web search, which in turn becomes a bad game of telephone between the small model and parent model. The small model may hallucinate, misquote something, or omit important information."

(Source: sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent)

Why the analogy is exact

Telephone (the children's game) describes a chain of serial lossy retransmissions; the received message drifts from the sent message in proportion to the number of hops. Web-search RAG with a summariser has three such hops:

  1. Authoritative source → web-search result snippet (already lossy: snippets truncate, ranking may mis-prioritise).
  2. Search-result snippet → summariser-model output (lossy: summariser can hallucinate, misquote, omit).
  3. Summariser output → parent-model output (lossy: parent may misinterpret or compose incorrectly).

Each hop is independently lossy; compounded loss dominates.
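The compounding can be made concrete with a toy fidelity model. The per-hop numbers below are illustrative, not measured:

```python
# Toy model of serial lossy retransmission: each hop independently
# preserves some fraction of the facts, so end-to-end fidelity is
# the product over hops. Per-hop numbers are illustrative only.

hops = {
    "source -> search snippet": 0.95,      # truncation, mis-ranking
    "snippet -> summariser output": 0.90,  # hallucination, misquote, omission
    "summariser -> parent output": 0.95,   # misinterpretation, bad composition
}

fidelity = 1.0
for hop, kept in hops.items():
    fidelity *= kept
    print(f"after {hop}: {fidelity:.3f}")

# Even though no single hop loses more than 10%, the chain as a
# whole delivers ~0.81 fidelity: roughly one fact in five arrives
# corrupted or missing.
```

The point is structural: removing the summariser hop removes its factor from the product entirely, rather than merely shrinking it.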

The two independent failure modes Vercel names

  1. Stale search results. "You may get back old search results, like outdated blog posts and documentation." Even if the summariser is perfect, the source can be stale — web-search indexes lag publication, and SEO dynamics favour older content with more backlinks.

  2. Summariser as lossy translator. "The small model may hallucinate, misquote something, or omit important information." The summariser is usually a smaller, cheaper model than the parent (a cost-driven choice), which raises the hallucination rate exactly where the pipeline most needs accuracy.

The Vercel mitigation: bypass the game

Vercel's preferred alternative is direct structured injection into the parent model's system prompt, skipping both the search and the summariser:

  • Detect intent with embeddings + keyword matching.
  • Inject version-pinned library knowledge directly into the system prompt.
  • For code examples: point the model at a hand-curated read-only filesystem of LLM-consumption-optimised samples (patterns/read-only-curated-example-filesystem).

Vercel frames the injection discipline as "we keep this injection consistent to maximize prompt-cache hits and keep token usage low" — a secondary benefit beyond correctness.
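A minimal sketch of the bypass, assuming a hypothetical keyword-only intent detector and placeholder knowledge blocks (v0's real pipeline also uses embeddings and the curated example filesystem):

```python
# Sketch of direct structured injection (all names, keywords, and
# knowledge text are hypothetical). The injected block is built
# deterministically so that repeated requests produce byte-identical
# prompt prefixes, which is what makes prompt caching effective.

LIBRARY_KNOWLEDGE = {
    "nextjs": "[version-pinned Next.js API notes would be injected here]",
    "react": "[version-pinned React API notes would be injected here]",
}

KEYWORDS = {
    "nextjs": ("next.js", "nextjs", "app router"),
    "react": ("react", "usestate", "jsx"),
}

def detect_intent(user_message: str) -> list[str]:
    """Keyword pass only; the real pipeline combines this with embeddings."""
    text = user_message.lower()
    return [lib for lib, kws in KEYWORDS.items()
            if any(kw in text for kw in kws)]

def build_system_prompt(user_message: str) -> str:
    libs = detect_intent(user_message)
    # Sorted, fixed-order blocks keep the prompt prefix stable
    # across requests -> prompt-cache hits, low token overhead.
    blocks = [LIBRARY_KNOWLEDGE[lib] for lib in sorted(libs)]
    return "\n\n".join(["You are a coding agent.", *blocks])

print(build_system_prompt("How do I cache data in the Next.js app router?"))
```

Note that no web search and no summariser appear anywhere in this path: the knowledge reaches the parent model verbatim.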

When web-search RAG is still warranted

Vercel explicitly notes "v0 uses [web search] too" — the telephone-game critique is not an argument against web-search RAG in all cases, only against using it as the primary mechanism for library-API accuracy where a hand-curated injection is feasible.

Web-search RAG remains the right answer when:

  • The target knowledge is too dynamic to curate ahead of time (current events, market data).
  • The target knowledge has no single canonical source (user-generated content, forums).
  • The target knowledge is unbounded in shape (open-domain questions).
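Taken together, the criteria above amount to a routing decision. A hypothetical sketch — topic sets and marker strings are invented for illustration:

```python
# Hypothetical router: use the curated-injection path for known
# library topics; fall back to web-search RAG for dynamic or
# open-domain questions. Topic and marker lists are invented.

CURATED_TOPICS = ("react", "next.js", "typescript")
DYNAMIC_MARKERS = ("today", "latest", "current price", "this week")

def route(question: str) -> str:
    q = question.lower()
    if any(m in q for m in DYNAMIC_MARKERS):
        return "web-search"         # too dynamic to curate ahead of time
    if any(t in q for t in CURATED_TOPICS):
        return "curated-injection"  # library-API accuracy path
    return "web-search"             # no canonical source / open-domain

print(route("current price of ETH"))              # web-search
print(route("How do I type a React component?"))  # curated-injection
```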

Seen in

  • sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent — canonical disclosure; "bad game of telephone" phrasing; v0's preference for structured prompt injection over web-search RAG for library-API accuracy.
  • sources/2026-04-21-vercel-build-knowledge-agents-without-embeddings — retrieval-altitude sibling. Vercel's 2026-04-21 Knowledge Agent Template names the same failure class inside the vector-DB retrieval pipeline: chunking boundary + embedding model + similarity threshold are three composed transformations between question and answer, each of which can silently corrupt the retrieval. The architectural response is identical in shape (remove the summarisation transformation) but instantiated differently: filesystem search + bash tools instead of direct prompt injection. Canonicalised as concepts/embedding-black-box-debugging — the retrieval-pipeline dual of the web-search telephone game.