CONCEPT

Content-grounded answer

Definition

A content-grounded answer is an LLM-product discipline in which every answer the system returns must be supported by evidence retrieved from a vetted content source — the model's own training-data knowledge is explicitly disallowed as the basis for an answer, even when that knowledge is plausibly correct. The discipline converts an LLM from an open-domain conversation partner into a domain-scoped, evidence-citing answering machine.

Canonical wiki instance: Yelp's Biz Ask Anything (2026-03-27), where answers must be grounded in the specific business's Yelp content (reviews, photos, menus, Ask the Community, structured facts).

Why not "just answer everything"

From the post:

"Even when a question sounds harmless, answering it with a general-purpose LLM invites two problems: Scope drift: The model will cheerfully answer using its own prior knowledge instead of the business's Yelp content. This nudges users to ask more out-of-scope questions, and we lose control over answer quality because it's no longer grounded in our vetted sources. Hallucinations: When the model lacks (or only partially has) relevant knowledge, it still tends to produce a confident answer — filling gaps with fabricated details rather than deferring or asking for clarification." (Source: sources/2026-03-27-yelp-building-biz-ask-anything-from-prototype-to-product)

Content-grounding is the product countermeasure to both failure modes.

Enforcement layers

Grounding is not enforced by a single mechanism — it's enforced at multiple layers so no single failure point can break the discipline:

  1. Inquiry Type classifier (pre-retrieval) — rejects out-of-scope questions so generation never starts on unanswerable intents.
  2. Content-fetching engine scope — restricts the model's context to the specific business's retrieved content.
  3. Prompt-level constraints — system prompt instructs the model to answer only from provided evidence.
  4. Evidence-Relevance LLM-as-judge grader (post-hoc) — scores whether every answer point is supported by cited evidence (RELEVANT / PARTIALLY_RELEVANT / NOT_RELEVANT). See concepts/llm-as-judge.
  5. Citation discipline — answers include evidence citations inline so the user (and auditor) can verify.
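The five layers above can be sketched as one pipeline. This is a toy illustration, not Yelp's implementation: the function names, the keyword classifier, the in-memory content store, and the data are all invented stand-ins for the real classifier, retriever, LLM call, and grader.

```python
# Toy stand-ins for the real components; names and data are invented.
IN_SCOPE = {"menu", "hours", "amenities"}

def classify_inquiry(question: str) -> str:
    # Layer 1 stand-in: intent classification (here, a naive keyword match)
    for intent in IN_SCOPE:
        if intent in question.lower():
            return intent
    return "out_of_scope"

def fetch_business_content(business_id: str, question: str) -> list:
    # Layer 2 stand-in: retrieval scoped to one business's vetted content
    store = {"el-vez": ["Review: the vegan enchiladas were great.",
                        "Menu: vegan enchiladas, cauliflower tacos."]}
    return store.get(business_id, [])

def grade_evidence(answer: str, evidence: list) -> str:
    # Layer 4 stand-in: LLM-as-judge evidence-relevance label
    return "RELEVANT" if evidence else "NOT_RELEVANT"

def answer_question(question: str, business_id: str):
    intent = classify_inquiry(question)          # Layer 1: reject out-of-scope
    if intent == "out_of_scope":
        return None                              # generation never starts
    evidence = fetch_business_content(business_id, question)  # Layer 2
    if not evidence:
        return None                              # no content, no answer
    # Layer 3 would wrap `evidence` in a system prompt instructing the model
    # to answer only from it; the actual LLM call is elided here.
    answer = f"From this business's content: {evidence[0]}"
    label = grade_evidence(answer, evidence)     # Layer 4: post-hoc grading
    # Layer 5: citations travel with the answer so the user can verify
    return {"answer": answer, "citations": evidence, "relevance": label}
```

The point of the sketch is the ordering: an out-of-scope question exits at layer 1 before any generation, and a grounded answer always carries its evidence out the other end.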

Quality framing

Yelp's five-pillar answer quality framework (from the post):

| Pillar | Grounded-answer aspect |
| --- | --- |
| Correctness/Accuracy | Answer factually matches the retrieved content |
| Completeness/Helpfulness | Answer actually addresses the question using the evidence |
| Evidence Relevance/Trustworthiness | Answer is supported by evidence that the user can see |
| Conciseness | Grounded answers don't pad with ungrounded context |
| Structure | Skimmable presentation |

Tone is a sixth dimension treated separately.
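One way to operationalise the pillars is a per-dimension judge prompt. A hedged sketch: only the pillar names come from the post; the rubric wording and function are invented, and the post applies the RELEVANT / PARTIALLY_RELEVANT / NOT_RELEVANT scale to the evidence-relevance grader specifically, so reusing it across pillars here is an assumption.

```python
# Hypothetical per-pillar rubric; only the pillar names are from the post.
PILLAR_RUBRIC = {
    "Correctness/Accuracy": "Does every claim match the retrieved content?",
    "Completeness/Helpfulness": "Does the answer address the question using the evidence?",
    "Evidence Relevance/Trustworthiness": "Is each answer point supported by a visible citation?",
    "Conciseness": "Is the answer free of ungrounded padding?",
    "Structure": "Is the answer skimmable?",
}

def build_judge_prompt(pillar: str, question: str, answer: str, evidence: str) -> str:
    # One grading prompt per dimension per sampled answer
    return (
        f"You are grading the '{pillar}' dimension.\n"
        f"Rubric: {PILLAR_RUBRIC[pillar]}\n"
        f"Question: {question}\nAnswer: {answer}\nEvidence: {evidence}\n"
        "Reply with RELEVANT, PARTIALLY_RELEVANT, or NOT_RELEVANT."
    )
```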

Anti-patterns a content-grounded answer rejects

  • "Confident answer that fills gaps with training-data guesses." — Yelp's vegan-options-at-El-Vez BEFORE example padded with "the menu may not clearly label vegan options in person" and "the restaurant's online menu might list more vegan options than what is available in person" — both ungrounded hedges. AFTER: specific dish names cited from reviews, no hedging.
  • "Reasonable-sounding recommendation using model's prior." — Yosemite hikes BEFORE recommended Mirror Lake Trail (5+ miles, not toddler-suitable) on model prior. AFTER: verified trails against evidence + added distances / durations.
  • "Contradictory or stale facts." — Gary Danko BEFORE quoted menu prices $81 / $99 / $117 from stale evidence; AFTER cites recent-review evidence for $150 / $170 (and calls out the contradiction explicitly).
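Simple cases of the first anti-pattern can even be caught mechanically. A toy sketch, assuming a fixed hedge-phrase list (invented here); a production system would lean on the LLM-as-judge grader rather than string matching:

```python
# Toy detector for the ungrounded-hedge anti-pattern; the phrase list
# is illustrative, not exhaustive.
HEDGES = ("may not", "might", "possibly", "likely")

def ungrounded_hedges(answer: str, evidence: list) -> list:
    # A hedge is "ungrounded" when no evidence snippet contains it
    joined = " ".join(evidence).lower()
    return [h for h in HEDGES if h in answer.lower() and h not in joined]
```

Run against the El Vez BEFORE sentence, this flags "might"; run against the AFTER-style answer built from specific review-cited dishes, it flags nothing.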

Completeness without scope drift

Content-grounding doesn't mean refusing to answer when the evidence is thin — it means either answering from available evidence or explicitly acknowledging the gap (a NEEDS_FOLLOW_UP grader label), never hallucinating. Yelp's A Beautiful Day Spa BEFORE answered "Yes, A Beautiful Day Spa offers massages for muscle pains"; AFTER includes specific options (Swedish, hot stone), durations (60-min), pricing (~$55), and extensions (90-min, 2-hour) — all pulled from content.
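A minimal sketch of that answer-or-acknowledge-the-gap decision; the NEEDS_FOLLOW_UP label comes from the post, while the function name and response shape are illustrative:

```python
# The NEEDS_FOLLOW_UP label is from the post; everything else is invented.
def grounded_response(question: str, evidence: list) -> dict:
    if not evidence:
        # Thin evidence: acknowledge the gap, never hallucinate
        return {"label": "NEEDS_FOLLOW_UP",
                "text": "I couldn't find that in this business's content yet."}
    # Otherwise answer strictly from what was retrieved, with citations
    return {"label": "ANSWERED",
            "text": " ".join(evidence),
            "citations": evidence}
```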

Tradeoffs / gotchas

  • Evidence-relevance grading is expensive. An LLM-as-judge grader per dimension per sampled answer is not free; Yelp's daily batch cadence is the cost-vs-signal trade-off.
  • No-content-no-answer surface — businesses with limited content can't answer much. Yelp's suggested-question UX explicitly notes "For businesses with limited content (e.g. new businesses) the questions tend to be more generic but will adjust as content increased on the business page over time."
  • User-education challenge — users coming from open-domain LLMs expect the model to opine; grounded-answer systems deliberately decline to, which needs a UX affordance (the suggested-question UI is one answer).
  • Adversarial jailbreak risk — users can try to coax the model out of grounding mode; the T&S classifier + prompt-level constraints + evidence-relevance grader form defence-in-depth.
  • Evidence-citation latency — citing evidence adds tokens + serialisation work; the streaming-response architecture hides this cost from the user.
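The daily-batch cadence in the first bullet amounts to grading a sample rather than every answer. A toy sketch, with an invented sample size and seed:

```python
import random

# Cost-vs-signal trade-off: grade a fixed-size random sample once a day
# instead of grading every answer. Sample size and seed are illustrative.
def daily_grading_batch(answers: list, sample_size: int = 100, seed: int = 0) -> list:
    rng = random.Random(seed)
    return rng.sample(answers, min(sample_size, len(answers)))
```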
