FACTS Grounding benchmark
FACTS Grounding is a Google DeepMind benchmark family for "systematically evaluating the factuality of large language models", introduced on the DeepMind blog and extended into a benchmark suite. The 2026-05-28 Google Research I/O 2026 roundup post cites it as the substrate of Google's multi-year factuality research programme (Source: sources/2026-05-28-google-a-new-era-of-innovation-google-research-at-io-2026).
This is a minimum-viable wiki page anchored to the I/O 2026 post's pointers to the FACTS family. Benchmark specification (prompt design, scoring rubric, dataset composition, leaderboard results) lives in the linked DeepMind blog posts and underlying papers, not in this source's raw capture.
Multi-year arc
The Google I/O 2026 post frames factuality as a sustained research arc anchored on FACTS:
- 2021 — Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering (Google Research Pubs).
- 2022 — early factuality benchmark (arXiv:2204.04991).
- 2024–2026 — FACTS Grounding + FACTS Benchmark Suite + modality extensions: text-to-image (arXiv:2504.17502), video generation (arXiv:2503.06800), long-context (arXiv:2406.13632), and expressions of uncertainty (arXiv:2505.24858).
The FACTS family is Google's preferred measurement substrate for factuality decisions on Gemini-family models. It is a sibling research thread to the work on decoding latency (speculative decoding, speculative cascades) and the work on factuality decoding (SLED).
Seen in
- sources/2026-05-28-google-a-new-era-of-innovation-google-research-at-io-2026 — cited as the published-benchmark substrate for Google's multi-year factuality research arc.
Related
- concepts/factuality-decoding — sibling research thread that operates on the model rather than the benchmark.
- concepts/llm-hallucination — the failure mode FACTS measures.
- systems/sled — Google Research factuality-decoding method validated against benchmark families like FACTS.
- systems/gemini — the production model family whose factuality this benchmark supports measuring.
- companies/google — operator.