CONCEPT Cited by 1 source
Content Signals¶
Definition¶
Content Signals is a proposed extension to
robots.txt that lets a site declare
what AI systems may do with its content, separately for
three orthogonal dimensions:
ai-train— may the content be used to train AI models?ai-input— may it be used as AI input (inference, RAG, grounding, retrieval)?search— may it appear in search results?
Each takes the value yes or no. Declared via a
Content-Signal: directive per user-agent in robots.txt:
Maintained at contentsignals.org.
Why three dimensions¶
Before Content Signals, robots.txt could only say "allow this
crawler" or "deny this crawler" per path. That collapses three
distinct business decisions into one:
- Opt out of training, stay in search is a common publisher stance — you want Google / Bing to drive traffic, but not see your content absorbed into an LLM.
- Allow RAG at inference but not training lets search-engine- embedded AI answer questions citing you (read-only, attributed) without becoming part of the next model.
- Allow training, not inference is rarer but expressible — research corpora that shouldn't serve as live oracle.
Collapsing any of these into allow/deny forces publishers to pick the wrong compromise. Content Signals decomposes the decision.
Adoption (2026-04)¶
4 % of the top 200 k domains have declared any Content-Signal
values in their robots.txt, per Cloudflare Radar. New standard;
momentum noted by Cloudflare as gaining.
Relationship to pay-per-crawl¶
Content Signals is declarative — it expresses publisher preference. Enforcement is a separate layer:
ai-train=nowithout enforcement is advisory; bots that ignore it aren't technically breaking any protocol.- pay-per-crawl is the enforcement
layer that bills AI crawlers for training use, turning
ai-train=nointo "you can use it, but it'll cost you" or "not at any price." - Cloudflare-hosted sites can combine Content Signals as the publisher-preference declaration, WAF as hard-block, pay-per-crawl for monetisable-access, and the bot-management layer for classification.
Seen in¶
- sources/2026-04-17-cloudflare-introducing-the-agent-readiness-score-is-your-site-agent-ready — canonical wiki instance. Checked by the Agent Readiness Score under Access Rules; 4 % adoption measured on Radar.
Related¶
- concepts/robots-txt — host file that Content Signals extends.
- concepts/agent-readiness-score — where Content Signals is graded under Access Rules.
- systems/pay-per-crawl — enforcement / monetisation layer under the publisher-preference declaration.