PATTERN Cited by 1 source

Content-use robots.txt extension

Extend the robots.txt standard (and Content Signals specification) with a use field that declares how much content a crawler may retain and reshare, moving beyond the binary crawl/don't-crawl model to express a spectrum of reproduction permissions.

Problem

Traditional robots.txt expresses only "you may crawl this path" or "you may not." It cannot distinguish between a search engine that indexes and links back (desirable) and a training crawler that absorbs content permanently (undesirable for many site owners). Site owners want to allow discovery without allowing full reproduction.

Solution

Add a use parameter to the Content Signals specification in robots.txt:

User-agent: *
Content-Signal: search=yes,ai-train=no,use=reference
Allow: /

Three levels:

- use=immediate: interact, store nothing
- use=reference: index, excerpt, link back (default for managed robots.txt)
- use=full: summarize and reproduce
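As a rough illustration of how an automated system might read the signal line shown above, the following Python sketch parses a Content-Signal value into key/value pairs and maps the use value onto an ordered level. The function names, the UseLevel enum, and the assumption that the three levels are strictly ordered (immediate < reference < full) are illustrative and not taken from any specification. User-agent grouping, which a real robots.txt parser would need, is omitted.

```python
from enum import IntEnum
from typing import Dict, Optional


class UseLevel(IntEnum):
    """Hypothetical ordering of the three use levels (assumed cumulative)."""
    IMMEDIATE = 1  # interact, store nothing
    REFERENCE = 2  # index, excerpt, link back
    FULL = 3       # summarize and reproduce


def parse_content_signal(line: str) -> Dict[str, str]:
    """Parse one 'Content-Signal: k=v,k=v' line into a dict of lowercase strings."""
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")
    signals: Dict[str, str] = {}
    for pair in value.split(","):
        key, eq, val = pair.partition("=")
        if eq:  # skip fragments that have no '='
            signals[key.strip().lower()] = val.strip().lower()
    return signals


def use_level(signals: Dict[str, str]) -> Optional[UseLevel]:
    """Return the declared use level, or None if it is absent or unrecognized."""
    try:
        return UseLevel[signals["use"].upper()]
    except KeyError:
        return None


signals = parse_content_signal("Content-Signal: search=yes,ai-train=no,use=reference")
print(signals["ai-train"], use_level(signals).name)  # prints: no REFERENCE
```

Returning None for a missing or unknown use value, rather than guessing a default, is a design choice of this sketch; a consumer could then decide how conservative to be.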

Combine the use level with a bot taxonomy to write composable rules, for example: "allow Search bots up to reference but block any bot requesting full."
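The quoted rule could be evaluated along the following lines. This is a minimal sketch under several assumptions: the category name "search", the idea that each bot is associated with a requested use level, the deny-by-default behavior for categories without a rule, and all identifiers are hypothetical and are not defined by the source.

```python
from typing import AbstractSet, Dict

# Assumed ordering of use levels, lowest to highest.
USE_ORDER = {"immediate": 0, "reference": 1, "full": 2}


def is_allowed(
    category: str,
    requested_use: str,
    max_use_by_category: Dict[str, str],
    blocked_uses: AbstractSet[str] = frozenset(),
) -> bool:
    """Decide whether a bot in `category` may operate at `requested_use`."""
    if requested_use not in USE_ORDER:
        raise ValueError(f"unknown use level: {requested_use!r}")
    # Global rule: some use levels are blocked for every bot.
    if requested_use in blocked_uses:
        return False
    # Per-category rule: a ceiling on the use level.
    ceiling = max_use_by_category.get(category)
    if ceiling is None:
        return False  # assumption: categories without a rule are denied
    return USE_ORDER[requested_use] <= USE_ORDER[ceiling]


# "Allow Search bots up to reference but block any bot requesting full."
policy = {"search": "reference"}
blocked = frozenset({"full"})

print(is_allowed("search", "reference", policy, blocked))  # prints: True
print(is_allowed("search", "full", policy, blocked))       # prints: False
```

Keeping the per-category ceiling and the global block list as separate inputs is what makes the rules composable here: either can change without touching the other.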

Consequences

Graduated control: site owners express nuance beyond binary allow/block.
Machine-readable preference: automated systems can parse and respect levels programmatically.
Advisory, not enforcement: like Disallow, the signal states a preference and does not enforce it by itself. Platforms like Cloudflare can add enforcement by revoking Verified status for violators.
Ecosystem coordination required: value depends on bot operators choosing to respect the signals.
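On the crawler side, respecting the levels programmatically might look like the sketch below, which gates individual actions on the declared use value. The action names and their mapping to levels are read off the three level descriptions in the Solution section; treating the levels as cumulative (a higher level permits everything a lower one does) is an assumption, as are all identifiers.

```python
from typing import Set

# Assumed ordering of use levels, lowest to highest.
RANK = {"immediate": 0, "reference": 1, "full": 2}

# Minimum level each (hypothetically named) action requires.
REQUIRED_LEVEL = {
    "interact": "immediate",
    "index": "reference",
    "excerpt": "reference",
    "link_back": "reference",
    "summarize": "full",
    "reproduce": "full",
}


def may_perform(action: str, declared_use: str) -> bool:
    """True if the site's declared use level covers the given action."""
    return RANK[REQUIRED_LEVEL[action]] <= RANK[declared_use]


def permitted_actions(declared_use: str) -> Set[str]:
    """All actions covered by the declared use level."""
    return {a for a in REQUIRED_LEVEL if may_perform(a, declared_use)}


print(may_perform("excerpt", "reference"))    # prints: True
print(may_perform("reproduce", "reference"))  # prints: False
```

Because the signal is advisory, a check like this only has effect when the bot operator chooses to run it before storing or republishing content.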

Known Uses

Cloudflare Managed Robots.txt automatically prepends Content-Signal: search=yes,ai-train=no,use=reference for customers who enable managed content signals (Source: sources/2026-07-01-cloudflare-ai-traffic-options)

Seen In

sources/2026-07-01-cloudflare-ai-traffic-options