# Cloudflare AI Search: the search primitive for your agents
## Summary
Cloudflare's 2026-04-16 post launches AI Search (formerly AutoRAG) as a plug-and-play managed search primitive for AI agents — hybrid BM25 + vector retrieval on built-in storage, with the ability to create and destroy search instances dynamically at runtime via a new `ai_search_namespaces` Workers binding. Each instance owns its own storage + vector index (built on R2 + Vectorize), so an application can spin one up per agent / per customer / per tenant without redeploying, and query across them in a single call. The hybrid retrieval pipeline is fully configurable (Porter vs trigram tokenizer, AND/OR keyword match, RRF vs max fusion, optional cross-encoder reranking with `@cf/baai/bge-reranker-base`); metadata boosts layer business logic (recency, priority) on top of relevance. The worked example is a support agent built on the Agents SDK with one shared product-docs instance + one per-customer instance of resolution history. The post reframes search from infrastructure-to-assemble (vector index + indexing pipeline + keyword index + fusion logic + per-agent setup) to a single primitive.
## Key takeaways
- Search is a first-class agent primitive, not an app-layer assembly. "Every agent needs search: Coding agents search millions of files across repos, or support agents search customer tickets and internal docs." AI Search's positioning thesis: the underlying problem — get the right information to the model at the right time — is the same across use cases, so it deserves a primitive, not a checklist of vector-index + indexing-pipeline + keyword-index + fusion-logic assembly per app. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Hybrid BM25 + vector is the new default. AI Search previously offered vector only; this release adds BM25 as a first-class retrieval engine, running both in parallel and fusing results. "Vector search is great at understanding intent, but it can lose specifics… BM25 finds documents that actually contain 'ERR_CONNECTION_REFUSED' as a term. However, BM25 may miss a page about 'troubleshooting network connections' even though it may be describing the same problem. That's where vector search shines, and why you need both." Matches the industry convergence documented in sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Runtime-provisioned per-tenant search instances are the unlock for the one-to-one-agent shape. The `ai_search_namespaces` binding exposes `create()`/`delete()`/`list()`/`search()` at the namespace level. "The new `ai_search_namespaces` binding lets you create and delete instances at runtime from your Worker, so you can spin up one per agent, per customer, or per language without redeployment." Each instance is an isolated (storage + vector-index) pair — the same economics bet as Durable Objects, extended to the search tier. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Storage + index come together; no pre-wired R2 bucket required. "New instances come with their own storage and vector index. Upload files directly to an instance via API and they're indexed. No R2 buckets to set up, no external data sources to connect first." The `uploadAndPoll()` API returns once indexing is complete so the caller can search immediately — it collapses the upload → sync-job → indexed → queryable latency budget to a single awaitable operation. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Hybrid-search pipeline knobs are all first-class instance config. At instance creation: `index_method` (`{keyword: true, vector: true}`), `indexing_options.keyword_tokenizer` (`porter` for natural language, `trigram` for code / partial match), `retrieval_options.keyword_match_mode` (`AND`/`OR`), `fusion_method` (`rrf`/`max`), `reranking: true` + `reranking_model: "@cf/baai/bge-reranker-base"`. Every option has a sane default when omitted. See concepts/hybrid-retrieval-bm25-vectors, concepts/reciprocal-rank-fusion, concepts/cross-encoder-reranking. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Metadata boost layers business logic on top of relevance. `ai_search_options.boost_by: [{ field: "timestamp", direction: "desc" }]` at query time nudges rankings without mutating the relevance score. "Retrieval gets you relevant results, but relevance alone isn't always enough. For example, in a news search, an article from last week and an article from three years ago might both be semantically relevant to 'election results,' but most users probably want the recent one." Built-in `timestamp` field on every item + any custom metadata field. See concepts/metadata-boost, patterns/metadata-boost-at-query-time. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Cross-instance search: query many indexes, get one ranked list. `env.SUPPORT_KB.search({ query, ai_search_options: { instance_ids: ["product-knowledge", "customer-abc123"] } })` fans the query across multiple instances, merges, and returns one ranked list. "The agent doesn't need to know or care that shared docs and customer resolution history live in separate places." Structurally the namespace-level realisation of the unified retrieval tool — multiple physical indexes, one query surface. See patterns/cross-index-unified-retrieval. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Dogfooded in production: Cloudflare's own blog search is now AI Search. "The search on our blog is now powered by AI Search. Try the magnifying glass icon to the top right." Third instance of the 2026-04 Cloudflare "dogfood the platform as a customer-facing product" recurring shape (after Agent Lee and Project Think). (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- Support-agent worked example pairs shared + per-customer memory on the same retrieval surface. Namespace `support` contains `product-knowledge` (R2-backed, shared across all agents) + `customer-abc123`/`customer-def456`/`customer-ghi789` (built-in storage, per-customer). The `search_knowledge_base` tool searches both in one call; `save_resolution` calls `instance.items.uploadAndPoll(filename, content)` after resolving an issue so "future agents have full context." Canonical wiki instance of agent memory as a per-instance search index rather than a conversation transcript store. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
- API replaces the old `env.AI.autorag()` binding; compatibility-date-gated backward compat. "The `ai_search_namespaces` is a new binding… It replaces the previous `env.AI.autorag()` API, which accessed AI Search through the `AI` binding. The old bindings will continue to work using Workers compatibility dates." Canonical Cloudflare non-breaking-evolution shape — the new primitive coexists, and the old surface is kept alive by compat dates. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
## Architecture
### AI Search instance anatomy

```
AI Search instance
├── Built-in storage (managed R2)  ← items.uploadAndPoll("faq.md", content)
├── Built-in vector index (managed Vectorize)
├── BM25 keyword index (configurable tokenizer: porter | trigram)
├── Indexing pipeline (parse → chunk → embed → index, automatic on upload)
├── Query pipeline
│   ├── Vector retrieval (semantic, intent-aware)
│   ├── BM25 retrieval (lexical, exact-term)
│   ├── Fusion (rrf | max)
│   ├── Metadata boost (timestamp + custom fields)
│   └── Optional cross-encoder rerank (@cf/baai/bge-reranker-base)
├── Optional external source (one R2 bucket OR one website, sync schedule)
└── Identity + ACL (per instance, namespace-scoped)
```

Website crawling (via Browser Run, formerly Browser Rendering) is built in when a website is the data source — not billed separately during open beta.
### Namespace → instances topology

```
namespace: "support"
├── product-knowledge   (R2 as source, shared across all agents)
├── customer-abc123     (managed storage, per-customer)
├── customer-def456     (managed storage, per-customer)
└── customer-ghi789     (managed storage, per-customer)
```

The `ai_search_namespaces` binding on a Worker gives `create()` / `delete()` / `list()` / `search()` across the namespace. One shared instance for cross-cutting context (product docs) + N ephemeral instances for per-tenant memory is the canonical shape the post teaches.
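A minimal in-memory sketch of this call flow, under loud assumptions: the method names (`create()`, `delete()`, `list()`, `search()`, `items.uploadAndPoll()`) follow the post, but the types, synchronous signatures, and substring "retrieval" here are invented stand-ins, not the real binding.

```typescript
// In-memory mock of the namespace binding's call flow. Method names follow
// the post; everything else is a stand-in. The real binding's calls are
// async and its retrieval is hybrid BM25 + vector, not substring match.

interface Hit { instanceId: string; filename: string; score: number }

class MockInstance {
  private docs = new Map<string, string>();
  constructor(readonly id: string) {}

  // Real API: await instance.items.uploadAndPoll(filename, content)
  // resolves once indexing completes; here "indexing" is just a Map write.
  uploadAndPoll(filename: string, content: string): void {
    this.docs.set(filename, content);
  }

  // Naive case-insensitive substring match standing in for hybrid retrieval.
  search(query: string): Hit[] {
    return [...this.docs.entries()]
      .filter(([, text]) => text.toLowerCase().includes(query.toLowerCase()))
      .map(([filename]) => ({ instanceId: this.id, filename, score: 1 }));
  }
}

class MockNamespace {
  private instances = new Map<string, MockInstance>();

  create(id: string): MockInstance {
    const inst = new MockInstance(id);
    this.instances.set(id, inst);
    return inst;
  }
  delete(id: string): void { this.instances.delete(id); }
  list(): string[] { return [...this.instances.keys()]; }

  // Cross-instance search: fan out to the named instances, merge, rank.
  search(query: string, instanceIds: string[]): Hit[] {
    return instanceIds
      .flatMap((id) => this.instances.get(id)?.search(query) ?? [])
      .sort((a, b) => b.score - a.score);
  }
}

// One shared docs instance + one per-customer instance, as in the example.
const support = new MockNamespace();
support.create("product-knowledge")
  .uploadAndPoll("faq.md", "ERR_CONNECTION_REFUSED usually means the tunnel is down.");
support.create("customer-abc123")
  .uploadAndPoll("ticket-42.md", "Resolved ERR_CONNECTION_REFUSED after cert rotation.");

const hits = support.search("err_connection_refused", ["product-knowledge", "customer-abc123"]);
```

The shape to notice is that instance lifecycle (`create`/`delete`) and retrieval (`search`) live on the same namespace object, so per-tenant provisioning needs no redeploy step.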
### Hybrid query pipeline

```
                 query
                   │
       ┌───────────┴───────────┐
       ▼                       ▼
 Vector search            BM25 search
 (intent match)   (term match, porter/trigram)
       │                       │
       └───────────┬───────────┘
                   ▼
          Fusion (rrf | max)
                   │
                   ▼
  Metadata boost (timestamp, custom)
                   │
                   ▼
   Optional cross-encoder rerank
                   │
                   ▼
            Ranked results
```

Each stage is configurable at instance creation time; every option has a sane default.
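Collected in one place, the creation-time knobs the post names might look like the following config object. The field names are the ones quoted in the announcement; how they nest inside the real create call is an assumption for illustration.

```typescript
// Hypothetical instance-creation config collecting the options the post names.
// Field names come from the announcement; the nesting/shape is illustrative.
const instanceConfig = {
  index_method: { keyword: true, vector: true }, // hybrid: BM25 + vector in parallel
  indexing_options: {
    keyword_tokenizer: "trigram",                // "porter" for prose, "trigram" for code / partial match
  },
  retrieval_options: {
    keyword_match_mode: "OR",                    // AND | OR for BM25 term matching
    fusion_method: "rrf",                        // rrf | max
    reranking: true,                             // optional cross-encoder stage
    reranking_model: "@cf/baai/bge-reranker-base",
  },
};
```

Per the post, any field omitted here falls back to a sane default.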
## Systems, concepts, and patterns extracted

### Systems (new)

- systems/cloudflare-ai-search — plug-and-play hybrid search primitive; built-in storage + vector index, runtime-provisioned per-instance, namespace-level `search()`.
- systems/cloudflare-vectorize — Cloudflare's managed vector database; the vector-index substrate behind AI Search instances.
- systems/cloudflare-agents-sdk — the SDK + `AIChatAgent`/`routeAgentRequest` primitives around which the support-agent example is built.
- systems/kimi-k2-5 — the LLM used in the worked example, served via systems/workers-ai as `@cf/moonshotai/kimi-k2.5`.
- systems/workers-ai — Cloudflare's model-inference platform (the `@cf/…` namespace); hosts both the chat model and the reranker.
### Systems (extended)

- systems/cloudflare-r2 — role as AI-Search-instance-storage substrate; canonical data source for the shared `product-knowledge` instance.
- systems/cloudflare-durable-objects — the `AIChatAgent`/`SupportAgent` class extends a DO; the DO is where the conversation history + per-customer identity live.
- systems/cloudflare-workers — bindings (`ai_search_namespaces`, `ai`, `durable_objects`) as the contract layer.
- systems/cloudflare-browser-rendering — Browser Run now built in for website-as-data-source crawling.
- systems/bm25 — AI Search formalises BM25 as the lexical retrieval half, with Porter vs trigram tokenizer as a first-class config knob.
### Concepts (new)

- concepts/metadata-boost — query-time ranking nudge based on document metadata, distinct from the underlying relevance score.
- concepts/per-tenant-search-instance — isolated storage + index unit, one per agent / customer / language, created + destroyed at runtime.
- concepts/unified-storage-and-index — the managed-service property of an AI Search instance that an `uploadAndPoll()` call both stores the document and indexes it atomically, with no external sync pipeline.
- concepts/agent-memory — an agent's accumulated per-session / per-tenant context stored as searchable items in a dedicated search index, retrieved via the same primitive as shared knowledge.
### Concepts (extended)

- concepts/hybrid-retrieval-bm25-vectors — AI Search as another productised instance of BM25 + vector in parallel with fusion.
- concepts/reciprocal-rank-fusion — `fusion_method: "rrf"` as the default; contrasted with `max` fusion.
- concepts/cross-encoder-reranking — `reranking: true` + `reranking_model: "@cf/baai/bge-reranker-base"` as a first-class instance option.
- concepts/vector-similarity-search / concepts/vector-embedding — embedded into the managed instance; the developer no longer touches Vectorize directly for the AI-Search workflow.
- concepts/one-to-one-agent-instance — runtime-provisioned per-tenant search instances are the retrieval-tier realisation of the same shape DO gives at the actor tier.
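The `rrf` vs `max` contrast can be made concrete with a small runnable sketch. This is the textbook reciprocal-rank-fusion formula (score = Σ 1/(k + rank), conventionally k = 60) and one common reading of max fusion (keep each document's best rank across lists); the post does not disclose AI Search's actual fusion internals, so treat this as the standard formulation, not Cloudflare's implementation.

```typescript
// Textbook reciprocal rank fusion vs. max-style fusion over two ranked lists.
// Document ids are hypothetical; lists stand in for BM25 and vector results.

type Ranked = string[]; // document ids, best first

// RRF: each document scores sum(1 / (k + rank)) across the lists it appears in.
function rrfFuse(lists: Ranked[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)); // rank = i + 1
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// One reading of max fusion: rank each document by its best position anywhere.
function maxFuse(lists: Ranked[]): string[] {
  const best = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      best.set(id, Math.min(best.get(id) ?? Infinity, i));
    });
  }
  return [...best.entries()].sort((a, b) => a[1] - b[1]).map(([id]) => id);
}

const bm25 = ["err-page", "tunnel-faq", "net-guide"];   // lexical ranking
const vector = ["net-guide", "err-page", "dns-notes"];  // semantic ranking
const fused = rrfFuse([bm25, vector]);
```

Note how RRF rewards documents ranked well by *both* retrievers: `err-page` (ranks 1 and 2) beats `net-guide` (ranks 3 and 1) even though neither tops both lists.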
### Patterns (new)

- patterns/runtime-provisioned-per-tenant-search-index — spin up an isolated search index per agent / customer / language at runtime, destroy on demand; namespace-level bindings rather than deploy-time wiring.
- patterns/cross-index-unified-retrieval — fan a query across N named instances in a single call, get one merged ranked list; the retrieval-tier generalisation of patterns/unified-retrieval-tool.
- patterns/metadata-boost-at-query-time — layer business-logic ranking (recency, priority, region) on top of the relevance score via per-query `boost_by` clauses on document metadata fields.
- patterns/upload-then-poll-indexing — a single `uploadAndPoll(filename, content)` API that returns once the item is indexed, collapsing the write-then-sync-then-searchable latency into one awaitable operation.
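The metadata-boost pattern can be sketched as a post-retrieval re-sort: a query-time nudge layered on top of relevance scores without mutating them. The blend weight and normalisation here are invented for illustration; the post only specifies the `boost_by` clause shape, not the scoring math inside AI Search.

```typescript
// Illustrative query-time metadata boost: re-rank retrieved items by blending
// a normalised metadata value (e.g. recency) with the relevance score.
// The 0.3 blend weight is made up; the real service's math is undisclosed.

interface Item { id: string; score: number; metadata: Record<string, number> }
interface BoostBy { field: string; direction: "asc" | "desc" }

function applyBoost(items: Item[], boost: BoostBy, weight = 0.3): Item[] {
  const values = items.map((it) => it.metadata[boost.field] ?? 0);
  const lo = Math.min(...values);
  const span = Math.max(...values) - lo || 1;

  function blended(it: Item): number {
    let norm = ((it.metadata[boost.field] ?? 0) - lo) / span; // 0..1
    if (boost.direction === "asc") norm = 1 - norm;
    return (1 - weight) * it.score + weight * norm; // original score untouched
  }

  return [...items].sort((a, b) => blended(b) - blended(a));
}

// The post's news example: the older article is slightly more relevant, but
// boosting by timestamp (desc) surfaces the recent one first.
const results: Item[] = [
  { id: "article-2023", score: 0.82, metadata: { timestamp: 1_680_000_000 } },
  { id: "article-last-week", score: 0.78, metadata: { timestamp: 1_770_000_000 } },
];
const boosted = applyBoost(results, { field: "timestamp", direction: "desc" });
```

The design point matches the post's framing: the boost is applied per query over metadata, so the stored relevance pipeline and the index itself stay untouched.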
### Patterns (extended)

- patterns/native-hybrid-search-function — AI Search joins Atlas Hybrid Search as another productised instance; same industry thesis documented by MongoDB in sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution.
- patterns/unified-retrieval-tool — the `search_knowledge_base` tool in the support-agent example wraps both instances behind one agent-visible tool.
## Operational numbers & limits
Open-beta limits published in-post:
| Limit | Workers Free | Workers Paid |
|---|---|---|
| AI Search instances per account | 100 | 5,000 |
| Files per instance | 100,000 | 1M (500K for hybrid search) |
| Max file size | 4 MB | 4 MB |
| Queries per month | 20,000 | Unlimited |
| Max pages crawled per day | 500 | Unlimited |
Pricing: free while in open beta; Browser Run (website crawling) folded into AI Search, not billed separately; systems/workers-ai + AI Gateway usage continue to be billed separately. Cloudflare commits to ≥30 days notice + pricing details before billing begins. Goal post-beta is "unified pricing for AI Search as a single service, rather than billing separately for each underlying component."
Existing (pre-release) AI Search / AutoRAG instances continue to work exactly as before — customer-visible R2 buckets, Vectorize indexes, Browser Run usage remain billed as before; a migration path is promised separately.
## Caveats
- Preview / open-beta product. "We'll give at least 30 days notice and communicate pricing details before any billing begins." Production SLAs, availability guarantees, replication topology, cross-region behaviour are not disclosed.
- No latency / throughput / recall numbers disclosed. Post is an announcement + architectural walkthrough; no p50 / p99, no NDCG / recall@k, no cost-per-query. The hybrid-retrieval claims are qualitative + by reference to standard industry-literature intuitions.
- Embedding model not named. The vector half is "your data → vector index" but the specific embedding model running inside the managed instance is not disclosed.
- Indexing pipeline internals opaque. Chunking strategy, max chunk size, chunk overlap, handling of structured formats (tables, code blocks), multilingual support — none specified. The value prop is "upload and trust"; tradeoffs made inside the pipeline are not on display.
- Cross-encoder reranking model list limited to one published option. `@cf/baai/bge-reranker-base`; whether other rerankers (bge-reranker-large, cohere-rerank) are pluggable is not stated.
- Sparse / learned-sparse retrieval not mentioned. BM25 + dense vector only; no SPLADE / ELSER-style sparse-vector retrieval primitive. MongoDB's 2025-09-30 survey (see sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution) names sparse vectors as the vector-first camp's lexical story — not adopted here.
- Dogfood-disclosure minimum. Blog-search-now-on-AI-Search reveal is a pointer, not a deep-dive; no corpus size, no query volume, no before/after.
- Support-agent example is reference, not production. No customer case, no scale numbers, no incident retrospective. Pattern is prescribed, not measured.
- No positioning vs competitors. No explicit comparison with Pinecone, Weaviate, Qdrant, Milvus, Atlas Vector Search / Atlas Hybrid Search, Elasticsearch / OpenSearch, pgvector + FTS.
## Source

- Original: https://blog.cloudflare.com/ai-search-agent-primitive/
- Raw markdown: `raw/cloudflare/2026-04-16-ai-search-the-search-primitive-for-your-agents-3300b9ee.md`
## Related
- sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution — industry-evolution survey of hybrid search; AI Search is another productisation of the same thesis.
- sources/2026-04-15-cloudflare-project-think-building-the-next-generation-of-ai-agents — the long-running-agent substrate this retrieval primitive is meant to live alongside.
- sources/2026-04-15-cloudflare-introducing-agent-lee — same-week Cloudflare first-party agent launch (Code Mode + elicitation gate + dynamic UI); AI Search is the search tier of the same stack.
- sources/2026-04-13-cloudflare-building-a-cli-for-all-of-cloudflare — `cf` CLI + unified TypeScript schema; AI Search is exposed as part of this ~3,000-operation API surface (`npx wrangler ai-search create my-search`).
- systems/atlas-hybrid-search — MongoDB's sibling productised hybrid-search primitive (lexical-first vendor equivalent).
- concepts/hybrid-retrieval-bm25-vectors / concepts/reciprocal-rank-fusion / concepts/cross-encoder-reranking — the retrieval primitives this product formalises into instance config.
- patterns/native-hybrid-search-function — industry-level productisation trend AI Search joins.