
Cloudflare AI Search: the search primitive for your agents

Summary

Cloudflare's 2026-04-16 post launches AI Search (formerly AutoRAG) as a plug-and-play managed search primitive for AI agents: hybrid BM25 + vector retrieval on built-in storage, with the ability to dynamically create and destroy search instances at runtime via a new ai_search_namespaces Workers binding. Each instance owns its own storage + vector index (built on R2 + Vectorize), so an application can spin one up per agent, per customer, or per tenant without redeploying, and query across them in a single call. The hybrid retrieval pipeline is fully configurable (Porter vs trigram tokenizer, AND/OR keyword match, RRF vs max fusion, optional cross-encoder reranking with @cf/baai/bge-reranker-base); metadata boosts layer business logic (recency, priority) on top of relevance. The worked example is a support agent built on the Agents SDK with one shared product-docs instance plus one per-customer instance holding resolution history. The post reframes search from infrastructure-to-assemble (vector index + indexing pipeline + keyword index + fusion logic + per-agent setup) to a single primitive.

Key takeaways

  1. Search is a first-class agent primitive, not an app-layer assembly. "Every agent needs search: Coding agents search millions of files across repos, or support agents search customer tickets and internal docs." AI Search's positioning thesis: the underlying problem — get the right information to the model at the right time — is the same across use cases, so it deserves a primitive, not a checklist of vector-index + indexing-pipeline + keyword-index + fusion-logic assembly per app. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)

  2. Hybrid BM25 + vector is the new default. AI Search previously offered vector only; this release adds BM25 as a first-class retrieval engine, running both in parallel and fusing results. "Vector search is great at understanding intent, but it can lose specifics… BM25 finds documents that actually contain 'ERR_CONNECTION_REFUSED' as a term. However, BM25 may miss a page about 'troubleshooting network connections' even though it may be describing the same problem. That's where vector search shines, and why you need both." Matches the industry convergence documented in sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
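
A toy illustration of the claim above: a BM25-style exact-term pass finds the literal error string but misses the paraphrase, which is the gap the parallel vector pass closes. The substring filter is deliberately naive and stands in for real BM25 scoring.

```typescript
// Two docs about the same problem: one contains the literal term, one is a
// paraphrase that shares no term with the query.
const corpus = [
  "Fixing ERR_CONNECTION_REFUSED during deploys",
  "Troubleshooting network connections",
];

// Lexical (BM25-style) pass: exact-term match only.
const lexicalHits = corpus.filter((doc) =>
  doc.includes("ERR_CONNECTION_REFUSED"),
);
// lexicalHits contains only corpus[0]; corpus[1] describes the same problem
// but shares no term, so only a semantic (vector) pass would surface it --
// which is why the pipeline runs both and fuses the results.
```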

  3. Runtime-provisioned per-tenant search instances are the unlock for the one-to-one-agent shape. The ai_search_namespaces binding exposes create() / delete() / list() / search() at the namespace level. "The new ai_search_namespaces binding lets you create and delete instances at runtime from your Worker, so you can spin up one per agent, per customer, or per language without redeployment." Each instance is an isolated (storage + vector-index) pair — same economics bet as Durable Objects extended to the search tier. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
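
A sketch of the lifecycle the ai_search_namespaces binding exposes (create / delete / list / search). The class below is a synchronous in-memory stand-in for illustration only; the real Workers binding is Promise-based, and any shape beyond the four method names from the post is an assumption.

```typescript
type MockInstance = { id: string; items: string[] };

// In-memory stand-in for a namespace-scoped binding.
class MockSearchNamespace {
  private instances = new Map<string, MockInstance>();

  create(id: string): MockInstance {
    const inst: MockInstance = { id, items: [] };
    this.instances.set(id, inst);
    return inst;
  }
  delete(id: string): void {
    this.instances.delete(id);
  }
  list(): string[] {
    return [...this.instances.keys()];
  }
}

// One shared instance plus a per-customer instance, provisioned at runtime
// with no redeploy -- the one-to-one-agent shape the post describes.
const ns = new MockSearchNamespace();
ns.create("product-knowledge");
ns.create("customer-abc123");
const liveInstances = ns.list();
```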

  4. Storage + index come together; no pre-wired R2 bucket required. "New instances come with their own storage and vector index. Upload files directly to an instance via API and they're indexed. No R2 buckets to set up, no external data sources to connect first." The uploadAndPoll() API returns once indexing is complete so the caller can search immediately — collapses the upload → sync-job → indexed → queryable latency budget to a single awaitable operation. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
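
A sketch of the single-awaitable indexing flow: uploadAndPoll() resolves only once the document is indexed, so the very next search can already see it. The in-memory index and substring matching are stand-ins; only the items.uploadAndPoll(filename, content) call shape comes from the post.

```typescript
class MockItems {
  private indexed = new Map<string, string>();

  // In the real service this covers upload + parse + chunk + embed + index;
  // the promise settling only afterwards is what collapses the latency budget.
  async uploadAndPoll(filename: string, content: string): Promise<void> {
    this.indexed.set(filename, content);
  }

  search(query: string): string[] {
    return [...this.indexed.entries()]
      .filter(([, text]) => text.includes(query))
      .map(([name]) => name);
  }
}

async function demo(): Promise<string[]> {
  const items = new MockItems();
  await items.uploadAndPoll("faq.md", "Rotate the API token under Settings.");
  return items.search("API token"); // queryable immediately after the await
}
```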

  5. Hybrid-search pipeline knobs are all first-class instance config. At instance creation: index_method ({keyword: true, vector: true}), indexing_options.keyword_tokenizer (porter for natural language, trigram for code / partial match), retrieval_options.keyword_match_mode (AND / OR), fusion_method (rrf / max), reranking: true + reranking_model: "@cf/baai/bge-reranker-base". Every option has a sane default when omitted. See concepts/hybrid-retrieval-bm25-vectors, concepts/reciprocal-rank-fusion, concepts/cross-encoder-reranking. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
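
The knobs above collected into one config object. The field names and values are the ones the post uses; their exact nesting in the real create() payload is an assumption of this sketch, and every option can be omitted to fall back to its default.

```typescript
const hybridInstanceConfig = {
  index_method: { keyword: true, vector: true }, // run BM25 + vector in parallel
  indexing_options: {
    keyword_tokenizer: "trigram", // "porter" for prose, "trigram" for code / partial match
  },
  retrieval_options: {
    keyword_match_mode: "OR",     // "AND" requires every query term to appear
    fusion_method: "rrf",         // reciprocal rank fusion; "max" also offered
    reranking: true,
    reranking_model: "@cf/baai/bge-reranker-base",
  },
};
```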

  6. Metadata boost layers business logic on top of relevance. ai_search_options.boost_by: [{ field: "timestamp", direction: "desc" }] at query time nudges rankings without mutating the relevance score. "Retrieval gets you relevant results, but relevance alone isn't always enough. For example, in a news search, an article from last week and an article from three years ago might both be semantically relevant to 'election results,' but most users probably want the recent one." Built-in timestamp field on every item + any custom metadata field. See concepts/metadata-boost, patterns/metadata-boost-at-query-time. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
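
The news example above as a query payload: nudge recent items up without mutating the relevance score. The boost_by shape is the one quoted from the post; the surrounding request object is a sketch.

```typescript
const newsRequest = {
  query: "election results",
  ai_search_options: {
    // timestamp is a built-in field on every item; any custom metadata
    // field can be boosted the same way.
    boost_by: [{ field: "timestamp", direction: "desc" }],
  },
};
```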

  7. Cross-instance search: query many indexes, get one ranked list. env.SUPPORT_KB.search({ query, ai_search_options: { instance_ids: ["product-knowledge", "customer-abc123"] } }) fans the query across multiple instances, merges, and returns one ranked list. "The agent doesn't need to know or care that shared docs and customer resolution history live in separate places." Structurally the namespace-level realisation of unified retrieval tool — multiple physical indexes, one query surface. See patterns/cross-index-unified-retrieval. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
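
The fan-out call shape from the post, with env.SUPPORT_KB as the binding name from the worked example. Only the request object is built here; the binding call itself is shown as a comment since it needs a live Worker environment.

```typescript
const crossInstanceRequest = {
  query: "ERR_CONNECTION_REFUSED after last night's upgrade",
  ai_search_options: {
    // Shared docs + this customer's resolution history, one query surface.
    instance_ids: ["product-knowledge", "customer-abc123"],
  },
};

// In a Worker: await env.SUPPORT_KB.search(crossInstanceRequest);
// -> one ranked list merged across both instances.
```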

  8. Dogfooded in production: Cloudflare's own blog search is now AI Search. "The search on our blog is now powered by AI Search. Try the magnifying glass icon to the top right." Third instance of the 2026-04 Cloudflare "dogfood the platform as a customer-facing product" recurring shape (after Agent Lee and Project Think). (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)

  9. Support-agent worked example pairs shared + per-customer memory on the same retrieval surface. Namespace support contains product-knowledge (R2-backed, shared across all agents) + customer-abc123 / customer-def456 / customer-ghi789 (built-in storage, per-customer). The search_knowledge_base tool searches both in one call; save_resolution calls instance.items.uploadAndPoll(filename, content) after resolving an issue so "future agents have full context." Canonical wiki instance of agent memory as a per-instance search index rather than a conversation transcript store. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)
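
A sketch of the worked example's two tools against a minimal binding interface: search_knowledge_base fans out across shared + per-customer instances, and save_resolution writes back via uploadAndPoll. The get() accessor and the interface shapes are hypothetical glue, not the real SDK; only the tool names and the uploadAndPoll / instance_ids call shapes come from the post.

```typescript
interface KnowledgeBinding {
  search(req: {
    query: string;
    ai_search_options?: { instance_ids: string[] };
  }): Promise<unknown>;
  get(id: string): {
    items: { uploadAndPoll(name: string, content: string): Promise<void> };
  };
}

function makeSupportTools(kb: KnowledgeBinding, customerId: string) {
  const customerInstance = `customer-${customerId}`;
  return {
    // One call over shared product docs + this customer's history.
    search_knowledge_base: (query: string) =>
      kb.search({
        query,
        ai_search_options: {
          instance_ids: ["product-knowledge", customerInstance],
        },
      }),
    // Persist the outcome so future agents retrieve it as context.
    save_resolution: (filename: string, content: string) =>
      kb.get(customerInstance).items.uploadAndPoll(filename, content),
  };
}
```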

  10. API replaces the old env.AI.autorag() binding; compatibility-date-gated backward compat. "The ai_search_namespaces is a new binding… It replaces the previous env.AI.autorag() API, which accessed AI Search through the AI binding. The old bindings will continue to work using Workers compatibility dates." Canonical Cloudflare non-breaking evolution shape — new primitive coexists, old surface kept alive by compat dates. (Source: sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents)

Architecture

AI Search instance anatomy

AI Search instance
├── Built-in storage (managed R2)           ← items.uploadAndPoll("faq.md", content)
├── Built-in vector index (managed Vectorize)
├── BM25 keyword index (configurable tokenizer: porter | trigram)
├── Indexing pipeline (parse → chunk → embed → index, automatic on upload)
├── Query pipeline
│   ├── Vector retrieval (semantic, intent-aware)
│   ├── BM25 retrieval (lexical, exact-term)
│   ├── Fusion (rrf | max)
│   ├── Metadata boost (timestamp + custom fields)
│   └── Optional cross-encoder rerank (@cf/baai/bge-reranker-base)
├── Optional external source (one R2 bucket OR one website, sync schedule)
└── Identity + ACL (per instance, namespace-scoped)

Website crawling (via Browser Run, formerly Browser Rendering) is built-in when a website is the data source — not billed separately during open beta.

Namespace → instances topology

namespace: "support"
├── product-knowledge     (R2 as source, shared across all agents)
├── customer-abc123       (managed storage, per-customer)
├── customer-def456       (managed storage, per-customer)
└── customer-ghi789       (managed storage, per-customer)

The ai_search_namespaces binding on a Worker gives create() / delete() / list() / search() across the namespace. One shared instance for cross-cutting context (product docs) + N ephemeral instances for per-tenant memory is the canonical shape the post teaches.

Hybrid query pipeline

                       query
             ┌───────────┴───────────┐
             ▼                       ▼
        Vector search            BM25 search
       (intent match)         (term match, porter/trigram)
             │                       │
             └───────────┬───────────┘
                 Fusion (rrf | max)
               Metadata boost (timestamp, custom)
           Optional cross-encoder rerank
                   Ranked results

Each stage is configurable at instance creation time; every option has a sane default.
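
The fusion stage's "rrf" option can be sketched from the standard Reciprocal Rank Fusion formula, score(d) = Σ 1 / (k + rank_i(d)) over the input rankings. k = 60 is the conventional constant from the RRF literature; Cloudflare's internal value and implementation are not disclosed, so treat this as an illustration of the technique, not the service's code.

```typescript
// Fuse several ranked lists of document ids into one, rewarding documents
// that rank well in more than one list.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, idx) => {
      // rank is 1-based: idx 0 -> rank 1.
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + idx + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

// A doc ranked #2 by both engines beats a doc that tops only one list.
const fused = rrfFuse([
  ["bm25-top", "shared", "other"],   // BM25 ranking
  ["vec-top", "shared", "another"],  // vector ranking
]);
```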

Systems, concepts, and patterns extracted

Systems (new)

Systems (extended)

Concepts (new)

  • concepts/metadata-boost — query-time ranking nudge based on document metadata, distinct from the underlying relevance score.
  • concepts/per-tenant-search-instance — isolated storage + index unit, one per agent / customer / language, created + destroyed at runtime.
  • concepts/unified-storage-and-index — the managed-service property of an AI Search instance that an uploadAndPoll() call both stores the document and indexes it atomically, with no external sync pipeline.
  • concepts/agent-memory — an agent's accumulated per-session / per-tenant context stored as searchable items in a dedicated search index, retrieved via the same primitive as shared knowledge.

Concepts (extended)

Patterns (new)

Patterns (extended)

Operational numbers & limits

Open-beta limits published in-post:

Limit                              Workers Free   Workers Paid
AI Search instances per account    100            5,000
Files per instance                 100,000        1M (500K for hybrid search)
Max file size                      4 MB           4 MB
Queries per month                  20,000         Unlimited
Max pages crawled per day          500            Unlimited

Pricing: free while in open beta; Browser Run (website crawling) is folded into AI Search and not billed separately; systems/workers-ai and AI Gateway usage continues to be billed separately. Cloudflare commits to ≥30 days notice plus pricing details before billing begins. The post-beta goal is "unified pricing for AI Search as a single service, rather than billing separately for each underlying component."

Existing (pre-release) AI Search / AutoRAG instances continue to work exactly as before: their customer-visible R2 buckets, Vectorize indexes, and Browser Run usage remain billed as before, and a migration path is promised separately.

Caveats

  • Preview / open-beta product. "We'll give at least 30 days notice and communicate pricing details before any billing begins." Production SLAs, availability guarantees, replication topology, cross-region behaviour are not disclosed.
  • No latency / throughput / recall numbers disclosed. Post is an announcement + architectural walkthrough; no p50 / p99, no NDCG / recall@k, no cost-per-query. The hybrid-retrieval claims are qualitative + by reference to standard industry-literature intuitions.
  • Embedding model not named. The vector half is "your data → vector index" but the specific embedding model running inside the managed instance is not disclosed.
  • Indexing pipeline internals opaque. Chunking strategy, max chunk size, chunk overlap, handling of structured formats (tables, code blocks), multilingual support — none specified. The value prop is "upload and trust"; tradeoffs made inside the pipeline are not on display.
  • Cross-encoder reranking model list limited to one published option. @cf/baai/bge-reranker-base; whether other rerankers (bge-reranker-large, cohere-rerank) are pluggable is not stated.
  • Sparse / learned-sparse retrieval not mentioned. BM25 + dense vector only; no SPLADE / ELSER-style sparse-vector retrieval primitive. MongoDB's 2025-09-30 survey (see sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution) names sparse vectors as the vector-first camp's lexical story — not adopted here.
  • Dogfood disclosure is minimal. The blog-search-now-on-AI-Search reveal is a pointer, not a deep dive: no corpus size, no query volume, no before/after.
  • Support-agent example is a reference implementation, not a production case study. No customer case, no scale numbers, no incident retrospective. The pattern is prescribed, not measured.
  • No positioning vs competitors. No explicit comparison with Pinecone, Weaviate, Qdrant, Milvus, Atlas Vector Search / Atlas Hybrid Search, Elasticsearch / OpenSearch, pgvector + FTS.

Source
