
Cloudflare AI Search

Overview

AI Search (formerly AutoRAG) is Cloudflare's managed search primitive for AI agents: hybrid BM25 + vector retrieval over built-in storage and a vector index, with each search instance dynamically creatable and destroyable at runtime from a Worker via the new ai_search_namespaces binding.

Cloudflare's positioning in the 2026-04-16 launch post:

"If you're building search yourself, you need a vector index, an indexing pipeline that parses and chunks your documents, and something to keep the index up to date when your data changes. If you also need keyword search, that's a separate index and fusion logic on top. And if each of your agents needs its own searchable context, you're setting all of that up per agent. AI Search is the plug-and-play search primitive you need."

— (Cloudflare, 2026-04-16)

Architectural placement

AI Search sits across four of Cloudflare's existing primitives:

  • R2 — managed storage substrate per instance; optional external R2 bucket as a data source.
  • Vectorize — vector index substrate per instance.
  • Browser Run (formerly Browser Rendering) — built-in website crawler when a website is the data source, now bundled into AI Search rather than billed separately.
  • Workers + Durable Objects — the consumer; ai_search_namespaces binding is invoked from Worker code, typically within a DO-hosted agent like the 2026-04-16 post's SupportAgent worked example.

The instance — storage, vector index, BM25 index, indexing pipeline, query pipeline, optional external source — is the unit. A namespace groups instances and is the scope at which runtime creation, deletion, listing, and cross-instance search happen.

Core capabilities

1. Hybrid search (new in 2026-04-16 release)

BM25 + vector running in parallel with engine-side fusion. Pre-release AI Search / AutoRAG was vector-only; 2026-04-16 adds BM25 as a first-class engine and exposes the pipeline as configurable instance options.

const instance = await env.AI_SEARCH.create({
  id: "my-instance",
  index_method: { keyword: true, vector: true },
  indexing_options: { keyword_tokenizer: "porter" },  // or "trigram" for code
  retrieval_options: { keyword_match_mode: "or" },    // or "and"
  fusion_method: "rrf",                                // or "max"
  reranking: true,
  reranking_model: "@cf/baai/bge-reranker-base"
});

All options have sane defaults. See concepts/hybrid-retrieval-bm25-vectors for the primitive, concepts/reciprocal-rank-fusion for RRF, concepts/cross-encoder-reranking for the rerank stage.
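As a concept check, RRF itself is a small pure function: each document's fused score is the sum of 1/(k + rank) across the input rankings, with k = 60 by convention. The sketch below is independent of AI Search's engine-side implementation, which is not published:

```typescript
// Reciprocal Rank Fusion over any number of ranked lists.
// score(doc) = Σ over lists of 1 / (k + rank), rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}
```

A document ranked in both the BM25 and vector lists accumulates two reciprocal-rank terms, so agreement between the engines outranks a single high placement in either one.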

2. Built-in storage and index

  • instance.items.uploadAndPoll(filename, content, { metadata: { … } }) — upload + index in one awaitable call; returns item.status = "completed" once searchable. See patterns/upload-then-poll-indexing.
  • No R2 bucket to pre-provision; no external data source required. One external source (R2 bucket or website) can optionally be attached alongside the built-in storage, with a sync schedule.
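The upload-then-poll contract — one awaitable call that resolves only once the item is indexed — can be sketched against an in-memory stand-in for the instance handle. Only items.uploadAndPoll and its "completed" status come from the post; the mock class below is an illustrative assumption, not the real binding:

```typescript
// Minimal in-memory stand-in for an AI Search instance handle.
type Item = {
  filename: string;
  status: "indexing" | "completed";
  metadata?: Record<string, string>;
};

class MockInstance {
  private store = new Map<string, Item>();

  items = {
    uploadAndPoll: async (
      filename: string,
      content: string,
      opts?: { metadata?: Record<string, string> },
    ): Promise<Item> => {
      const item: Item = { filename, status: "indexing", metadata: opts?.metadata };
      this.store.set(filename, item);
      // The real service indexes asynchronously and polls until done;
      // the mock "indexes" immediately to show the resolved shape.
      item.status = "completed";
      return item;
    },
  };
}
```

Callers can treat the resolved item as immediately searchable, which is the whole point of folding upload and indexing into one await.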

3. Namespace binding — runtime-provisioned instances

// wrangler.jsonc
{
  "ai_search_namespaces": [
    { "binding": "AI_SEARCH", "namespace": "example" }
  ]
}

Surface: env.AI_SEARCH.create(…), env.AI_SEARCH.delete(…), env.AI_SEARCH.list(…), env.AI_SEARCH.search(…), env.AI_SEARCH.get(id) → instance handle for per-instance items.uploadAndPoll() / search().

Replaces the previous env.AI.autorag() API that accessed AI Search via the AI binding; old bindings continue to work through Workers compatibility dates.

Canonical wiki instance of patterns/runtime-provisioned-per-tenant-search-index and the retrieval-tier realisation of concepts/one-to-one-agent-instance.

4. Metadata boost at query time

const results = await instance.search({
  query: "deployment guide",
  ai_search_options: {
    boost_by: [{ field: "timestamp", direction: "desc" }]
  }
});

timestamp is built into every item; any custom metadata field (priority, region, language, tenant, …) defined at indexing time can also drive a boost. Business logic layered on top of relevance, not fused into it. See concepts/metadata-boost, patterns/metadata-boost-at-query-time.
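AI Search's exact boost formula is not published. One plausible reading of "layered on top of relevance, not fused into it" is a normalised bonus added to the retrieval score after ranking; the weight and formula below are assumptions for illustration only:

```typescript
// Conceptual query-time metadata boost: retrieval produces the base
// relevance score, and the boost re-weights it by a numeric metadata
// field (e.g. timestamp) without replacing it.
type Hit = { id: string; score: number; metadata: Record<string, number> };

function boostBy(
  hits: Hit[],
  field: string,
  direction: "asc" | "desc",
  weight = 0.2, // hypothetical boost strength
): Hit[] {
  const values = hits.map((h) => h.metadata[field]);
  const min = Math.min(...values);
  const span = Math.max(...values) - min || 1;
  return hits
    .map((h) => {
      let norm = (h.metadata[field] - min) / span; // 0..1 within result set
      if (direction === "asc") norm = 1 - norm;    // "desc" favours larger values
      return { ...h, score: h.score + weight * norm };
    })
    .sort((a, b) => b.score - a.score);
}
```

With direction "desc" on timestamp, a slightly less relevant but newer document can overtake an older one, which matches the support agent's "surface recent docs" usage.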

5. Cross-instance search

const results = await env.SUPPORT_KB.search({
  query: "billing error",
  ai_search_options: {
    instance_ids: ["product-knowledge", "customer-abc123"]
  }
});

Merges + ranks across instances in a single call. Namespace-level generalisation of patterns/unified-retrieval-tool; see patterns/cross-index-unified-retrieval.
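Conceptually, the single call fans out to each listed instance and merges the per-instance hits into one ranked list tagged with provenance. How AI Search normalises scores across instances is not disclosed; the sketch below assumes directly comparable scores:

```typescript
// Conceptual cross-instance merge: flatten per-instance result lists,
// tag each hit with its source instance, and rank globally by score.
type ScoredHit = { id: string; score: number };

function mergeAcrossInstances(
  perInstance: Record<string, ScoredHit[]>,
): Array<ScoredHit & { instance_id: string }> {
  return Object.entries(perInstance)
    .flatMap(([instance_id, hits]) =>
      hits.map((h) => ({ ...h, instance_id })),
    )
    .sort((a, b) => b.score - a.score);
}
```

Tagging the source instance matters for the support-agent shape: the model can tell a shared product doc from a per-customer resolution summary in the same result list.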

Canonical usage shape — support agent

The 2026-04-16 post walks through a customer-support agent built on the Agents SDK:

namespace: "support"
├── product-knowledge     (R2 as source, shared across all agents)
├── customer-abc123       (managed storage, per-customer)
├── customer-def456       (managed storage, per-customer)
└── customer-ghi789       (managed storage, per-customer)

  • One shared product-knowledge instance, R2-backed, contains product docs across all agents.
  • One per-customer instance, managed-storage, accumulates past-resolution summaries as agent memory.
  • SupportAgent.onChatMessage creates the per-customer instance on first appearance (idempotent — try { … } catch {}).
  • Two tools exposed to the model:
      • search_knowledge_base — fans across product-knowledge + customer-<id> in one call, with boost_by: timestamp to surface recent docs.
      • save_resolution — instance.items.uploadAndPoll(filename, content) on resolution, so future agents see it.
  • LLM: Kimi K2.5 via Workers AI (@cf/moonshotai/kimi-k2.5).
  • Durable-object backing for conversation state via AIChatAgent from the Agents SDK.
  • stepCountIs(10) caps agentic tool-use loops.
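The create-on-first-appearance step can be sketched against a minimal stand-in for the namespace binding. The interface below is typed from the documented surface (create / get), not from the real SDK types, and the try/catch idempotency mirrors the post's description:

```typescript
// Minimal stand-in for the ai_search_namespaces binding surface
// (assumed shape; the real binding exposes more methods and options).
interface SearchNamespace {
  create(opts: { id: string }): Promise<unknown>;
  get(id: string): unknown;
}

// Idempotent per-customer provisioning: create on first sight,
// swallow the "already exists" failure on every later message.
async function ensureCustomerInstance(ns: SearchNamespace, customerId: string) {
  const id = `customer-${customerId}`;
  try {
    await ns.create({ id });
  } catch {
    // Instance already exists — creation is best-effort.
  }
  return ns.get(id);
}
```

Because creation is cheap and the failure mode is swallowed, the agent never needs to track which customers it has already seen.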

CLI surface

npx wrangler ai-search create my-search creates an instance; consistent with the cf CLI + unified TypeScript schema rollout (sources/2026-04-13-cloudflare-building-a-cli-for-all-of-cloudflare) — AI Search's ~N operations appear in the same ~3,000-operation surface exposed across CLI / bindings / MCP Code Mode / Terraform / wrangler.jsonc.

Dogfood

"The search on our blog is now powered by AI Search. Try the magnifying glass icon to the top right."

— (Cloudflare, 2026-04-16)

Third instance of the "dogfood the platform as a customer-facing product" recurring shape in April 2026, after Agent Lee and Project Think. See companies/cloudflare.

Open-beta limits (2026-04-16)

Limit                           | Workers Free | Workers Paid
--------------------------------|--------------|-----------------------------
AI Search instances per account | 100          | 5,000
Files per instance              | 100,000      | 1M (500K for hybrid search)
Max file size                   | 4 MB         | 4 MB
Queries per month               | 20,000       | Unlimited
Max pages crawled per day       | 500          | Unlimited

Pricing: free during open beta; Browser Run website crawling bundled in (not separately billed); Workers AI + AI Gateway still billed separately. Goal post-beta: "unified pricing for AI Search as a single service."

Pre-release instances

Instances created before 2026-04-16 continue to work — customer-visible R2 buckets, Vectorize indexes, Browser Run usage remain billed as before. Migration path promised.

Caveats

  • Preview / open-beta release; SLA, cross-region behaviour, durability guarantees not disclosed.
  • No published latency, throughput, recall, or cost-per-query numbers.
  • Embedding model powering the vector half is not named.
  • Chunking strategy, chunk overlap, structured-format handling inside the indexing pipeline are opaque — the value prop is "upload and trust."
  • Cross-encoder reranker listed with one option (@cf/baai/bge-reranker-base); pluggability unclear.
  • No sparse / learned-sparse retrieval (SPLADE / ELSER style); BM25 + dense only.
  • No explicit competitive positioning vs Pinecone / Weaviate / Qdrant / Atlas Hybrid Search / pgvector+FTS.
