PATTERN

Runtime-provisioned per-tenant search index

Intent

Make a dedicated search index per tenant (agent, customer, session, language, region, …) a runtime-cheap primitive — created on first appearance, destroyed on tenant eviction, configured independently — rather than a deploy-time schema decision. The goal is to collapse the distance between "this tenant needs isolated retrieval" and "this tenant has isolated retrieval" from weeks-of-ops to one API call.

Problem

The default multi-tenant search shape — one shared index with tenant_id as a filter field — is easy to stand up, but under agent workloads it breaks down:

  • Policy-not-structure isolation. One forgotten WHERE tenant_id = … leaks tenant A's data into tenant B's search results.
  • Global index statistics. BM25 avgdl and idf, HNSW graph shape, chunk budgets, and reranker inputs are all skewed by the noisiest tenant's document shape, degrading quality for everyone else.
  • Blast radius. A reindex, a schema change, or corruption hits every tenant at once.
  • Rigid configuration. Tokenizer (porter vs trigram), match mode (AND vs OR), fusion, reranking cannot vary per tenant.
  • Tenant deletion = scan-and-purge. One-click tenant deletion is structurally impossible; it's a batch job with verification.
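The first bullet is easiest to see in code. A minimal sketch of the shared-index shape (all names here are hypothetical, not any real search API), where the tenant filter is just an optional argument a caller can forget:

```typescript
// Hypothetical in-memory stand-in for the shared-index shape:
// one index, tenant_id as a filter field.
type Doc = { tenantId: string; text: string };

const sharedIndex: Doc[] = [
  { tenantId: "A", text: "refund policy for tenant A" },
  { tenantId: "B", text: "refund policy for tenant B" },
];

// Isolation is policy, not structure: the caller must remember the filter.
function search(query: string, tenantId?: string): Doc[] {
  return sharedIndex.filter(
    (d) =>
      d.text.includes(query) &&
      (tenantId === undefined || d.tenantId === tenantId),
  );
}

search("refund", "A"); // scoped: one result, tenant A only
search("refund");      // one forgotten filter: both tenants' data in one result set
```

Nothing in the type system or storage layout distinguishes the two calls; the leak is a silent success, which is the core of the "policy-not-structure" problem.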

Pre-provisioning one index per tenant at deploy time fixes the isolation story but re-introduces a coordination problem — the set of tenants changes over time, re-deploys are expensive, and the cost model of a "provisioned forever" index per customer is untenable for thousands-of-customers or per-session granularity.

The one-to-one agent posture — which Durable Objects made cheap at the actor tier — makes the pre-provisioning approach especially untenable: one search index per durable-object session at tens-of-thousands-concurrent scale is not a deploy-time configuration.

Solution

Expose create() / delete() / list() / search() at the namespace level as a first-class platform primitive. Tenants get their own search index the way they get their own actor: on demand, at first request, cheaply.

Canonical wiki realisation: Cloudflare's ai_search_namespaces binding in the 2026-04-16 AI Search launch.

// wrangler.jsonc
{
  "ai_search_namespaces": [
    { "binding": "SUPPORT_KB", "namespace": "support" }
  ]
}
// In the SupportAgent's onChatMessage:
try {
  await this.env.SUPPORT_KB.create({
    id: `customer-${customerId}`,
    index_method: { keyword: true, vector: true }
  });
} catch { /* instance already exists */ }

Creation is idempotent, the lifecycle is runtime-managed, and configuration is per-instance. Deletion is a single call, env.SUPPORT_KB.delete("customer-abc123"), which purges all of that tenant's data.
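A rough in-memory mock makes those lifecycle semantics concrete. This follows the create()/delete()/list() surface named above but is an illustration only, not Cloudflare's implementation:

```typescript
// Mock of the namespace-binding surface described above. Illustrative only.
type IndexMethod = { keyword?: boolean; vector?: boolean };

class MockNamespace {
  private instances = new Map<string, { config: IndexMethod; docs: string[] }>();

  // Idempotent: creating an existing instance is a no-op, not an error.
  async create(opts: { id: string; index_method: IndexMethod }): Promise<void> {
    if (!this.instances.has(opts.id)) {
      this.instances.set(opts.id, { config: opts.index_method, docs: [] });
    }
  }

  // One call, bounded time, data gone: the instance and every document in it.
  async delete(id: string): Promise<void> {
    this.instances.delete(id);
  }

  async list(): Promise<string[]> {
    return [...this.instances.keys()];
  }
}
```

Usage mirrors the wrangler example: create on first appearance, delete on offboarding, with repeated create() calls across retries behaving as no-ops.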

Structural requirements

For the pattern to be cheap enough to be runtime, the platform must supply:

  1. Atomic instance provisioning — no cluster-edit, no re-deploy, no index-template handshake.
  2. Atomic instance deletion — one call, bounded time, data gone.
  3. Unified storage and index — no external bucket / pipeline to wire up.
  4. Low per-instance base cost — thousands to millions of instances per account have to be viable.
  5. Composable queries across instances (patterns/cross-index-unified-retrieval) — so the per-tenant decomposition doesn't fragment the app.

2026-04-16 AI Search open-beta limits (illustrative of the cost model): 100 instances/account (Free), 5,000 instances/account (Paid) — not per-deploy; runtime-varying.

Canonical shape — support agent (2026-04-16)

namespace: "support"
├── product-knowledge     (shared, R2-backed)
├── customer-abc123       (per-tenant, managed storage)
├── customer-def456       (per-tenant, managed storage)
└── customer-ghi789       (per-tenant, managed storage)
  • Shared instance: product docs, one-for-all, R2 as data source.
  • Per-customer instances: resolution history (agent memory), built-in storage, created on first customer appearance.
  • Cross-instance query fans across product-knowledge + the customer's own index in one call (patterns/cross-index-unified-retrieval).
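One plausible shape for that fan-out, sketched below; the multi-instance search signature, the Hit shape, and the score-sorted merge are assumptions for illustration, not the documented API:

```typescript
// Hypothetical fan-out helper: query the shared product index and the
// customer's own index, then merge by score.
type Hit = { instance: string; text: string; score: number };

type SearchFn = (instanceId: string, query: string) => Promise<Hit[]>;

async function supportSearch(
  search: SearchFn,
  customerId: string,
  query: string,
): Promise<Hit[]> {
  // Fan across the shared instance and the per-customer instance in parallel.
  const [shared, own] = await Promise.all([
    search("product-knowledge", query),
    search(`customer-${customerId}`, query),
  ]);
  // Naive score-sorted merge; real cross-index fusion (RRF, reranking)
  // would replace this sort.
  return [...shared, ...own].sort((a, b) => b.score - a.score);
}
```

The point of the per-tenant decomposition surviving this helper is that the caller still sees one ranked result list, even though isolation is structural underneath.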

Consequences

Pros

  • Isolation is structural. Two tenants cannot collide in one result set because their data is in two indexes.
  • Tenant deletion is one call. Data residency, right-to-be-forgotten, and customer offboarding become trivial.
  • Per-tenant configuration. Tokenizer, match mode, fusion, reranking all vary per tenant if the workload warrants.
  • Per-tenant index statistics. BM25 parameters stay calibrated to the tenant's own corpus.
  • Natural fit for one-to-one agents. Per-agent / per-session memory is structurally supported.

Cons / tradeoffs

  • Cost model must support it. Per-instance base overhead must round to zero, otherwise the fleet cost scales with tenant count.
  • Cross-tenant analytics harder. Aggregate queries across all tenants now require fan-out; the single-index shape gave a built-in global view.
  • Discovery + lifecycle policy. Creation-on-demand needs a tenant-ID source of truth; stale-tenant GC needs a policy (time-based, ref-counted).
  • Per-instance parameter drift. Giving each tenant their own tokenizer/match-mode can silently produce divergent ranking behaviours; governance needed.
  • Warm-up. Cold per-tenant instances have no query history for relevance-learning (though hybrid retrieval is less sensitive to that than learning-to-rank).
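The lifecycle-policy tradeoff above can be sketched as a simple time-based sweep. All names here are hypothetical, and a real deployment would more likely drive this from a cron trigger or a Durable Object alarm than from an in-memory map:

```typescript
// Hypothetical time-based GC for per-tenant instances. lastSeen would be
// updated on every tenant request; deleteInstance stands in for the
// platform's one-call delete. Illustrative policy only.
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // evict tenants idle for 30 days

async function sweepStaleTenants(
  lastSeen: Map<string, number>, // tenantId -> last-request timestamp (ms)
  deleteInstance: (id: string) => Promise<void>,
  now: number = Date.now(),
): Promise<string[]> {
  const evicted: string[] = [];
  for (const [tenantId, seen] of lastSeen) {
    if (now - seen > TTL_MS) {
      await deleteInstance(`customer-${tenantId}`); // structural purge, one call
      lastSeen.delete(tenantId);
      evicted.push(tenantId);
    }
  }
  return evicted;
}
```

A ref-counted variant would swap the timestamp for an active-session count; either way the policy lives outside the index, which is why the pattern needs a tenant-ID source of truth.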
