PATTERN Cited by 1 source
Cross-index unified retrieval¶
Intent¶
Let a caller query across many physically-separate search indexes in a single call, with the platform owning the fan-out, merge, and rank — so the per-tenant / per-domain decomposition of data into distinct indexes doesn't leak into the caller's code.
Problem¶
Once you adopt per-tenant search instances — or split indexes along any dimension (language, geography, ACL boundary, shared-vs-private, time-bucketed, hot-vs-cold) — the naïve query path becomes:
- Fetch from index A.
- Fetch from index B.
- Normalise incompatible scores / rank positions.
- Merge + dedupe.
- Optionally re-rank.
This is "the DIY hybrid-search problem at a different axis" — every caller re-implements the same fan-out + fuse utility, with the same pitfalls (score-scale incompatibility, dedup rules, timeout orchestration, partial-failure policy). And the agent-tool case is especially bad: an LLM presented with "search shared docs" + "search this customer's history" as two tools will regularly pick the wrong one, pick only one when it needed both, or make two sequential calls when one parallel call sufficed — see concepts/tool-selection-accuracy.
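The utility every caller re-implements can be sketched as caller-side code. Everything here is illustrative, not any real platform API: the `Hit` shape, the injected `fetchFromIndex` client, the drop-failed-indexes policy, and the highest-raw-score dedupe rule are all assumptions chosen to surface the pitfalls listed above:

```typescript
type Hit = { id: string; score: number; source: string };
type IndexSearchFn = (index: string, query: string) => Promise<Hit[]>;

// The fan-out + merge + dedupe utility each caller ends up owning.
async function diyCrossIndexSearch(
  fetchFromIndex: IndexSearchFn, // hypothetical per-index client, injected
  indexes: string[],
  query: string
): Promise<Hit[]> {
  // Parallel fan-out; allSettled so one slow or failing index still
  // yields results -- this *is* the partial-failure policy, chosen here
  // implicitly by silently dropping rejected indexes.
  const settled = await Promise.allSettled(
    indexes.map((i) => fetchFromIndex(i, query))
  );
  const hits = settled.flatMap((r) =>
    r.status === "fulfilled" ? r.value : []
  );

  // Dedupe by id, keeping the highest raw score -- fragile when scores
  // from different indexes are on incompatible scales.
  const best = new Map<string, Hit>();
  for (const h of hits) {
    const prev = best.get(h.id);
    if (!prev || h.score > prev.score) best.set(h.id, h);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```

Every decision in this sketch (dedupe key, score comparison, failure handling) is a policy the caller must get right on their own, which is exactly the duplication the platform-owned API removes.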
Solution¶
Make multi-index search a single platform-owned API. The caller passes a query + an array of index identifiers; the platform fans out, merges, ranks, returns one list.
Canonical wiki realisation: Cloudflare's namespace-level search() in the 2026-04-16 AI Search launch.
```ts
const results = await env.SUPPORT_KB.search({
  query: "billing error",
  ai_search_options: {
    instance_ids: ["product-knowledge", "customer-abc123"]
  }
});
```
Cloudflare's framing:
"In the support agent example, product documentation and customer resolution history live in separate instances by design. But when the agent is answering a question, it needs context from both places at once. Without cross-instance search, you'd make two separate calls and merge the results yourself. The namespace binding exposes a `search()` method that handles this for you. Pass an array of instance names and get one ranked list back. Results are merged and ranked across instances. The agent doesn't need to know or care that shared docs and customer resolution history live in separate places."
Canonical shape — the agent tool¶
Exposed to the LLM as a single tool, not N per-index tools:
```ts
search_knowledge_base: tool({
  description: "Search product docs and customer history",
  inputSchema: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    const instances = ["product-knowledge"];
    if (customerId) instances.push(`customer-${customerId}`);
    return await this.env.SUPPORT_KB.search({
      query,
      ai_search_options: {
        boost_by: [{ field: "timestamp", direction: "desc" }],
        instance_ids: instances
      }
    });
  }
})
```
The LLM sees one retrieval tool. The decomposition into shared + per-customer indexes is a deployment decision, invisible at the agent surface. This is the namespace-level realisation of patterns/unified-retrieval-tool.
Design properties¶
- Caller-specified shape. Which indexes to fan out to is a query-time argument — so a support agent can include the customer's private memory only when the customer is known, or exclude deprecated indexes on admin queries.
- Platform-owned fusion. Per-index scores are reconciled by the engine; reuses the same fusion machinery (RRF, max) that handles per-index hybrid retrieval.
- Composable with other primitives. patterns/metadata-boost-at-query-time applies across the merged list; cross-encoder reranking can re-score the top-K of the merged result set.
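The "platform-owned fusion" property typically rests on rank-based fusion such as reciprocal rank fusion (RRF), which sidesteps score-scale incompatibility entirely by fusing on rank position rather than raw score. A minimal sketch; the constant `k = 60` is the conventional RRF default, not something the source specifies:

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank)
// per document id, so only rank order matters -- incompatible raw
// score scales across indexes cancel out.
function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      // rank is 0-based here, so the top hit scores 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document surfaced by several indexes accumulates contributions from each list, so cross-index agreement is rewarded without ever comparing raw scores.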
Consequences¶
Pros¶
- Tool-surface collapse. One tool per retrieval surface — agent context window pressure drops, concepts/tool-selection-accuracy rises.
- Decomposition is invisible. Splitting / merging indexes at deploy time doesn't break callers.
- Uniform ranking. Scores / ranks are reconciled once by the engine, consistently, with one point of tuning.
Cons / tradeoffs¶
- Opaque fusion. The caller can't trivially introspect which index contributed a given result without per-result metadata.
- Timeout / partial-failure policy is the platform's call. If instance A is slow, does the query wait, return partial, or fail? The caller can't always override.
- Limits on instance count per query. Fan-out at scale implies an upper bound (not disclosed in the 2026-04-16 post for AI Search).
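When the platform's timeout policy can't be overridden, a caller can still impose its own deadline around the unified call and degrade to a fallback rather than block the agent turn. A hypothetical caller-side sketch; `searchWithDeadline` and the fallback behaviour are assumptions, not part of any platform API:

```typescript
// Race the unified search against a caller-side deadline, resolving to
// a fallback (e.g. cached or empty results) if the platform is slow.
async function searchWithDeadline<T>(
  search: () => Promise<T>,
  ms: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const deadline = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([search(), deadline]);
  } finally {
    clearTimeout(timer!); // don't leave the timer pending on the fast path
  }
}
```

Note this only bounds the caller's wait; the abandoned platform query may still run to completion on the other side.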
Seen in¶
- sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents — namespace-level `search()` with `instance_ids` array; canonical support-agent example merging shared + per-customer results.
Related¶
- patterns/unified-retrieval-tool — the tool-design generalisation; this pattern is one infrastructure enabler.
- patterns/runtime-provisioned-per-tenant-search-index — the decomposition shape this pattern recomposes.
- patterns/native-hybrid-search-function — the per-index sibling primitive; cross-index + hybrid-in-each-index stack cleanly.
- concepts/agent-memory — canonical caller — shared knowledge + per-tenant memory merged in one call.
- concepts/one-to-one-agent-instance — the economics enabler; cheap per-tenant instances make the multi-index default worth the engineering.
- systems/cloudflare-ai-search — canonical productised realisation.