
PATTERN

Parallel retrieval fusion

Problem

No single retrieval method works best for all queries. The distribution of query shapes looks like:

  • "What was the version of library X?" — FTS exact-token match wins.
  • "Where did we decide to host the frontend?" — fact-key exact-topic match wins.
  • "Who introduced the rate-limit change?" — direct-query vector embedding finds the right memory when both use the same vocabulary.
  • "What framework does the user prefer for data fetching?" — HyDE (hypothetical answer embedding) finds memories the direct embedding misses because the question and answer use different vocabulary.
  • "Did X ever say 'we should never use Y'?" — raw-message FTS catches verbatim detail the extraction step generalised away.

Picking one of these — or even picking a two-channel hybrid — leaves recall on the table for the queries that channel doesn't fit.

The pattern

Run multiple retrieval channels in parallel over the same corpus; merge results via Reciprocal Rank Fusion with channel-specific weights calibrated to the strength of each channel's signal.

                       query
           ┌─────────────┴─────────────┐
           ▼                           ▼
  query analysis              embedding of raw query
  ├── ranked topic keys
  ├── FTS terms + synonyms
  └── HyDE (answer-shaped statement)
           │
   ┌───────┬─────────┬─────────┬──────────┐
   ▼       ▼         ▼         ▼          ▼
  FTS    fact-key  raw-msg   direct      HyDE
 (stem)  (exact)   (FTS      vector     vector
                   safety
                   net)
   └───────┴─────────┴─────────┴──────────┘
                     │
                     ▼
        RRF fusion with channel weights
        (strongest signal → highest weight;
         safety-net channel → low weight;
         ties broken by recency)
                     │
                     ▼
        top candidates → synthesis model
        (deterministic sub-queries like date math
         pre-computed and injected as facts)
                     │
                     ▼
            natural-language answer

Structural properties:

  • Query analysis in Stage 1 produces multiple artefacts that feed different downstream channels — topic keys (for exact-key lookup), FTS terms + synonyms (for stemmed keyword match), HyDE statement (for answer-shaped vector search). Stage 1 runs concurrently with raw-query embedding.
  • Channels run in parallel, not sequentially. The fan-out is the whole point; a miss on one channel doesn't block the others.
  • RRF with channel-specific weights. Not all channels carry the same signal strength. Exact-topic-key matches should outvote safety-net raw-message matches even when the raw-message channel ranks its hit higher within its own result list.
  • Safety-net channel included with low weight. A raw-message FTS over the original transcript catches verbatim details the extraction pipeline generalised; giving it low weight means it doesn't dominate but still contributes when no other channel fires.
  • Ties broken by recency. Newer memories rank above older ones at equal fused score — what-you-said-yesterday outranks what-you-said-last-month when both are otherwise equivalent matches.
  • Special-case queries get deterministic handling. Temporal computation ("what did we decide last Tuesday?") is handled via regex + arithmetic before the synthesis LLM sees it; the pre-computed date is injected into the prompt as a fact rather than asking the LLM to do date math.
  • Synthesis is one final LLM call that takes the fused candidates and produces a natural-language answer — not a concatenation of hits.
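The fusion step described above can be sketched as weighted Reciprocal Rank Fusion with a recency tie-break. The channel names, weight values, and `recency` map below are illustrative assumptions, not Agent Memory's actual configuration:

```python
from collections import defaultdict

# Illustrative channel weights: strongest signal highest, safety net lowest.
CHANNEL_WEIGHTS = {
    "fact_key": 1.0,       # exact topic match: strongest signal
    "fts": 0.6,
    "direct_vector": 0.6,
    "hyde_vector": 0.6,
    "raw_message": 0.2,    # safety net: contributes but never dominates
}

def rrf_fuse(channel_results, recency, k=60):
    """channel_results: {channel: [doc_id, ...]} ranked best-first.
    recency: {doc_id: timestamp}, used only to break exact score ties."""
    scores = defaultdict(float)
    for channel, ranked in channel_results.items():
        weight = CHANNEL_WEIGHTS[channel]
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += weight / (k + rank)  # reciprocal-rank contribution
    # Sort by fused score descending, then by recency (newer first) at equal score.
    return sorted(scores, key=lambda d: (-scores[d], -recency.get(d, 0)))
```

Because contributions accumulate, a memory retrieved by several channels outranks one retrieved by a single channel at similar ranks — this is the "precision holds where channels agree" property.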

Canonical wiki instance: Cloudflare Agent Memory

Agent Memory's recall(query) pipeline implements exactly this:

Stage 1 — query analysis + embedding (concurrent). The query analyser produces ranked topic keys + FTS terms with synonyms + a HyDE statement. The raw query is embedded in parallel.

Stage 2 — five retrieval channels in parallel:

  1. FTS with Porter stemming — keyword precision for queries where you know the exact term.
  2. Exact fact-key lookup — the query maps directly to a known topic key.
  3. Raw-message FTS — safety net over stored conversation messages, catching verbatim detail the extraction generalised.
  4. Direct vector search — embedded raw query → ANN over memory vectors.
  5. HyDE vector search — embedded hypothetical-answer → ANN; catches abstract / multi-hop queries where question and answer use different vocabulary.
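The Stage 1 concurrency and Stage 2 fan-out can be sketched with `asyncio`. Every backend function below is a hypothetical stand-in (a real implementation would query FTS indexes, a key-value store, and an ANN index), returning a best-first list of memory ids per channel:

```python
import asyncio

# Hypothetical stand-ins for the real channel backends.
async def analyse_query(q):
    return {"topic_keys": [q], "fts_terms": [q],
            "hyde_statement": f"a plausible answer to: {q}"}
async def embed(text):             return [float(len(text))]  # placeholder vector
async def fts_search(terms):       return ["m1"]
async def fact_key_lookup(keys):   return ["m2"]
async def raw_message_fts(terms):  return ["m3"]
async def vector_search(vec):      return ["m4"]

async def recall(query):
    # Stage 1: query analysis and raw-query embedding run concurrently.
    analysis, query_vec = await asyncio.gather(analyse_query(query), embed(query))
    hyde_vec = await embed(analysis["hyde_statement"])
    # Stage 2: five channels fan out in parallel; a miss on one doesn't block the rest.
    ranked = await asyncio.gather(
        fts_search(analysis["fts_terms"]),
        fact_key_lookup(analysis["topic_keys"]),
        raw_message_fts(analysis["fts_terms"]),
        vector_search(query_vec),
        vector_search(hyde_vec),
    )
    names = ["fts", "fact_key", "raw_message", "direct_vector", "hyde_vector"]
    return dict(zip(names, ranked))
```

The per-channel result lists are then handed to the RRF fusion step.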

Stage 3 — RRF fusion:

"Fact-key matches get the highest weight because an exact topic match is the strongest signal. Full-text search, HyDE vectors, and direct vectors are each weighted based on strength of signal. Finally, raw message matches are also included with low weight as a safety net to identify candidate results the extraction pipeline may have missed. Ties are broken by recency, with newer results ranked higher."

— (Cloudflare, 2026-04-17)

Temporal computation is deterministic, not LLM'd:

"Temporal computation is handled deterministically via regex and arithmetic, not by the LLM. The results are injected into the synthesis prompt as pre-computed facts. Models are unreliable at things like date math, so we don't ask them to do it."

— (Cloudflare, 2026-04-17)
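A minimal sketch of that deterministic path, using regex plus `datetime` arithmetic instead of the model. The supported phrases and the prompt-injection format are illustrative assumptions:

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_temporal(query: str, today: date):
    """Turn phrases like 'last Tuesday' into a concrete date, or None."""
    m = re.search(r"last (\w+)", query.lower())
    if not m:
        return None
    word = m.group(1)
    if word == "week":
        return today - timedelta(days=7)
    if word in WEEKDAYS:
        target = WEEKDAYS.index(word)
        # Most recent strictly-past occurrence of that weekday.
        delta = (today.weekday() - target) % 7 or 7
        return today - timedelta(days=delta)
    return None

# The resolved date is injected into the synthesis prompt as a pre-computed
# fact, e.g. "NOTE: 'last Tuesday' = 2025-06-10", rather than asking the
# LLM to do the date math.
```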

Stage 4 — synthesis takes top candidates and produces a natural-language answer to the original query.

Why fuse, not pick-one

Each channel has a query-shape sweet spot; plotted across all query shapes, per-channel recall traces a Pareto front rather than crowning a single winner. Running all channels and fusing means:

  • No query-class regression. The direct-vector channel still runs even when HyDE is added, so queries where raw-question embedding lands on the target don't regress.
  • Recall adds across channels where they disagree, precision holds where they agree (RRF gives co-retrieved items a cumulative boost).
  • Tunable without re-architecting. Channel weights are a config dial; a bad HyDE implementation can be downweighted rather than removed, preserving the other four channels.
  • Iterable. New channels can be added (e.g. "learned-relevance reranker over top 50") without disturbing existing ones.

Trade-offs

  • Latency. All channels run in parallel, so latency ≈ max(channel latency) + fusion + synthesis. LLM call cost is additive: Stage 1 (analyser) plus Stage 4 (synthesis).
  • Cost. 5× storage reads and two LLM calls per query (analysis + synthesis); HyDE adds a third. Partially mitigated by parallel execution of channels and prompt caching via session affinity.
  • Quality on single-channel queries. Neutral: RRF preserves correctly-ranked channel winners.
  • Quality on multi-hop / abstract queries. Significant lift: the HyDE channel catches these.
  • Implementation complexity. N channels + analyser + fusion + synthesis versus a single channel, but each stage is tunable independently.

Complementary moves on the write path

Read-side fusion is one attack on the question-answer vocabulary asymmetry. The symmetric attack on the write side is to prepend anticipated questions to the stored embedding text:

"The embedding text prepends the 3-5 search queries generated during classification to the memory content itself, bridging the gap between how memories are written (declaratively: 'user prefers dark mode') and how they're searched (interrogatively: 'what theme does the user want?')."

— (Cloudflare, 2026-04-17)

Combined: the asymmetry is attacked on both sides — questions become hypothetical answers (HyDE on read), stored facts carry anticipated questions (on write).
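The write-side move can be sketched as follows. The anticipated queries would come from the classification step; the joining format is an assumption for illustration:

```python
def build_embedding_text(memory: str, anticipated_queries: list) -> str:
    """Prepend the 3-5 anticipated questions to the declarative memory so
    the stored vector sits closer to interrogative query vectors."""
    return "\n".join(anticipated_queries + [memory])

# Hypothetical example: the declarative fact plus queries generated at
# classification time; `text` is what gets embedded and stored.
text = build_embedding_text(
    "user prefers dark mode",
    ["what theme does the user want?",
     "does the user like dark mode or light mode?",
     "what are the user's UI preferences?"],
)
```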
