PATTERN Cited by 1 source
Parallel retrieval fusion¶
Problem¶
No single retrieval method works best for all queries; each query shape favours a different channel:
- "What was the version of library X?" — FTS exact-token match wins.
- "Where did we decide to host the frontend?" — fact-key exact-topic match wins.
- "Who introduced the rate-limit change?" — direct-query vector embedding finds the right memory when both use the same vocabulary.
- "What framework does the user prefer for data fetching?" — HyDE (hypothetical answer embedding) finds memories the direct embedding misses because the question and answer use different vocabulary.
- "Did X ever say 'we should never use Y'?" — raw-message FTS catches verbatim detail the extraction step generalised away.
Picking one of these — or even picking a two-channel hybrid — leaves recall on the table for the queries that channel doesn't fit.
The pattern¶
Run multiple retrieval channels in parallel over the same corpus; merge results via Reciprocal Rank Fusion with channel-specific weights calibrated to the strength of each channel's signal.
                          query
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
       query analysis            embedding of raw query
       ├── ranked topic keys
       ├── FTS terms + synonyms
       └── HyDE (answer-shaped statement)
              │
    ┌─────────┼──────────┬──────────┬──────────┐
    ▼         ▼          ▼          ▼          ▼
   FTS     fact-key   raw-msg    direct      HyDE
  (stem)   (exact)  (FTS safety  vector     vector
                       net)
              │
              ▼
    RRF fusion with channel weights
    (strongest signal → highest weight;
     safety-net channel → low weight)
    ties broken by recency
              │
              ▼
    top candidates → synthesis model
    (deterministic sub-queries like date math
     pre-computed and injected as facts)
              │
              ▼
    natural-language answer
Structural properties:
- Query analysis in Stage 1 produces multiple artefacts that feed different downstream channels — topic keys (for exact-key lookup), FTS terms + synonyms (for stemmed keyword match), HyDE statement (for answer-shaped vector search). Stage 1 runs concurrently with raw-query embedding.
- Channels run in parallel, not sequentially. The fan-out is the whole point; a miss on one channel doesn't block the others.
- RRF with channel-specific weights. Not all channels carry the same signal strength. Exact-topic-key matches should outvote safety-net raw-message matches even when the raw-message channel ranks its hit higher within its own result list.
- Safety-net channel included with low weight. A raw-message FTS over the original transcript catches verbatim details the extraction pipeline generalised; giving it low weight means it doesn't dominate but still contributes when no other channel fires.
- Ties broken by recency. Newer memories rank above older ones at equal fused score — what-you-said-yesterday outranks what-you-said-last-month when both are otherwise equivalent matches.
- Special-case queries get deterministic handling. Temporal computation ("what did we decide last Tuesday?") is handled via regex + arithmetic before the synthesis LLM sees it; the pre-computed date is injected into the prompt as a fact rather than asking the LLM to do date math.
- Synthesis is one final LLM call that takes the fused candidates and produces a natural-language answer — not a concatenation of hits.
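The fusion rules above — channel-specific weights, a downweighted safety net, recency tiebreak — can be sketched as follows. The weight values and the `k` constant are illustrative, not the system's actual calibration:

```python
from collections import defaultdict

# Hypothetical channel weights; real calibration is system-specific.
CHANNEL_WEIGHTS = {
    "fact_key": 1.0,     # exact topic match: strongest signal
    "fts": 0.8,
    "direct_vector": 0.8,
    "hyde_vector": 0.8,
    "raw_message": 0.3,  # safety net: contributes, never dominates
}

def fuse(channel_results, k=60):
    """Weighted Reciprocal Rank Fusion with a recency tiebreak.

    channel_results: {channel: [(memory_id, timestamp), ...]}, ranked
    best-first within each channel. Returns memory ids sorted by fused
    score, newer-first when fused scores tie.
    """
    scores = defaultdict(float)
    recency = {}
    for channel, ranked in channel_results.items():
        weight = CHANNEL_WEIGHTS[channel]
        for rank, (mem_id, ts) in enumerate(ranked, start=1):
            scores[mem_id] += weight / (k + rank)   # standard RRF term, scaled
            recency[mem_id] = max(recency.get(mem_id, ts), ts)
    return sorted(scores, key=lambda m: (scores[m], recency[m]), reverse=True)

results = {
    "fact_key": [("m1", 10)],
    "fts": [("m2", 5), ("m1", 10)],
    "raw_message": [("m3", 99)],
}
print(fuse(results))
```

Note how `m1` wins despite the raw-message channel ranking `m3` first in its own list: `m1` is co-retrieved by two channels, one of them the highest-weighted, so its contributions accumulate.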
Canonical wiki instance: Cloudflare Agent Memory¶
Agent Memory's recall(query) pipeline implements exactly this:
Stage 1 — query analysis + embedding (concurrent). The query analyser produces ranked topic keys + FTS terms with synonyms + a HyDE statement. The raw query is embedded in parallel.
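A minimal sketch of the Stage 1 artefacts as a structured output; the field names and example values here are hypothetical, not the analyser's actual schema:

```python
from dataclasses import dataclass

@dataclass
class QueryAnalysis:
    """Artefacts Stage 1 produces, each feeding a different channel."""
    topic_keys: list[str]   # ranked, for exact fact-key lookup
    fts_terms: list[str]    # terms + synonyms, for stemmed FTS
    hyde_statement: str     # answer-shaped text, embedded for HyDE search

# Illustrative analyser output for "where did we decide to host the frontend?"
analysis = QueryAnalysis(
    topic_keys=["frontend.hosting"],
    fts_terms=["host", "hosting", "deploy", "frontend"],
    hyde_statement="The team decided to host the frontend on provider X.",
)
print(analysis.hyde_statement)
```

The raw-query embedding needs none of these artefacts, which is why it can run concurrently with the analyser call.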
Stage 2 — five retrieval channels in parallel:
- FTS with Porter stemming — keyword precision for queries where you know the exact term.
- Exact fact-key lookup — the query maps directly to a known topic key.
- Raw-message FTS — safety net over stored conversation messages, catching verbatim detail the extraction generalised.
- Direct vector search — embedded raw query → ANN over memory vectors.
- HyDE vector search — embedded hypothetical-answer → ANN; catches abstract / multi-hop queries where question and answer use different vocabulary.
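The fan-out across these five channels can be sketched as a concurrent gather in which one channel's failure or miss doesn't block the others. The channel callables here are stand-ins, not the real implementations:

```python
import asyncio

async def run_channels(query, analysis, channels):
    """Fan the query out to all retrieval channels concurrently.

    `channels` maps channel name -> async callable returning a ranked list.
    Exceptions are collected, not raised: a dead channel just drops out
    of the fusion rather than failing the whole recall.
    """
    names = list(channels)
    results = await asyncio.gather(
        *(channels[n](query, analysis) for n in names),
        return_exceptions=True,
    )
    return {n: r for n, r in zip(names, results)
            if not isinstance(r, Exception)}

async def demo():
    # Stub channels standing in for FTS, fact-key lookup, etc.
    channels = {
        "fts": lambda q, a: asyncio.sleep(0, result=["m2", "m1"]),
        "fact_key": lambda q, a: asyncio.sleep(0, result=["m1"]),
    }
    return await run_channels("where is the frontend hosted?", None, channels)

print(asyncio.run(demo()))
```

Latency is then bounded by the slowest channel rather than the sum, which is what makes a five-channel design affordable.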
Stage 3 — RRF fusion:
"Fact-key matches get the highest weight because an exact topic match is the strongest signal. Full-text search, HyDE vectors, and direct vectors are each weighted based on strength of signal. Finally, raw message matches are also included with low weight as a safety net to identify candidate results the extraction pipeline may have missed. Ties are broken by recency, with newer results ranked higher."
Temporal computation is deterministic, not LLM'd:
"Temporal computation is handled deterministically via regex and arithmetic, not by the LLM. The results are injected into the synthesis prompt as pre-computed facts. Models are unreliable at things like date math, so we don't ask them to do it."
Stage 4 — synthesis takes top candidates and produces a natural-language answer to the original query.
Why fuse, not pick-one¶
Each channel has a query-shape sweet spot; plotted across all query shapes, per-channel recall forms a Pareto front rather than a single winner. Running all channels and fusing means:
- No query-class regression. The direct-vector channel still runs even when HyDE is added, so queries where raw-question embedding lands on the target don't regress.
- Recall adds across channels where they disagree, precision holds where they agree (RRF gives co-retrieved items a cumulative boost).
- Tunable without re-architecting. Channel weights are a config dial; a bad HyDE implementation can be downweighted rather than removed, preserving the other four channels.
- Iterable. New channels can be added (e.g. "learned-relevance reranker over top 50") without disturbing existing ones.
Trade-offs¶
| Dimension | Impact |
|---|---|
| Latency | All channels run in parallel → latency = max(channel latency) + fusion + synthesis. Additive LLM call cost in Stage 1 (analyser) + Stage 4 (synthesis). |
| Cost | 5× storage reads and 2 LLM calls per query (analysis + synthesis); a third call if HyDE generation is separate from the analyser. Partially mitigated by parallel execution of channels + prompt caching via session affinity. |
| Quality on single-channel queries | Neutral (RRF preserves correctly-ranked channel winners). |
| Quality on multi-hop / abstract queries | Significant lift — HyDE channel catches these. |
| Implementation complexity | Higher than single-channel: N channels + analyser + fusion + synthesis. The offsetting benefit is that each stage is tunable independently. |
Complementary moves on the write path¶
Read-side fusion is one attack on the question-answer vocabulary asymmetry. The symmetric attack on the write side is to prepend anticipated questions to the stored embedding text:
"The embedding text prepends the 3-5 search queries generated during classification to the memory content itself, bridging the gap between how memories are written (declaratively: 'user prefers dark mode') and how they're searched (interrogatively: 'what theme does the user want?')."
Combined: the asymmetry is attacked on both sides — questions become hypothetical answers (HyDE on read), stored facts carry anticipated questions (on write).
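A sketch of that write-side move, assuming a simple newline-joined format; the quoted source specifies only that the queries are prepended, not the exact embedding-text layout:

```python
def build_embedding_text(anticipated_queries, memory_content):
    """Prepend the anticipated search queries (generated at write time)
    to the memory content before embedding, so interrogative queries
    land near the stored declarative fact in vector space."""
    return "\n".join(anticipated_queries + [memory_content])

text = build_embedding_text(
    ["what theme does the user want?",
     "does the user prefer dark or light mode?",
     "what are the user's UI preferences?"],
    "User prefers dark mode.",
)
print(text)
```

The stored memory content itself is unchanged; only the text fed to the embedding model carries the anticipated questions.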
Seen in¶
- sources/2026-04-17-cloudflare-agents-that-remember-introducing-agent-memory — canonical wiki instance; five-channel parallel fusion with RRF + weighted-by-signal-strength + recency tiebreak + deterministic date math + LLM synthesis.
Related¶
- concepts/reciprocal-rank-fusion — the fusion algorithm.
- concepts/hybrid-retrieval-bm25-vectors — the broader family of hybrid-retrieval designs this pattern generalises.
- concepts/hyde-embedding — one of the channels.
- concepts/query-vs-document-embedding — the underlying asymmetry this pattern attacks on the read side.
- patterns/multi-stage-extraction-pipeline — the write-side counterpart that attacks the same asymmetry via query-prepended embeddings.
- patterns/constrained-memory-api — the API surface that hides this pipeline behind a single recall(query) call.
- systems/cloudflare-agent-memory — canonical realisation.