
CONCEPT Cited by 1 source

HyDE (Hypothetical Document Embedding)

Definition

HyDE (Hypothetical Document Embedding) is a retrieval technique where, instead of embedding the user's question and searching for documents similar to the question, the retrieval layer first has an LLM generate a hypothetical answer — a declarative statement phrased as if it were the correct answer — and then embeds that hypothetical answer. The ANN search finds documents whose embeddings are close to the answer shape, not the question shape.

Introduced in Gao et al. 2022, "Precise Zero-Shot Dense Retrieval without Relevance Labels". Load-bearing insight: in an embedding space trained to pull semantically similar passages close together, a real passage is closer to a hypothetical passage (both declarative, both answer-shaped) than it is to a question (interrogative, one-sentence).

The declarative-vs-interrogative asymmetry

Query:  "What package manager does the user prefer?"   ← interrogative
Answer: "The user prefers pnpm over npm."              ← declarative
Stored: "user prefers pnpm"                            ← declarative (short)

Direct query embedding wants to pull a vector close to the first line. HyDE asks the LLM: "what would the answer look like?" and embeds the second line — which is much closer in vector space to the stored fact.
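The asymmetry can be seen even with a toy bag-of-words encoder and cosine similarity (a real system would use a trained dense embedding model; this sketch only illustrates the geometry, using the three strings above):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system calls a trained dense encoder.
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

stored   = embed("user prefers pnpm")
question = embed("What package manager does the user prefer?")
hyde     = embed("The user prefers pnpm over npm.")

# The hypothetical answer shares far more shape with the stored fact
# than the interrogative query does.
assert cosine(hyde, stored) > cosine(question, stored)
```

Here the hypothetical-answer vector sits at roughly 0.71 cosine similarity to the stored fact versus roughly 0.22 for the raw question, because the answer reuses the declarative vocabulary the memory was written in.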

The win is biggest for:

  - Abstract queries where the question and the answer use entirely different vocabulary.
  - Multi-hop queries where the answer synthesises facts the question doesn't name.
  - Short-key stored memories: agent memory stores fact-shaped records, not paragraphs, so queries like "do we have a fix for X?" embed poorly against stored "X was fixed by Y on Z".

Canonical wiki instance: Cloudflare Agent Memory

Agent Memory uses HyDE as one of five parallel retrieval channels fused with Reciprocal Rank Fusion (RRF):

"HyDE vector search finds memories that are similar to what the answer would look like, which often surfaces results that direct embedding misses — particularly for abstract or multi-hop queries where the question and the answer use different vocabulary."

— (Cloudflare, 2026-04-17)

In the Agent Memory pipeline:

  1. Query analyser emits a HyDE statement (declarative answer-shape) alongside ranked topic keys + FTS terms.
  2. HyDE vector search embeds the statement and performs ANN over memory vectors.
  3. Direct vector search runs in parallel on the raw query embedding.
  4. Results from both vector channels + three keyword channels fuse via RRF.

Running HyDE in parallel with direct vector search (not instead of it) means no regression on queries where the question embedding already lands on the target: HyDE only adds recall where the question-answer mismatch was the problem.
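The fusion step can be sketched as follows. Channel names, the example rankings, and the constant k = 60 (the value from the original RRF paper) are illustrative, not Cloudflare's actual parameters:

```python
from collections import defaultdict

def rrf_fuse(channels: dict[str, list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over channels of 1 / (k + rank_c(d)).
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in channels.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical channel outputs (memory ids ranked best-first):
channels = {
    "hyde_vector":   ["m7", "m2", "m9"],
    "direct_vector": ["m2", "m7", "m4"],
    "fts_terms":     ["m2", "m5"],
}
fused = rrf_fuse(channels)  # m2 wins: it appears high in all three channels
```

Because RRF consumes only ranks, the vector and keyword channels need no score calibration against each other, which is what makes bolting on an extra channel like HyDE cheap at the fusion stage.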

Trade-offs

  Dimension                              Impact
  Latency                                +1 LLM call per query for the synthesis step (can run in parallel with embedding the raw query)
  Cost                                   Extra LLM token cost per query
  Quality on in-vocab queries            Neutral (direct channel still runs)
  Quality on out-of-vocab / multi-hop    Significant lift
  Implementation complexity              Another channel in the fusion stage

Most production systems pair HyDE with the raw-query channel rather than replacing it, exactly because HyDE's win is not uniform across query shapes.

Complementary technique: bridge the gap at write time

Agent Memory's write path contains a complementary move: during classification the system generates 3-5 search queries the memory should answer, and prepends them to the memory content before embedding. This makes the stored vector itself more question-shaped (answers the anticipated queries), reducing the write-read asymmetry HyDE was built to fix on the read side.

"The embedding text prepends the 3-5 search queries generated during classification to the memory content itself, bridging the gap between how memories are written (declaratively: 'user prefers dark mode') and how they're searched (interrogatively: 'what theme does the user want?')."

Net: the asymmetry is attacked on both sides — questions turn into hypothetical answers (HyDE at query time), and stored facts are augmented with anticipated questions (at write time).
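The write-time half reduces to string assembly before embedding. A minimal sketch, assuming a helper that joins the anticipated queries (generated during classification) ahead of the memory content; the function name and example queries are illustrative:

```python
def build_embedding_text(content: str, anticipated_queries: list[str]) -> str:
    # Prepend the anticipated queries so the stored vector also carries
    # question-shaped vocabulary, not just the declarative fact.
    return "\n".join(anticipated_queries + [content])

text = build_embedding_text(
    "user prefers dark mode",
    [
        "what theme does the user want?",
        "does the user like dark mode or light mode?",
        "what are the user's UI preferences?",
    ],
)
# `text` (not the bare content) is what gets embedded and indexed.
```

The stored record keeps the original content for display; only the embedding input is augmented.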
