
Dense semantic retrieval

Definition

Dense semantic retrieval (also embedding-based retrieval, EBR) is the class of information-retrieval approaches that encode a query and its candidate documents into dense vectors in a shared embedding space, then retrieve the top candidates via approximate nearest neighbor (ANN) search under a similarity or distance function (e.g., cosine similarity, dot product, or L2 distance).

The vector is "dense" because every dimension is a learned real-valued feature from the encoder, rather than a term-frequency count. Dimensionality is a model hyperparameter (typically 128–1024) rather than vocabulary size.
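The dense-vs-sparse contrast can be sketched numerically. The sizes and values below are illustrative (a random vector stands in for real encoder output), not taken from any particular model:

```python
# A dense embedding assigns a learned real value to every dimension,
# unlike a sparse vector sized to the vocabulary with mostly-zero counts.
import numpy as np

vocab_size = 50_000   # sparse dimensionality = vocabulary size
embed_dim = 256       # dense dimensionality = model hyperparameter

rng = np.random.default_rng(0)
sparse_vec = np.zeros(vocab_size)
sparse_vec[[12, 4031, 22007]] = [2, 1, 1]    # term-frequency counts

dense_vec = rng.standard_normal(embed_dim)   # stand-in for encoder output

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Nearly all sparse dimensions are zero; essentially no dense dimension is.
assert (sparse_vec == 0).mean() > 0.99
assert (dense_vec != 0).all()
```

The practical consequence: dense vectors of a few hundred dimensions can be compared cheaply with a dot product, while the sparse vector's size tracks the vocabulary.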

Role in modern hybrid retrieval

In hybrid retrieval (cf. concepts/hybrid-retrieval-bm25-vectors), dense semantic retrieval is the parallel counterpart to sparse lexical retrieval:

  • Strengths: paraphrase and synonym matching; cross-vocabulary retrieval (a user writes "Italian coffee drink" and matches a post that says "cappuccino"); robust to word-choice variation.
  • Weaknesses: can miss exact-term matches that lexical retrieval would nail (proper nouns, specific quotes, acronyms); embedding-drift and out-of-domain failure modes; more expensive per query (encoder inference + ANN lookup) than BM25 scoring.
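The cross-vocabulary strength can be shown with a toy example. The three-dimensional vectors below are hand-made stand-ins for encoder output, chosen only so that the paraphrase lands near the query in the space:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The query and the matching post share zero terms, so a lexical scorer
# like BM25 has nothing to match on -- but their toy embeddings are close.
query_terms = {"italian", "coffee", "drink"}
post_terms = {"best", "cappuccino", "recipe"}
lexical_overlap = len(query_terms & post_terms)   # 0

query_vec = np.array([0.9, 0.1, 0.4])
post_vec = np.array([0.8, 0.2, 0.5])     # paraphrase lands nearby
off_topic = np.array([-0.7, 0.6, 0.1])   # unrelated post lands far away

assert lexical_overlap == 0
assert cosine(query_vec, post_vec) > cosine(query_vec, off_topic)
```

This is exactly the case hybrid retrieval is built for: the dense leg recovers the "cappuccino" post, while the lexical leg covers the proper-noun and exact-quote cases the dense leg can miss.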

Meta Groups Scoped Search instance

The 2026-04-21 Meta Engineering post uses a 12-layer, 200M-parameter encoder called Search Semantic Retriever (SSR) to produce dense query vectors, then runs ANN over a precomputed Faiss index of group posts:

"We then perform an approximate nearest neighbor (ANN) search over a precomputed Faiss vector index of group posts. This enables the retrieval of content based on high-dimensional conceptual similarity, regardless of keyword overlap."

Candidates flow into the L2 MTML ranker with their cosine-similarity scores as features.
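The retrieve-then-feed-the-ranker flow can be sketched with exact brute-force search as a stand-in for the Faiss ANN index the post describes. Everything here is illustrative: random vectors replace SSR output, and the `post_id`/`ebr_cosine` feature names are hypothetical, not Meta's:

```python
import numpy as np

rng = np.random.default_rng(7)
dim, n_posts, k = 64, 1_000, 10

# Stand-in for the precomputed index of post embeddings; production would
# use an ANN library such as Faiss rather than exact brute force.
index = rng.standard_normal((n_posts, dim))
index /= np.linalg.norm(index, axis=1, keepdims=True)   # unit-normalize

query = rng.standard_normal(dim)   # stand-in for the encoder's query vector
query /= np.linalg.norm(query)

scores = index @ query                      # cosine == dot for unit vectors
top_k = np.argpartition(-scores, k)[:k]     # cheap top-k selection
top_k = top_k[np.argsort(-scores[top_k])]   # sort the k survivors

# Candidates plus their similarity scores would then be handed to the
# downstream ranker as features (hypothetical feature dict).
candidates = [{"post_id": int(i), "ebr_cosine": float(scores[i])}
              for i in top_k]
```

With unit-normalized vectors, maximum inner product and cosine similarity coincide, which is why dot-product ANN indexes are commonly used for cosine retrieval.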

The post calls the ranker config L2 Model + EBR (Hybrid) — "EBR" (embedding-based retrieval) is the production shorthand for dense semantic retrieval inside Meta's ranker feature set.
