CONCEPT Cited by 1 source
Dense semantic retrieval
Definition
Dense semantic retrieval (also called embedding-based retrieval, EBR) is the class of information-retrieval approaches that encode a query and its candidate documents into dense vectors in a shared embedding space, then retrieve the top candidates by approximate nearest neighbor (ANN) search under a similarity or distance function such as cosine similarity, dot product, or L2 distance.
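The three functions named above produce the same ranking when embeddings are L2-normalized, which is why systems often normalize vectors and then run a fast inner-product search. A short derivation, assuming unit-length query and document vectors (the source does not say whether SSR output is normalized):

```latex
\begin{aligned}
\cos(\mathbf{q},\mathbf{d}) &= \frac{\mathbf{q}^\top\mathbf{d}}{\lVert\mathbf{q}\rVert\,\lVert\mathbf{d}\rVert}
  = \mathbf{q}^\top\mathbf{d}
  && \text{if } \lVert\mathbf{q}\rVert = \lVert\mathbf{d}\rVert = 1,\\
\lVert\mathbf{q}-\mathbf{d}\rVert_2^2 &= \lVert\mathbf{q}\rVert^2 + \lVert\mathbf{d}\rVert^2 - 2\,\mathbf{q}^\top\mathbf{d}
  = 2 - 2\cos(\mathbf{q},\mathbf{d}).
\end{aligned}
```

So sorting by descending cosine, descending dot product, or ascending L2 distance yields the same order. Without normalization the dot product also rewards vector length, and dot-product and cosine rankings can diverge.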
The vector is "dense" because every dimension is a learned real-valued feature produced by the encoder, rather than a term-frequency count, and most dimensions are nonzero. Dimensionality is a model hyperparameter (typically 128–1024), not tied to vocabulary size as it is for sparse lexical vectors.
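Stripped of the approximation, retrieval is a top-k search over similarity scores. The sketch below is a minimal exact (brute-force) version in Python, assuming toy 3-dimensional vectors in place of real encoder output; `exact_top_k` and the helper names are hypothetical. It scores every document, costing O(N·d) per query, which is the full scan an ANN index is designed to avoid.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def exact_top_k(query: Vector, corpus: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k highest cosines."""
    for doc_id, vec in corpus.items():
        # Query and documents must live in the same d-dimensional space.
        if len(vec) != len(query):
            raise ValueError(f"{doc_id}: dimension {len(vec)} != {len(query)}")
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy corpus: hand-written vectors, not real embeddings.
corpus = {"a": [1.0, 0.0, 0.0], "b": [0.7, 0.7, 0.0], "c": [0.0, 0.0, 1.0]}
print(exact_top_k([1.0, 0.1, 0.0], corpus, k=2))  # "a" first, then "b"
```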
Role in modern hybrid retrieval
In hybrid retrieval (cf. concepts/hybrid-retrieval-bm25-vectors), dense semantic retrieval is the parallel counterpart to sparse lexical retrieval:
- Strengths: paraphrase and synonym matching; cross-vocabulary retrieval (a user types "Italian coffee drink" and matches a post that says "cappuccino"); robustness to word-choice variation.
- Weaknesses: can miss exact-term matches that lexical retrieval reliably catches (proper nouns, specific quotes, acronyms); embedding-drift and out-of-domain failure modes; higher per-query cost (encoder inference plus ANN lookup) than BM25 scoring.
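The paraphrase strength can be made concrete with the "Italian coffee drink" / "cappuccino" example. This Python sketch compares a raw term-count cosine (a simplified stand-in for lexical scoring, not BM25) with a cosine over two dense vectors. The post text and the 4-dimensional embeddings are hypothetical, hand-picked values chosen to mimic what a trained encoder might produce; they are not output from SSR or any real model.

```python
import math
import re
from collections import Counter
from typing import Sequence


def term_vector(text: str) -> Counter:
    """Sparse bag-of-words: one dimension per vocabulary term, raw counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def sparse_cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def dense_cosine(a: Sequence[float], b: Sequence[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


query = "Italian coffee drink"
post = "Just had the best cappuccino"  # hypothetical post text

# HYPOTHETICAL embeddings, hand-picked for illustration only.
toy_embeddings = {
    query: [0.8, 0.5, 0.1, 0.0],
    post: [0.7, 0.6, 0.2, 0.1],
}

lexical = sparse_cosine(term_vector(query), term_vector(post))  # no shared terms
semantic = dense_cosine(toy_embeddings[query], toy_embeddings[post])
print(f"lexical={lexical:.2f} dense={semantic:.2f}")  # lexical=0.00 dense=0.98
```

The reverse case, an acronym or exact quote whose tokens match the target but whose embedding does not land near it, is where the lexical side wins; that complementarity is the motivation for running both.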
Meta Groups Scoped Search instance
The 2026-04-21 Meta Engineering post describes a 12-layer, 200M-parameter encoder called Search Semantic Retriever (SSR) that produces dense query vectors; the system then runs ANN search over a precomputed Faiss index of group posts:
"We then perform an approximate nearest neighbor (ANN) search over a precomputed Faiss vector index of group posts. This enables the retrieval of content based on high-dimensional conceptual similarity, regardless of keyword overlap."
Candidates flow into the L2 MTML ranker with cosine similarity scores as features.
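The source does not say which Faiss index type backs the group-post index. As one illustration of how "approximate" search trades recall for speed, here is a toy inverted-file (IVF) sketch in Python, the idea behind Faiss's IVF family of indexes: documents are bucketed by nearest centroid ahead of time, and a query scans only the `nprobe` closest buckets. `ToyIVFIndex`, the fixed centroids, and all vectors are hypothetical; real IVF indexes learn their centroids with k-means and are heavily optimized.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cos_sim(a: Vector, b: Vector) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


class ToyIVFIndex:
    """Inverted-file ANN sketch: one bucket of (doc_id, vector) pairs per centroid."""

    def __init__(self, centroids: List[Vector]) -> None:
        self.centroids = centroids
        self.buckets: Dict[int, List[Tuple[str, Vector]]] = {i: [] for i in range(len(centroids))}

    def _closest_buckets(self, vec: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda i: cos_sim(vec, self.centroids[i]), reverse=True)
        return order[:n]

    def add(self, doc_id: str, vec: Vector) -> None:
        # Offline (precompute) step: each document is stored in exactly one bucket.
        self.buckets[self._closest_buckets(vec, 1)[0]].append((doc_id, vec))

    def search(self, query: Vector, k: int, nprobe: int = 1) -> List[Tuple[str, float]]:
        # Online step: score only documents in the nprobe closest buckets.
        hits = [(doc_id, cos_sim(query, vec))
                for b in self._closest_buckets(query, nprobe)
                for doc_id, vec in self.buckets[b]]
        hits.sort(key=lambda pair: pair[1], reverse=True)
        return hits[:k]


index = ToyIVFIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
for doc_id, vec in {"p1": [0.9, 0.1], "p2": [0.8, 0.3],
                    "p3": [0.1, 0.9], "p4": [0.45, 0.55]}.items():
    index.add(doc_id, vec)

query = [0.6, 0.4]
print(index.search(query, k=2, nprobe=1))  # p2, p1: p4 sits in the unscanned bucket
print(index.search(query, k=2, nprobe=2))  # p2, p4: matches exact search
```

With `nprobe=1` the query scans only the first bucket and misses `p4`, which exact search ranks second; raising `nprobe` to 2 recovers it at the cost of scoring more vectors. Faiss's IVF indexes expose a parameter with the same name and role.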
The post calls this ranker configuration "L2 Model + EBR (Hybrid)". "EBR" (embedding-based retrieval) is the production shorthand for dense semantic retrieval inside Meta's ranker feature set.
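The source states only that cosine similarity scores reach the ranker as features. A minimal Python sketch of one way a hybrid setup could assemble those features, assuming the lexical and dense candidate lists are unioned before ranking; `merge_candidates`, the feature names, and the retrieval-flag columns are hypothetical, not Meta's schema.

```python
from typing import Dict, List, Tuple

Hits = List[Tuple[str, float]]  # (doc_id, score) pairs from one retriever
FeatureRow = Dict[str, float]


def _empty_row() -> FeatureRow:
    # Flags let a ranker distinguish "not retrieved" from "retrieved with a low score".
    return {"lexical_score": 0.0, "cosine": 0.0, "from_lexical": 0.0, "from_dense": 0.0}


def merge_candidates(lexical_hits: Hits, dense_hits: Hits) -> Dict[str, FeatureRow]:
    """Union both candidate lists into one feature row per document."""
    rows: Dict[str, FeatureRow] = {}
    for doc_id, score in lexical_hits:
        row = rows.setdefault(doc_id, _empty_row())
        row["lexical_score"], row["from_lexical"] = score, 1.0
    for doc_id, score in dense_hits:
        row = rows.setdefault(doc_id, _empty_row())
        row["cosine"], row["from_dense"] = score, 1.0
    return rows


# Toy scores: post_3 is a paraphrase only the dense retriever found.
rows = merge_candidates(lexical_hits=[("post_1", 12.3), ("post_2", 8.1)],
                        dense_hits=[("post_2", 0.91), ("post_3", 0.88)])
for doc_id, features in rows.items():
    print(doc_id, features)
```

Passing raw scores plus flags, rather than a fixed weighted sum, leaves the weighting of lexical versus semantic evidence to the learned ranker.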
Adjacent concepts
- concepts/hybrid-retrieval-bm25-vectors — the paired pattern.
- concepts/query-preprocessing-tokenization-normalization — the shared upstream stage.
- concepts/vector-embedding · concepts/ann-index · concepts/vector-similarity-search — the primitives.