CONCEPT Cited by 2 sources
ANN (approximate nearest neighbor) index¶
Definition¶
An ANN (approximate nearest neighbor) index is a data structure + serving system that, given a query vector, returns the top-K closest item vectors from a large pre-indexed corpus — approximately, not exactly, in exchange for sub-linear search cost. It is the serving artifact that makes embedding-based retrieval affordable at production scale.
In a two-tower retrieval / ranking system, the item tower's embeddings are written into an ANN index offline, and at request time the query tower's embedding is used to query the index for the top-K most similar items (by dot product, cosine, or L2 distance).
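As a point of reference, the exact (brute-force) search that an ANN index approximates can be sketched in a few lines. This is a minimal illustration assuming NumPy and small in-memory arrays; the function name `top_k` and the toy data are hypothetical and not taken from any of the cited systems.

```python
import numpy as np

def top_k(query, items, k, metric="dot"):
    """Exact (brute-force) top-k item indices for one query vector."""
    if metric == "dot":
        scores = items @ query                           # higher is better
    elif metric == "cosine":
        norms = np.linalg.norm(items, axis=1) * np.linalg.norm(query)
        scores = (items @ query) / norms                 # higher is better
    elif metric == "l2":
        scores = -np.linalg.norm(items - query, axis=1)  # negated distance: higher is better
    else:
        raise ValueError(f"unknown metric: {metric}")
    # Sort by descending score and keep the first k indices.
    return np.argsort(-scores, kind="stable")[:k]

# Toy item-tower output (4 items, 2 dimensions) and one query-tower output.
items = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])

dot_top2 = top_k(query, items, 2, "dot").tolist()
cos_top2 = top_k(query, items, 2, "cosine").tolist()
l2_top2 = top_k(query, items, 2, "l2").tolist()
```

On this toy data the dot product favors the large-norm item (index 2), while cosine and L2 both rank indices 0 and 3 first, which is why the metric used at index build time has to match the one the towers were trained with.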
SilverTorch face — ANN as part of the model itself (2026-05-26)¶
Meta's SilverTorch post (Source: sources/2026-05-26-meta-silvertorch-index-as-model-a-new-retrieval-paradigm-for-recommendation-systems) inverts the serving-artifact framing introduced below. Under the Index as Model paradigm, the ANN index "becomes a tensor inside the model" — there is no longer a separate serving artifact built on a slower cadence than the model. The index is the model.
Verbatim:
"SilverTorch reimplements ANN search as part of the model itself. It stores item embeddings in a compact Int8 format, which cuts memory use roughly in half compared to typical 16 bits, and runs search with a fused GPU kernel. ... The algorithm supports large top-k and probe counts; in practice, we observe no retrieval recall loss with 64 probes and top-2048."
This face produces several departures from the canonical wiki ANN-index framing:
- Top-K capability: "hundreds of thousands" (vs Faiss-GPU's 2,048 ceiling).
- Per-kernel speedup: 2.2–14.7× faster than Faiss-GPU (the fused Int8 ANN primitive).
- Index freshness mechanism: streaming in-place tensor mutation instead of rebuild-and-swap. "With index as a model module, maintaining index freshness equates to updating the model weights of a neural network in production, at scale, without taking the model offline."
- No more rebuild-cadence-vs-model-cadence skew — see "Update cadence + version skew" below for the prior failure mode this resolves.
The patterns/gpu-native-retrieval-primitive-redesign pattern names the design philosophy (redesign the primitive around GPU memory layout + tensor execution, don't port the CPU-era version). SilverTorch supersedes Faiss-GPU inside Meta's recsys retrieval surfaces, not Faiss-the-library across all Meta search/retrieval workloads — systems/meta-groups-scoped-search continues to use Faiss as the production ANN substrate.
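The memory claim in the quote (Int8 storage is roughly half the size of 16-bit storage) can be illustrated with a generic quantization sketch. The scheme below, symmetric per-row scaling in NumPy, is an assumption chosen for simplicity; the source does not describe SilverTorch's actual quantization scheme or kernel, and the names `quantize_int8` and `dequantize` are hypothetical.

```python
import numpy as np

def quantize_int8(emb):
    """Symmetric per-row int8 quantization, so that emb is approximately codes * scale."""
    scale = np.abs(emb).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)   # all-zero rows: avoid dividing by zero
    codes = np.clip(np.round(emb / scale), -127, 127).astype(np.int8)
    return codes, scale.astype(np.float32)

def dequantize(codes, scale):
    """Reconstruct approximate float32 embeddings from int8 codes."""
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
emb16 = rng.standard_normal((1000, 64)).astype(np.float16)   # 16-bit baseline
codes, scale = quantize_int8(emb16.astype(np.float32))

bytes_fp16 = emb16.nbytes                  # 1000 * 64 * 2
bytes_int8 = codes.nbytes + scale.nbytes   # 1000 * 64 * 1, plus one float32 scale per row
```

The per-row scale adds a few bytes per item, so the saving is "roughly" rather than exactly half, and each reconstructed value is off by at most about half a quantization step.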
Why it's used¶
Exact k-NN over millions to billions of vectors costs O(N·D) per query, which is intractable at the request volumes typical of ads ranking, search, or recommendation. ANN indices give up a small amount of recall (usually holding ≥95% recall) in exchange for sub-linear search cost, often around O(log N) or better, and orders-of-magnitude faster retrieval.
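Recall in this sense is usually measured against an exact search over the same corpus. A minimal sketch, assuming item IDs are hashable and that the exact top-K has already been computed offline (the helper name `recall_at_k` is hypothetical):

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must be non-empty")
    return len(exact & set(approx_ids)) / len(exact)

# The approximate search found 3 of the 4 true nearest neighbors.
example_recall = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4])
```

Averaging this value over a sample of queries gives the recall figure that is traded off against latency.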
Typical algorithmic families:
- HNSW (Hierarchical Navigable Small World graph) — graph-based; state-of-the-art recall/latency; popular in practice (Lucene, FAISS, Vespa, Qdrant).
- IVF / IVFPQ (inverted file + product quantization) — partitioning + compression; used in FAISS, Milvus.
- Annoy (Spotify's random projection trees) — read-only, simple.
- ScaNN (Google) — learned quantization + pruning.
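To make the partitioning idea behind the IVF family concrete, here is a deliberately small sketch in NumPy: a few rounds of k-means to build the cells, then a search that scans only the closest `n_probe` cells. It omits product quantization and every production optimization, and it is not how FAISS or Milvus implement IVF; all function names are hypothetical.

```python
import numpy as np

def _nearest(x, centroids):
    # Squared L2 distance from every row of x to every centroid.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_ivf(items, n_lists, n_iters=10, seed=0):
    """Partition items into n_lists cells using a few rounds of k-means (L2)."""
    rng = np.random.default_rng(seed)
    centroids = items[rng.choice(len(items), n_lists, replace=False)]
    for _ in range(n_iters):
        assign = _nearest(items, centroids)
        for c in range(n_lists):
            members = items[assign == c]
            if len(members):               # keep the old centroid if a cell empties
                centroids[c] = members.mean(axis=0)
    assign = _nearest(items, centroids)
    lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, lists

def search_ivf(query, items, centroids, lists, k, n_probe):
    """Scan only the n_probe cells whose centroids are closest to the query."""
    d2c = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(d2c)[:n_probe]
    cand = np.concatenate([lists[c] for c in probe])
    d2 = ((items[cand] - query) ** 2).sum(axis=1)
    return cand[np.argsort(d2)[:k]]

rng = np.random.default_rng(1)
items = rng.standard_normal((2000, 16))
query = rng.standard_normal(16)
centroids, lists = build_ivf(items, n_lists=20)

exact = np.argsort(((items - query) ** 2).sum(axis=1))[:10]
probe_all = search_ivf(query, items, centroids, lists, k=10, n_probe=20)
probe_few = search_ivf(query, items, centroids, lists, k=10, n_probe=2)
```

Probing every cell reproduces the exact result at full cost; lowering `n_probe` scans fewer candidates and may miss true neighbors that fall in unprobed cells, which is the recall-versus-latency dial for this family.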
Role in production recommendation systems¶
An ANN index is the serving artifact for item embeddings in production recommendation / ads / search systems. Candidates flow through it at several points in the funnel:
- Retrieval — generate candidate set from billions of items.
- Early ranking (e.g., Pinterest L1) — narrow further under tight latency before expensive downstream ranking.
- Similar-item / related-item surfaces — direct user-facing applications of k-NN.
The serving-artifact distinction¶
A crucial production-engineering point, central to Pinterest's 2026-02-27 online-offline (O/O) discrepancy retrospective (sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr):
"It's not enough for features to exist in training logs or the Feature Store — they also need to be present in the serving artifacts (like ANN indices) that L1 actually uses to serve traffic."
The ANN index is built from a different feature pipeline than the one the model trained on, and often a different pipeline than the L2 Feature Store that downstream stages consume. A feature that's in training logs + the Feature Store but never onboarded into the ANN-index build path is effectively invisible to any stage that reads from that index — causing silent online-offline discrepancy.
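One way to catch this class of gap is a pre-deploy check that compares the features the model expects against the features the index build path actually carries. The sketch below is purely illustrative: the function name and the feature names are hypothetical, and the source does not describe Pinterest's tooling.

```python
def missing_from_index(model_features, index_features):
    """Features the model expects that the ANN-index build path does not carry."""
    return sorted(set(model_features) - set(index_features))

# Hypothetical feature lists for illustration only.
model_features = ["pin_embedding", "pin_category", "advertiser_quality"]
index_features = ["pin_embedding", "pin_category"]

gaps = missing_from_index(model_features, index_features)
```

A non-empty result means the model would see a default or missing value for those features when served from the index, even though they look healthy in training logs.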
Update cadence + version skew¶
ANN indices are typically rebuilt on a cadence much slower than model-release cadence: hourly snapshots for streaming enrichment, multi-day full rebuilds on large tiers at Pinterest scale. This means:
- The index holds a mix of embedding versions at any moment.
- Query-side embeddings (computed at request time by the live query tower) refresh instantly on model rollout.
- Item-side embeddings (which must propagate through snapshot + rebuild + deploy) lag by hours to days.
This structural cadence mismatch produces embedding version skew, a specific cause of O/O discrepancy in two-tower systems. Pinterest mitigates this by favoring batch embedding inference for large tiers, so that each rebuild uses a single consistent checkpoint.
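If each indexed item records which checkpoint produced its embedding, the skew can be quantified directly. A minimal sketch under that assumption; the version labels and the function name `version_skew` are hypothetical.

```python
from collections import Counter

def version_skew(item_versions, query_version):
    """Summarize how many indexed items were embedded by a checkpoint
    other than the one the live query tower is running."""
    if not item_versions:
        raise ValueError("item_versions must be non-empty")
    counts = Counter(item_versions)
    stale = sum(n for v, n in counts.items() if v != query_version)
    return {"by_version": dict(counts), "stale_fraction": stale / len(item_versions)}

# Three items still carry the previous checkpoint; the query tower is already on v8.
report = version_skew(["v7", "v7", "v7", "v8"], query_version="v8")
```

A stale fraction near zero after a rebuild is what a single-checkpoint batch inference run aims for; a persistently mixed index points at partial or streaming updates.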
Design axes¶
- Recall target — how close to exact k-NN the index must come; drives algorithm + parameter choice.
- Latency budget — how much query-time compute is acceptable.
- Build-time budget — how quickly the index can be rebuilt + deployed; caps refresh cadence.
- Memory footprint — HNSW graphs are memory-hungry (full vectors plus per-node neighbor links); PQ-style indices compress vectors to save memory at some cost in recall.
- Update pattern — streaming upserts vs batch rebuilds.
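The memory axis lends itself to back-of-the-envelope arithmetic. The estimates below are rough rules of thumb, not measurements: the HNSW formula assumes float32 vectors plus about 2·M four-byte links per vector at the base layer and ignores upper layers, and the PQ formula counts codes only, ignoring codebooks and any coarse quantizer. Function names and the example sizes are hypothetical.

```python
def flat_bytes(n, d, bytes_per_dim=4):
    """Uncompressed vectors: n items, d dimensions."""
    return n * d * bytes_per_dim

def hnsw_bytes(n, d, m, bytes_per_dim=4, bytes_per_link=4):
    """Assumption: full vectors plus roughly 2*m neighbor links per vector."""
    return n * (d * bytes_per_dim + 2 * m * bytes_per_link)

def pq_bytes(n, n_subquantizers, bits=8):
    """Assumption: one code of `bits` bits per subquantizer, codes only."""
    return n * n_subquantizers * bits // 8

n, d = 100_000_000, 128
flat = flat_bytes(n, d)                     # 51.2 GB
hnsw = hnsw_bytes(n, d, m=32)               # 76.8 GB
pq = pq_bytes(n, n_subquantizers=16)        # 1.6 GB
```

Under these assumptions a 100M-item, 128-dimension corpus differs by well over an order of magnitude between a graph index over full vectors and PQ codes, which is the tension the memory-footprint and recall-target axes describe.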
Seen in¶
- sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr — Pinterest L1 ranking uses ANN index for Pin embeddings; post documents how features missing from the ANN-index build path (distinct from L2 Feature Store) caused silent training-serving gap; index build + deploy "can span days" on large tiers, driving embedding version skew.
— canonical wiki statement of the structural rejection of pure-graph ANN families (HNSW + DiskANN) as the index inside a relational engine, and the structural adoption of the SPANN + SPFresh hybrid tree + graph family instead. Adds four new ANN-index subtypes to the wiki: concepts/hnsw-index, concepts/diskann-index, concepts/spann-index, concepts/spfresh-index; two new architectural shape concepts: concepts/transactional-vector-index + concepts/incremental-vector-index; two new patterns: patterns/hybrid-tree-graph-ann-index + patterns/vector-index-inside-storage-engine.
Related¶
- systems/pinterest-l1-ranking
- systems/spann
- systems/spfresh
- systems/hnsw
- systems/diskann
- concepts/vector-similarity-search
- concepts/two-tower-architecture
- concepts/vector-embedding
- concepts/hybrid-retrieval-bm25-vectors
- concepts/embedding-version-skew
- concepts/online-offline-discrepancy
- concepts/feature-store
- concepts/hnsw-index
- concepts/diskann-index
- concepts/spann-index
- concepts/spfresh-index
- concepts/transactional-vector-index
- concepts/incremental-vector-index
- patterns/hybrid-tree-graph-ann-index
- patterns/vector-index-inside-storage-engine