CONCEPT Cited by 2 sources
ANN (approximate nearest neighbor) index¶
Definition¶
An ANN (approximate nearest neighbor) index is a data structure + serving system that, given a query vector, returns the top-K closest item vectors from a large pre-indexed corpus — approximately, not exactly, in exchange for sub-linear search cost. It is the serving artifact that makes embedding-based retrieval affordable at production scale.
In a two-tower retrieval / ranking system, the item tower's embeddings are written into an ANN index offline, and at request time the query tower's embedding is used to query the index for the top-K most similar items (by dot product, cosine, or L2 distance).
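The serving flow above can be sketched with a brute-force exact top-K search standing in for the ANN index (all names and embeddings here are illustrative, not any production API):

```python
# Toy corpus: item embeddings written offline by the item tower.
item_embeddings = {
    "item_a": [0.9, 0.1, 0.0],
    "item_b": [0.0, 1.0, 0.2],
    "item_c": [0.7, 0.6, 0.1],
}

def top_k(query, items, k=2):
    """Exact top-K by dot product — the brute-force baseline that an
    ANN index approximates at sub-linear cost."""
    scored = [(sum(q * v for q, v in zip(query, emb)), item)
              for item, emb in items.items()]
    scored.sort(reverse=True)
    return [item for _, item in scored[:k]]

# At request time, the live query tower produces an embedding like this:
query_embedding = [1.0, 0.2, 0.0]
print(top_k(query_embedding, item_embeddings))  # → ['item_a', 'item_c']
```

An ANN index replaces the exhaustive loop with a graph walk or cell probe, returning (approximately) the same top-K without touching every item.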
Why it's used¶
Exact k-NN over millions to billions of vectors costs O(N·D) per query — intractable at the request volumes typical of ads ranking, search, or recommendation. ANN indices accept a recall-vs-latency tradeoff (often ≥95% recall at sub-linear, e.g. O(log N), query cost) in exchange for orders-of-magnitude faster retrieval.
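A back-of-envelope calculation shows why the exact path is intractable (all numbers here are illustrative assumptions):

```python
# Cost of exact k-NN at corpus scale (illustrative, rough numbers).
N = 1_000_000_000   # items in the corpus
D = 128             # embedding dimension
flops_per_query = 2 * N * D   # one multiply-add per dimension per item

core_flops = 1e11   # assume ~100 GFLOP/s for one CPU core (rough)
print(flops_per_query / core_flops)  # → 2.56 seconds per query on one core
```

Seconds per query is hopeless at production QPS; sub-linear ANN search is what makes the retrieval stage affordable.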
Typical algorithmic families:
- HNSW (Hierarchical Navigable Small World graph) — graph-based; state-of-the-art recall/latency; popular in practice (Lucene, FAISS, Vespa, Qdrant).
- IVF / IVFPQ (inverted file + product quantization) — partitioning + compression; used in FAISS, Milvus.
- Annoy (Spotify's random projection trees) — read-only, simple.
- ScaNN (Google) — learned quantization + pruning.
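The IVF family's core idea — partition the corpus into coarse cells, then probe only a few cells at query time — fits in a short sketch. This is a minimal toy, not FAISS's implementation; centroids would normally come from k-means, and `nprobe` is the knob that trades recall for latency:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# --- Build phase (offline): assign each corpus vector to its nearest cell ---
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # fixed for brevity
corpus = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [-0.9, 0.2], [0.2, 0.8]]

cells = {i: [] for i in range(len(centroids))}
for idx, vec in enumerate(corpus):
    nearest = min(range(len(centroids)), key=lambda c: dist2(vec, centroids[c]))
    cells[nearest].append(idx)

# --- Query phase: probe only the nprobe closest cells (the "approximate" part) ---
def ivf_search(query, k=2, nprobe=1):
    probe = sorted(range(len(centroids)),
                   key=lambda c: dist2(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probe for idx in cells[c]]
    return sorted(candidates, key=lambda i: -dot(query, corpus[i]))[:k]

print(ivf_search([1.0, 0.1], k=2, nprobe=1))  # → [0, 1]
```

With `nprobe=1` the search scans only one cell instead of the whole corpus; a true neighbor sitting just across a cell boundary is missed, which is exactly the recall loss the IVF parameters tune. IVFPQ additionally compresses each stored vector with product quantization to shrink memory.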
Role in production recommendation systems¶
An ANN index is the serving artifact for item embeddings in production recommendation / ads / search systems. It is queried at several points in the funnel:
- Retrieval — generate candidate set from billions of items.
- Early ranking (e.g., Pinterest L1) — narrow further under tight latency before expensive downstream ranking.
- Similar-item / related-item surfaces — direct user-facing applications of k-NN.
The serving-artifact distinction¶
A crucial production-engineering point, central to Pinterest's 2026-02-27 O/O retrospective (sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr):
"It's not enough for features to exist in training logs or the Feature Store — they also need to be present in the serving artifacts (like ANN indices) that L1 actually uses to serve traffic."
The ANN index is built from a different feature pipeline than the one the model trained on, and often a different pipeline than the L2 Feature Store that downstream stages consume. A feature that's in training logs + the Feature Store but never onboarded into the ANN-index build path is effectively invisible to any stage that reads from that index — causing silent online-offline discrepancy.
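The failure mode reduces to a set-difference audit across pipelines. A hypothetical sketch (every feature name below is invented for illustration):

```python
# Hypothetical audit: which features exist in training logs and the Feature
# Store but were never onboarded into the ANN-index build path?
training_log_features = {"pin_fresh_ctr", "pin_topic_emb", "pin_quality_score"}
feature_store_features = {"pin_fresh_ctr", "pin_topic_emb", "pin_quality_score"}
ann_index_features = {"pin_topic_emb"}

# Features the model trained on that any stage reading the index can never see:
invisible_at_serving = (training_log_features
                        & feature_store_features) - ann_index_features
print(sorted(invisible_at_serving))  # → ['pin_fresh_ctr', 'pin_quality_score']
```

Nothing crashes when this set is non-empty — the model simply serves with those features absent, which is why the discrepancy is silent.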
Update cadence + version skew¶
ANN indices are typically rebuilt on a cadence much slower than model-release cadence: hourly snapshots for streaming enrichment, multi-day full rebuilds on large tiers at Pinterest scale. This means:
- The index holds a mix of embedding versions at any moment.
- Query-side embeddings (which run at request time from the live query tower) refresh instantly on model rollout.
- Item-side embeddings (which must propagate through snapshot + rebuild + deploy) lag by hours to days.
This structural cadence mismatch produces embedding version skew, a specific cause of O/O discrepancy in two-tower systems. Pinterest mitigates by favoring batch embedding inference for large tiers so each rebuild uses a single consistent checkpoint.
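The skew mechanism can be made concrete with a toy model (all numbers invented): treat the v1→v2 model update as a rotation of the embedding space, and compare the score a v2 query tower computes against a stale v1 item embedding versus a consistently rebuilt v2 one.

```python
import math

def rotate(v, theta):
    """Toy stand-in for a model update: v2 rotates the space relative to v1."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

item_v1 = [1.0, 0.0]                # item embedding still in the index (stale)
item_v2 = rotate(item_v1, 0.5)      # what a full rebuild would write
query_v2 = rotate([1.0, 0.0], 0.5)  # live query tower is already on v2

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
print(dot(query_v2, item_v2))  # → 1.0  (both sides on the same checkpoint)
print(dot(query_v2, item_v1))  # → cos(0.5) ≈ 0.878  (score drifts under skew)
```

Batch embedding inference per rebuild pins both sides to one checkpoint, eliminating the drift at the cost of staler item embeddings between rebuilds.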
Design axes¶
- Recall target — how close to exact k-NN the index must come; drives algorithm + parameter choice.
- Latency budget — how much query-time compute is acceptable.
- Build-time budget — how quickly the index can be rebuilt + deployed; caps refresh cadence.
- Memory footprint — HNSW graphs are memory-hungry; PQ-style indices trade memory for recall.
- Update pattern — streaming upserts vs batch rebuilds.
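The memory-footprint axis is easy to size roughly. A sketch under stated assumptions (float32 vectors, ~32 graph links per node for HNSW, 16 one-byte PQ codes per vector — all illustrative, and the PQ figure omits codebooks and the coarse index):

```python
# Rough memory sizing for the "memory footprint" design axis.
N, D = 100_000_000, 128

# HNSW keeps full float32 vectors plus adjacency lists in RAM.
hnsw_bytes = N * (D * 4 + 32 * 4)   # vectors + ~32 int32 links per node

# IVFPQ-style storage keeps only compressed codes per vector.
pq_bytes = N * 16                   # 16 one-byte subquantizer codes each

print(hnsw_bytes / 2**30)  # ≈ 59.6 GiB
print(pq_bytes / 2**30)    # ≈ 1.5 GiB
```

Roughly a 40x gap at this scale, which is why PQ-compressed indices dominate when the corpus must fit in RAM and graph indices dominate when recall/latency matters most.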
Seen in¶
- sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr — Pinterest L1 ranking uses ANN index for Pin embeddings; post documents how features missing from the ANN-index build path (distinct from L2 Feature Store) caused silent training-serving gap; index build + deploy "can span days" on large tiers, driving embedding version skew.
- sources/2024-10-22-planetscale-planetscale-vectors-public-beta — canonical wiki statement of the structural rejection of pure-graph ANN families (HNSW + DiskANN) as the index inside a relational engine, and the structural adoption of the SPANN + SPFresh hybrid tree + graph family instead. Adds four new ANN-index subtypes to the wiki: concepts/hnsw-index, concepts/diskann-index, concepts/spann-index, concepts/spfresh-index; two new architectural shape concepts: concepts/transactional-vector-index + concepts/incremental-vector-index; two new patterns: patterns/hybrid-tree-graph-ann-index + patterns/vector-index-inside-storage-engine.
Related¶
- systems/pinterest-l1-ranking
- systems/spann
- systems/spfresh
- systems/hnsw
- systems/diskann
- concepts/vector-similarity-search
- concepts/two-tower-architecture
- concepts/vector-embedding
- concepts/hybrid-retrieval-bm25-vectors
- concepts/embedding-version-skew
- concepts/online-offline-discrepancy
- concepts/feature-store
- concepts/hnsw-index
- concepts/diskann-index
- concepts/spann-index
- concepts/spfresh-index
- concepts/transactional-vector-index
- concepts/incremental-vector-index
- patterns/hybrid-tree-graph-ann-index
- patterns/vector-index-inside-storage-engine