SYSTEM Cited by 1 source
HNSW¶
HNSW (Hierarchical Navigable Small World graph) is a graph-based ANN index and the dominant in-memory vector index in practice — used by Lucene / Elasticsearch / OpenSearch, FAISS, Vespa, Qdrant, and most of the vector DBs that need high-quality top-K with tight tail latency.
Original paper: Malkov & Yashunin, 2016 (arXiv 1603.09320).
Architectural shape¶
- Multi-layer proximity graph — each vector is a node; edges connect it to approximate neighbours at multiple layers, with top layers sparser.
- Greedy descent — queries enter the top layer, greedily walk to the nearest node, descend a layer, repeat until the bottom layer returns the top-K candidates.
- RAM-resident — the graph must fit in memory for good performance; random-access walk across edges is the inner loop.
Strengths¶
- State-of-the-art recall vs. latency for in-memory corpora.
- Simple to implement relative to tree + quantization alternatives.
- Widely integrated — most open-source vector tooling ships an HNSW implementation.
Structural limitations named by PlanetScale¶
PlanetScale's 2024-10-22 vector-beta announcement names two structural HNSW limitations that make it "not a good fit for a relational database":
-
RAM-bound. "HNSW has very good query performance, but struggles to scale because it needs to fit its whole dataset in RAM." Canonical pain point when the vector column grows beyond buffer-pool capacity.
-
No incremental update. "HNSW indexes cannot be updated incrementally, so they require periodically re-building the index with the underlying vector data. This is just not a good fit for a relational database." Periodic full rebuilds are orthogonal to OLTP write cadence.
(Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)
Why PlanetScale chose SPANN/SPFresh instead¶
The two HNSW limitations map to two requirements for a database-hosted vector index — larger-than-RAM scale and continuous incremental updates — and the combination disqualifies HNSW. SPANN (hybrid tree + graph, SSD-resident) addresses scale; SPFresh (concurrent background maintenance on top of SPANN) addresses continuous updates. Together they give the shape PlanetScale needed.
When HNSW is the right choice¶
HNSW remains the dominant choice when:
- The corpus fits comfortably in RAM.
- Writes are bulk / episodic (nightly reindex acceptable).
- Query latency at high recall is the dominant performance target.
- You don't need the vector index to share transactional semantics with relational data.
The wiki has numerous HNSW instances implicit in ANN-index usage on top of Elasticsearch / OpenSearch / MongoDB Atlas vector search, and explicitly as the default graph-ANN family under concepts/ann-index.
Seen in¶
- sources/2024-10-22-planetscale-planetscale-vectors-public-beta — named as a rejected alternative on two structural axes (RAM-bound, no incremental update) for PlanetScale's relational-database vector index.
Related¶
- systems/diskann — SSD-resident graph ANN; alternative when HNSW's RAM bound is the blocker.
- systems/spann — hybrid tree + graph, SSD-resident.
- systems/spfresh — incrementally-updatable SPANN.
- concepts/hnsw-index
- concepts/ann-index
- concepts/vector-similarity-search