DiskANN¶
DiskANN is an SSD-resident graph ANN index from Microsoft Research, based on the Vamana graph-construction algorithm. It was published at NeurIPS 2019 (Subramanya et al.).
Architectural shape¶
- Single-layer proximity graph (Vamana): similar in spirit to HNSW, but a single flat graph rather than a layered hierarchy, built so that greedy traversal reaches good recall even with fewer edges per node.
- SSD residency: the graph is stored on SSD, while compressed copies of the vectors (product quantization) are kept in memory to guide the traversal, so that each query needs only a bounded number of SSD reads.
- Tight SSD-read budget per query — the index is designed to keep p99 query latency manageable despite reading from disk.
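The traversal described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not DiskANN's implementation: `ToyDiskIndex`, its parameters (`degree`, `levels`, `beam`, `max_reads`), and the placeholder graph are all hypothetical. Coarse coordinate rounding stands in for product quantization, a dictionary stands in for the SSD, and the graph is built from nearest plus random edges rather than by Vamana.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyDiskIndex:
    """Hypothetical toy model: full data on 'disk', coarse codes in 'RAM'."""

    def __init__(self, vectors, degree=8, levels=4, seed=0):
        rng = random.Random(seed)
        n = len(vectors)
        # "RAM": coarsely rounded coordinates stand in for PQ codes.
        self.codes = [tuple(round(x * levels) / levels for x in v) for v in vectors]
        # "SSD": node id -> (full vector, neighbor ids). Placeholder graph:
        # a few nearest neighbors plus a few random long-range edges.
        self.disk = {}
        for i, v in enumerate(vectors):
            near = heapq.nsmallest(
                degree // 2 + 1, range(n), key=lambda j: sq_dist(v, vectors[j])
            )
            far = rng.sample(range(n), min(degree // 2, n))
            self.disk[i] = (v, sorted({j for j in near + far if j != i}))
        self.entry = 0  # placeholder entry point; Vamana starts from a medoid

    def search(self, query, k=5, beam=16, max_reads=32):
        """Greedy beam search that stops after at most `max_reads` disk reads."""
        reads = 0
        # Candidate node -> approximate distance computed from in-memory codes.
        cand = {self.entry: sq_dist(query, self.codes[self.entry])}
        visited = set()
        exact = {}  # node -> exact distance, known only after a disk read
        while reads < max_reads:
            frontier = [n for n in cand if n not in visited]
            if not frontier:
                break
            node = min(frontier, key=cand.get)
            visited.add(node)
            vec, nbrs = self.disk[node]  # one simulated SSD read
            reads += 1
            exact[node] = sq_dist(query, vec)
            for nb in nbrs:
                if nb not in cand:
                    cand[nb] = sq_dist(query, self.codes[nb])
            if len(cand) > beam:  # keep only the best `beam` candidates
                keep = heapq.nsmallest(beam, cand, key=cand.get)
                cand = {n: cand[n] for n in keep}
        # Rerank by exact distance; only nodes read from disk are eligible.
        return heapq.nsmallest(k, exact, key=exact.get), reads


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
index = ToyDiskIndex(points)
top, reads = index.search((0.5, 0.5), k=5, beam=16, max_reads=32)
print(f"top-{len(top)} ids: {top} using {reads} simulated SSD reads")
```

The point of the sketch is the division of labor: approximate distances from the in-memory codes decide which node to fetch next, and the read counter caps how many fetches a single query can trigger. Result quality in this toy depends on the placeholder graph, so it says nothing about DiskANN's actual recall.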
Relative to HNSW¶
- DiskANN scales beyond RAM by design — this is the key structural difference from HNSW. For billion-scale corpora that don't fit in memory, DiskANN is a canonical choice.
- DiskANN has worse query latency than HNSW on RAM-resident workloads — each query typically incurs multiple SSD reads.
- DiskANN is harder to update incrementally: the original design is batch-constructed; FreshDiskANN and other follow-ups add incremental-update support, but PlanetScale names the efficiency of those updates as a limitation.
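To make the "beyond RAM" point concrete, here is a back-of-the-envelope footprint estimate. Every number in it is an illustrative assumption rather than a figure from the source: one billion vectors, 128 float32 dimensions, a graph degree of 64 with 4-byte neighbor ids, and 32-byte product-quantization codes. `footprint_gb` is a hypothetical helper.

```python
def footprint_gb(num_vectors, dims, bytes_per_component, graph_degree,
                 bytes_per_neighbor_id, pq_bytes_per_vector):
    """Rough storage estimate in decimal gigabytes (1 GB = 1e9 bytes)."""
    full_vectors = num_vectors * dims * bytes_per_component
    graph_edges = num_vectors * graph_degree * bytes_per_neighbor_id
    pq_codes = num_vectors * pq_bytes_per_vector
    return {
        "full_vectors_gb": full_vectors / 1e9,
        "graph_edges_gb": graph_edges / 1e9,
        "pq_codes_gb": pq_codes / 1e9,
    }


# All inputs below are assumed values chosen for illustration.
estimate = footprint_gb(
    num_vectors=10**9, dims=128, bytes_per_component=4,
    graph_degree=64, bytes_per_neighbor_id=4, pq_bytes_per_vector=32,
)
print(estimate)
```

Under these assumptions the full-precision vectors and adjacency lists come to roughly 768 GB, while the compressed codes come to about 32 GB. A fully RAM-resident graph index would need to hold something like the former in memory; a DiskANN-style layout keeps only something like the latter there and leaves the rest on SSD.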
Structural limitations named by PlanetScale¶
PlanetScale's 2024-10-22 vector-beta announcement names two DiskANN limitations that disqualify it from being the index inside a relational engine:
- Worse query performance. "DiskANN scales well, but suffers from worse query performance…"
- Incremental updates are inefficient and hard to map to transactional SQL. "…and while it can be modified to allow incremental updates, these are not particularly efficient and are hard to map to transactional SQL semantics."
(Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)
Why PlanetScale chose SPANN/SPFresh instead¶
PlanetScale rejected DiskANN on the same two axes on which it rejected HNSW (incremental-update quality and compatibility with transactional SQL semantics), plus the added axis of query latency. SPANN's hybrid tree + graph design (rather than DiskANN's pure graph) provides better query-time pruning via the tree structure, and SPFresh's concurrent background maintenance is designed specifically to support continuous updates.
When DiskANN is the right choice¶
DiskANN remains the dominant choice when:
- The corpus is billion-scale and definitely won't fit in RAM.
- The workload is read-heavy / batch-updated (nightly reindex acceptable).
- Query latency is secondary to corpus scale.
- The vector index is an independent sidecar (no need to share transactional semantics with relational data).
Seen in¶
- sources/2024-10-22-planetscale-planetscale-vectors-public-beta — named as a rejected alternative on three structural axes (query latency, incremental-update efficiency, transactional-SQL compatibility) for PlanetScale's relational-database vector index.
Related¶
- systems/hnsw — RAM-resident graph ANN sibling.
- systems/spann — hybrid tree + graph, SSD-resident; different design point.
- systems/spfresh — incrementally-updatable SPANN.
- concepts/diskann-index
- concepts/ann-index
- concepts/vector-similarity-search