CONCEPT

DiskANN index

Definition

A DiskANN index is an SSD-resident graph index for approximate nearest neighbor (ANN) search, based on the Vamana graph-construction algorithm (Microsoft Research, NeurIPS 2019). Unlike HNSW, DiskANN is designed to serve billion-scale corpora that don't fit in RAM: the graph lives on SSD, and search is guided by a small in-memory layer of compressed vectors (product quantization).
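To make the "doesn't fit in RAM" point concrete, here is a rough back-of-the-envelope sketch comparing the size of full-precision vectors with the size of product-quantization codes. The corpus size, dimensionality, and code length are illustrative assumptions, not figures from DiskANN or from this page, and the estimate ignores graph adjacency lists and PQ codebooks.

```python
def full_precision_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed to hold every vector uncompressed (float32 by default)."""
    return num_vectors * dim * bytes_per_component


def pq_code_bytes(num_vectors: int, num_subspaces: int, bits_per_subspace: int = 8) -> int:
    """Bytes needed for PQ codes: one small code per subspace per vector."""
    return num_vectors * num_subspaces * bits_per_subspace // 8


# Hypothetical corpus: one billion 128-dimensional float32 vectors,
# compressed to 32 one-byte PQ codes each (all three numbers are assumptions).
NUM_VECTORS = 1_000_000_000
DIM = 128
NUM_SUBSPACES = 32

full = full_precision_bytes(NUM_VECTORS, DIM)
compressed = pq_code_bytes(NUM_VECTORS, NUM_SUBSPACES)

print(f"full-precision vectors: {full / 1e9:.0f} GB")
print(f"PQ codes in RAM:        {compressed / 1e9:.0f} GB")
```

Under these assumptions the uncompressed vectors need about 512 GB while the PQ codes need about 32 GB, which is the gap that lets a compressed layer stay in memory while the graph itself stays on SSD.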

See systems/diskann for the full system page.

Two structural trade-offs

Worse query latency than HNSW. Each query incurs multiple SSD reads, so p99 latency is higher than that of a RAM-resident HNSW index at equivalent recall.
Inefficient incremental updates. The original Vamana design builds the graph in batch; follow-up work (FreshDiskANN) adds incremental updates, but their efficiency remains a named limitation.
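The first trade-off above can be illustrated with a toy greedy graph walk that counts how many adjacency-list fetches a query needs. This is a simplified sketch, not the Vamana or DiskANN search routine: it uses a beam of width one, exact distances instead of PQ-guided ones, an in-memory dict standing in for SSD, and a hypothetical per-read latency.

```python
import math


def greedy_walk(graph, vectors, query, entry):
    """Walk toward the query, counting one simulated SSD read per visited node."""
    current = entry
    ssd_reads = 0
    while True:
        neighbors = graph[current]  # stands in for reading a node's block from SSD
        ssd_reads += 1
        best = min(neighbors, key=lambda n: math.dist(vectors[n], query), default=current)
        # Stop when no neighbor is strictly closer than the current node.
        if math.dist(vectors[best], query) >= math.dist(vectors[current], query):
            return current, ssd_reads
        current = best


# Tiny hand-made example: five points on a line, linked as a chain.
vectors = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,), 4: (4.0,)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

nearest, reads = greedy_walk(graph, vectors, query=(3.2,), entry=0)

SSD_READ_US = 100  # hypothetical latency per random SSD read, in microseconds
print(f"nearest node: {nearest}, simulated SSD reads: {reads}")
print(f"I/O time at {SSD_READ_US} us per read: {reads * SSD_READ_US} us")
```

In this toy run the walk reaches node 3 after four simulated reads. The point is only that the reads happen one after another, so per-read SSD latency multiplies into query latency in a way that RAM-resident traversal avoids.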

Why these matter for databases

PlanetScale's 2024-10-22 vector-beta announcement names DiskANN's limitations as disqualifying for a relational-database-hosted vector index:

"DiskANN scales well, but suffers from worse query performance, and while it can be modified to allow incremental updates, these are not particularly efficient and are hard to map to transactional SQL semantics."

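The quote names three rejection axes: query performance, incremental-update efficiency, and compatibility with transactional SQL semantics. The third is new relative to the HNSW-rejection calculus: DiskANN's incremental-update model is hard to map onto transactional SQL semantics. This is the canonical wiki statement of that rejection.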

When it's the right choice

DiskANN dominates when the corpus is billion-scale and won't fit in RAM; the workload is read-heavy or batch-updated (a nightly reindex is acceptable); query-latency targets are loose; and the vector index runs as an independent sidecar.
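The criteria above can be read as a simple checklist. The sketch below encodes them as a predicate; the `Workload` fields and the `diskann_is_good_fit` helper are hypothetical names introduced for illustration, and real sizing decisions would need measured numbers rather than booleans.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    index_bytes: int            # estimated size of the full-precision index
    ram_bytes: int              # RAM available to hold an index
    batch_updates_ok: bool      # read-heavy, or a periodic (e.g. nightly) reindex is acceptable
    latency_target_loose: bool  # query-latency targets tolerate SSD reads
    standalone_sidecar: bool    # index runs as an independent sidecar


def diskann_is_good_fit(w: Workload) -> bool:
    """True only when every criterion from the paragraph above holds."""
    return (
        w.index_bytes > w.ram_bytes
        and w.batch_updates_ok
        and w.latency_target_loose
        and w.standalone_sidecar
    )


# Hypothetical workload: a 512 GB index on a machine with 64 GB of RAM.
search_sidecar = Workload(
    index_bytes=512_000_000_000,
    ram_bytes=64_000_000_000,
    batch_updates_ok=True,
    latency_target_loose=True,
    standalone_sidecar=True,
)
print(diskann_is_good_fit(search_sidecar))
```

Flipping any one field, for example setting `standalone_sidecar=False` for an index hosted inside a transactional database, makes the predicate false, which mirrors the rejection reasoning quoted earlier on this page.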

Seen in

PlanetScale vector-beta announcement (2024-10-22): canonical wiki statement of DiskANN's three-axis rejection (query latency, incremental-update efficiency, transactional-SQL compatibility) for a database-hosted vector index.