CONCEPT Cited by 1 source
SPANN index¶
Definition¶
A SPANN (Space-Partitioned Approximate Nearest Neighbors) index is a hybrid tree + graph ANN index from Microsoft Research (NeurIPS 2021) designed specifically for larger-than-RAM corpora with SSD residency.
See systems/spann for the full system page.
Architectural essence¶
- A partitioning tree organises vectors into posting lists (clusters); the tree identifies which posting lists to probe for a given query.
- A graph over centroids supports fast navigation among the clusters without tree traversal overhead.
- Posting lists live on SSD; only the centroid graph + a small working set need to be in RAM.
This is the hybrid-tree-graph-on-SSD combination captured by patterns/hybrid-tree-graph-ann-index: tree for partitioning and SSD locality, graph for query-quality navigation, SSD for corpus scale.
Why it's the basis for database-hosted vector indexes¶
The design point — larger-than-RAM + SSD-resident + posting -list-structured — composes naturally with a storage engine like InnoDB that already manages page-level SSD data, an in-memory cache ( buffer pool), a write-ahead log, and crash recovery. PlanetScale's 2024-10-22 public-beta announcement makes this explicit: SPANN "was specifically designed to work well for larger-than-RAM indexes that require SSD usage." (Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)
Limitation: not incrementally updatable as published¶
Stock SPANN reorganises posting lists offline. Continuous update is the job of SPFresh — a follow-up from the same research group that extends SPANN with concurrent background maintenance.
Seen in¶
- sources/2024-10-22-planetscale-planetscale-vectors-public-beta — PlanetScale names SPANN as the algorithmic foundation for its transactional vector index inside InnoDB.