SYSTEM Cited by 1 source
SPANN¶
SPANN (Space-Partitioned Approximate Nearest Neighbors) is a hybrid tree + graph ANN index from Microsoft Research, specifically designed for larger-than-RAM corpora that require SSD residency. Published at NeurIPS 2021 (arXiv 2111.08566).
Architectural shape¶
- Partitioning tree — vectors are partitioned into many posting lists (clusters) organised by a tree index. At query time, the tree is used to identify the small set of posting lists most relevant to the query vector.
- Graph on centroids — the centroids themselves form a graph that supports fast nearest-list navigation without exhaustive tree traversal.
- SSD residency — posting lists sit on SSD; only the centroid graph + a small working set are kept in RAM. This is the defining architectural property: the corpus does not need to fit in RAM, which decisively breaks with pure graph ANN indices like HNSW.
Why it matters for databases¶
The design target — larger-than-RAM-with-SSD-residency — matches exactly the problem shape a general-purpose relational database encounters as soon as a vector column grows beyond buffer-pool capacity. PlanetScale calls this out verbatim: "SPANN is a hybrid vector indexing and search algorithm that uses both graph and tree structures, and was specifically designed to work well for larger-than-RAM indexes that require SSD usage." (Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)
The InnoDB storage engine already has a page-level SSD-resident substrate (B+tree pages, buffer pool caching, write-ahead log, crash recovery). SPANN's posting-list-on-SSD model composes naturally with that substrate — the posting lists become InnoDB-managed pages.
Limitations of stock SPANN¶
Stock SPANN is not incrementally updatable as published — posting-list reorganisations (split / merge / rebalance) are done offline. Continuous updates are the job of SPFresh, a follow-up paper that extends SPANN with concurrent background maintenance.
Seen in¶
- sources/2024-10-22-planetscale-planetscale-vectors-public-beta — PlanetScale's public-beta announcement names SPANN as the algorithmic foundation for its transactional vector index inside InnoDB. The actual implementation is PlanetScale's extension of SPFresh (the SPANN-plus- maintenance follow-up), integrated into InnoDB with transactional semantics added to all SPFresh operations.
Related¶
- systems/spfresh — incrementally-updatable extension.
- systems/hnsw — RAM-resident graph ANN; structural alternative rejected for SSD-scale relational use.
- systems/diskann — SSD-resident graph ANN; different design point (pure graph, not hybrid).
- concepts/spann-index
- concepts/ann-index
- patterns/hybrid-tree-graph-ann-index
- systems/planetscale
- systems/innodb