SYSTEM Cited by 1 source

SPANN¶

SPANN (Space-Partitioned Approximate Nearest Neighbors) is a hybrid tree + graph ANN index from Microsoft Research, specifically designed for larger-than-RAM corpora that require SSD residency. Published at NeurIPS 2021 (arXiv 2111.08566).

Architectural shape¶

Partitioning tree — vectors are partitioned into many posting lists (clusters) organised by a tree index. At query time, the tree is used to identify the small set of posting lists most relevant to the query vector.
Graph on centroids — the centroids themselves form a graph that supports fast nearest-list navigation without exhaustive tree traversal.
SSD residency — posting lists sit on SSD; only the centroid graph + a small working set are kept in RAM. This is the defining architectural property: the corpus does not need to fit in RAM, which decisively breaks with pure graph ANN indices like HNSW.

Why it matters for databases¶

The design target — larger-than-RAM-with-SSD-residency — matches exactly the problem shape a general-purpose relational database encounters as soon as a vector column grows beyond buffer-pool capacity. PlanetScale calls this out verbatim: "SPANN is a hybrid vector indexing and search algorithm that uses both graph and tree structures, and was specifically designed to work well for larger-than-RAM indexes that require SSD usage." (Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)

The InnoDB storage engine already has a page-level SSD-resident substrate (B+tree pages, buffer pool caching, write-ahead log, crash recovery). SPANN's posting-list-on-SSD model composes naturally with that substrate — the posting lists become InnoDB-managed pages.

Limitations of stock SPANN¶

Stock SPANN is not incrementally updatable as published — posting-list reorganisations (split / merge / rebalance) are done offline. Continuous updates are the job of SPFresh, a follow-up paper that extends SPANN with concurrent background maintenance.

Seen in¶

sources/2024-10-22-planetscale-planetscale-vectors-public-beta — PlanetScale's public-beta announcement names SPANN as the algorithmic foundation for its transactional vector index inside InnoDB. The actual implementation is PlanetScale's extension of SPFresh (the SPANN-plus- maintenance follow-up), integrated into InnoDB with transactional semantics added to all SPFresh operations.

systems/spfresh — incrementally-updatable extension.
systems/hnsw — RAM-resident graph ANN; structural alternative rejected for SSD-scale relational use.
systems/diskann — SSD-resident graph ANN; different design point (pure graph, not hybrid).
concepts/spann-index
concepts/ann-index
patterns/hybrid-tree-graph-ann-index
systems/planetscale
systems/innodb

SPANN¶

Architectural shape¶

Why it matters for databases¶

Limitations of stock SPANN¶

Seen in¶

Related¶