Skip to content

SYSTEM Cited by 1 source

SPANN

SPANN (Space-Partitioned Approximate Nearest Neighbors) is a hybrid tree + graph ANN index from Microsoft Research, specifically designed for larger-than-RAM corpora that require SSD residency. Published at NeurIPS 2021 (arXiv 2111.08566).

Architectural shape

  • Partitioning tree — vectors are partitioned into many posting lists (clusters) organised by a tree index. At query time, the tree is used to identify the small set of posting lists most relevant to the query vector.
  • Graph on centroids — the centroids themselves form a graph that supports fast nearest-list navigation without exhaustive tree traversal.
  • SSD residency — posting lists sit on SSD; only the centroid graph + a small working set are kept in RAM. This is the defining architectural property: the corpus does not need to fit in RAM, which decisively breaks with pure graph ANN indices like HNSW.

Why it matters for databases

The design target — larger-than-RAM-with-SSD-residency — matches exactly the problem shape a general-purpose relational database encounters as soon as a vector column grows beyond buffer-pool capacity. PlanetScale calls this out verbatim: "SPANN is a hybrid vector indexing and search algorithm that uses both graph and tree structures, and was specifically designed to work well for larger-than-RAM indexes that require SSD usage." (Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)

The InnoDB storage engine already has a page-level SSD-resident substrate (B+tree pages, buffer pool caching, write-ahead log, crash recovery). SPANN's posting-list-on-SSD model composes naturally with that substrate — the posting lists become InnoDB-managed pages.

Limitations of stock SPANN

Stock SPANN is not incrementally updatable as published — posting-list reorganisations (split / merge / rebalance) are done offline. Continuous updates are the job of SPFresh, a follow-up paper that extends SPANN with concurrent background maintenance.

Seen in

  • sources/2024-10-22-planetscale-planetscale-vectors-public-beta — PlanetScale's public-beta announcement names SPANN as the algorithmic foundation for its transactional vector index inside InnoDB. The actual implementation is PlanetScale's extension of SPFresh (the SPANN-plus- maintenance follow-up), integrated into InnoDB with transactional semantics added to all SPFresh operations.
Last updated · 319 distilled / 1,201 read