Skip to content

CONCEPT Cited by 1 source

SPANN index

Definition

A SPANN (Space-Partitioned Approximate Nearest Neighbors) index is a hybrid tree + graph ANN index from Microsoft Research (NeurIPS 2021) designed specifically for larger-than-RAM corpora with SSD residency.

See systems/spann for the full system page.

Architectural essence

  • A partitioning tree organises vectors into posting lists (clusters); the tree identifies which posting lists to probe for a given query.
  • A graph over centroids supports fast navigation among the clusters without tree traversal overhead.
  • Posting lists live on SSD; only the centroid graph + a small working set need to be in RAM.

This is the hybrid-tree-graph-on-SSD combination captured by patterns/hybrid-tree-graph-ann-index: tree for partitioning and SSD locality, graph for query-quality navigation, SSD for corpus scale.

Why it's the basis for database-hosted vector indexes

The design point — larger-than-RAM + SSD-resident + posting -list-structured — composes naturally with a storage engine like InnoDB that already manages page-level SSD data, an in-memory cache ( buffer pool), a write-ahead log, and crash recovery. PlanetScale's 2024-10-22 public-beta announcement makes this explicit: SPANN "was specifically designed to work well for larger-than-RAM indexes that require SSD usage." (Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)

Limitation: not incrementally updatable as published

Stock SPANN reorganises posting lists offline. Continuous update is the job of SPFresh — a follow-up from the same research group that extends SPANN with concurrent background maintenance.

Seen in

Last updated · 319 distilled / 1,201 read