
CONCEPT Cited by 1 source

SPFresh index

Definition

An SPFresh index is a SPANN approximate-nearest-neighbour (ANN) index extended with concurrent maintenance operations (split, merge, rebalance, centroid reassignment) that run continuously in the background. These operations absorb inserts, updates, and deletes without periodic full rebuilds and without degrading recall or query latency.

SPFresh is Microsoft Research's follow-up to SPANN, published at SOSP 2023 (DOI 10.1145/3600006.3613166). See systems/spfresh for the full system page.
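The split/merge/reassign cycle in the definition can be illustrated with a toy in-memory index. This is a minimal sketch under stated assumptions, not SPFresh's actual maintenance protocol or its SSD layout: the class name ToySPFreshIndex, the max_posting/min_posting thresholds, the 2-means split, and the synchronous run_maintenance() drain are all hypothetical. The point it shows is the division of labour: inserts and deletes touch a single posting list and only enqueue jobs, while splits and merges happen later, off the foreground path.

```python
from collections import deque


def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vs):
    # Component-wise mean of a non-empty list of vectors.
    return tuple(sum(c) / len(vs) for c in zip(*vs))


class ToySPFreshIndex:
    """Hypothetical sketch: centroids -> posting lists, plus queued maintenance jobs."""

    def __init__(self, max_posting=4, min_posting=1):
        self.max_posting = max_posting
        self.min_posting = min_posting
        self.centroids = {}  # posting id -> centroid vector
        self.postings = {}   # posting id -> {vector id: vector}
        self.next_pid = 0
        self.jobs = deque()  # pending ("split" | "merge", posting id) jobs

    def _new_posting(self, centroid, members):
        pid = self.next_pid
        self.next_pid += 1
        self.centroids[pid] = tuple(centroid)
        self.postings[pid] = dict(members)
        return pid

    def _nearest(self, v):
        return min(self.centroids, key=lambda p: dist2(v, self.centroids[p]))

    def insert(self, vid, v):
        # Foreground path: append to the nearest posting, defer any reorganisation.
        v = tuple(v)
        if not self.centroids:
            self._new_posting(v, {vid: v})
            return
        pid = self._nearest(v)
        self.postings[pid][vid] = v
        if len(self.postings[pid]) > self.max_posting:
            self.jobs.append(("split", pid))

    def delete(self, vid):
        for pid, members in self.postings.items():
            if vid in members:
                del members[vid]
                if len(members) < self.min_posting:
                    self.jobs.append(("merge", pid))
                return True
        return False

    def run_maintenance(self):
        # Drain the job queue synchronously (SPFresh does this concurrently with queries).
        while self.jobs:
            kind, pid = self.jobs.popleft()
            if pid not in self.postings:
                continue  # already split or merged away by an earlier job
            if kind == "split" and len(self.postings[pid]) > self.max_posting:
                self._split(pid)
            elif kind == "merge" and len(self.postings[pid]) < self.min_posting:
                self._merge(pid)

    def _split(self, pid):
        members = list(self.postings.pop(pid).items())
        del self.centroids[pid]
        # 2-means seeded with the first vector and the vector farthest from it.
        a = members[0][1]
        b = max(members, key=lambda kv: dist2(kv[1], a))[1]
        for _ in range(5):
            ga = [kv for kv in members if dist2(kv[1], a) <= dist2(kv[1], b)]
            gb = [kv for kv in members if dist2(kv[1], a) > dist2(kv[1], b)]
            if not ga or not gb:
                break
            a, b = mean([v for _, v in ga]), mean([v for _, v in gb])
        if not ga or not gb:  # degenerate input (e.g. duplicates): split by position
            half = len(members) // 2
            ga, gb = members[:half], members[half:]
        for group in (ga, gb):
            child = self._new_posting(mean([v for _, v in group]), group)
            if len(group) > self.max_posting:
                self.jobs.append(("split", child))  # both halves are smaller, so this terminates

    def _merge(self, pid):
        if len(self.postings) == 1:
            return  # nothing left to merge into
        members = self.postings.pop(pid)
        del self.centroids[pid]
        # Reassign each orphaned vector to its nearest surviving centroid.
        for vid, v in members.items():
            target = self._nearest(v)
            self.postings[target][vid] = v
            if len(self.postings[target]) > self.max_posting:
                self.jobs.append(("split", target))

    def search(self, q, k=1, nprobe=2):
        # Probe the nprobe closest postings, then rank their members exactly.
        probes = sorted(self.centroids, key=lambda p: dist2(q, self.centroids[p]))[:nprobe]
        cands = sorted((dist2(q, v), vid) for p in probes for vid, v in self.postings[p].items())
        return [vid for _, vid in cands[:k]]
```

In SPFresh proper the maintenance work runs on background threads against SSD-resident postings, and after a split the paper's protocol also checks vectors in nearby postings for reassignment to the new centroids; this sketch only reassigns the vectors orphaned by a merge.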

Why it matters

The defining property of SPFresh is that it makes a larger-than-RAM, SSD-resident vector index incrementally updatable. That is the piece missing from stock SPANN, which reorganises its posting lists offline. SPFresh is therefore the closest academic precedent for a vector index that can sustain OLTP-shaped write load.

PlanetScale's extension: transactional SPFresh

PlanetScale's 2024-10-22 vector-beta announcement describes a transactional extension of SPFresh integrated inside InnoDB:

"For our implementation, we have extended SPFresh by adding transactional support to all its operations and fully integrating it inside InnoDB, MySQL's default storage engine. This means that inserts, updates, and deletes of vector data are immediately reflected in the vector index as part of committing your SQL transaction, and follow the same transactional semantics, including support for batch commits and rollbacks."

sources/2024-10-22-planetscale-planetscale-vectors-public-beta

This makes it the canonical wiki instance of concepts/transactional-vector-index: a vector index whose mutations obey the hosting database's transactional semantics.
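The announcement states the semantics (changes become visible as part of commit; batch commits and rollbacks are supported) but not the mechanism. The following is a hypothetical sketch of those semantics only, not PlanetScale's InnoDB design: FlatIndex is an assumed stand-in for the vector index, and Transaction stages upserts and deletes, applies them together at commit with an undo list for batch atomicity, and discards them on rollback. It deliberately ignores concurrent transactions, a transaction reading its own uncommitted writes, and interaction with background maintenance, which this page lists as open questions.

```python
def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


class FlatIndex:
    """Hypothetical stand-in for the vector index: exact search over a dict."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = {}

    def get(self, vid):
        return self.vectors.get(vid)

    def upsert(self, vid, v):
        if len(v) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(v)}")
        self.vectors[vid] = tuple(v)

    def delete(self, vid):
        self.vectors.pop(vid, None)

    def search(self, q, k=1):
        return sorted(self.vectors, key=lambda vid: (dist2(q, self.vectors[vid]), vid))[:k]


class Transaction:
    """Hypothetical sketch: stage index mutations, apply them atomically at commit."""

    def __init__(self, index):
        self.index = index
        self.staged = []  # ("upsert", vid, vector) or ("delete", vid, None)
        self.open = True

    def _check_open(self):
        if not self.open:
            raise RuntimeError("transaction already finished")

    def upsert(self, vid, v):
        self._check_open()
        self.staged.append(("upsert", vid, tuple(v)))

    def delete(self, vid):
        self._check_open()
        self.staged.append(("delete", vid, None))

    def commit(self):
        self._check_open()
        self.open = False
        undo = []  # (vid, previous vector or None), recorded before each write
        try:
            for op, vid, v in self.staged:
                undo.append((vid, self.index.get(vid)))
                if op == "upsert":
                    self.index.upsert(vid, v)
                else:
                    self.index.delete(vid)
        except Exception:
            # Batch atomicity: restore every touched vector, newest write first.
            for vid, prev in reversed(undo):
                if prev is None:
                    self.index.delete(vid)
                else:
                    self.index.upsert(vid, prev)
            raise

    def rollback(self):
        self._check_open()
        self.open = False
        self.staged.clear()  # nothing was applied yet, so discarding is enough
```

Staging until commit keeps uncommitted vectors out of every search; a real storage engine would instead write in place and rely on its own undo and visibility machinery, which is where the open questions about background splits and merges arise.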

Open questions

PlanetScale's public-beta announcement does not address how SPFresh's background maintenance operations are serialised against user SQL transactions, how partially completed in-flight operations interact with rollback, or how buffer-pool eviction composes with SPFresh splits and merges.

Seen in
