SPFresh
SPFresh is a follow-up to SPANN from Microsoft Research, published at SOSP 2023 (ACM DL 10.1145/3600006.3613166), that extends SPANN with concurrent background maintenance operations. These operations (split, merge, rebalance, centroid reassignment) run continuously, letting the index absorb inserts, updates, and deletes without periodic full rebuilds and without losing recall or query performance.
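To make the idea of a maintenance operation concrete, here is a minimal sketch of one of them, a posting-list split, on a toy centroid-partitioned index. It is an illustration under stated assumptions, not the SPFresh algorithm: the class `ToyPostingIndex`, the threshold `MAX_POSTING`, and the farthest-pair seeding are all hypothetical, the split runs inline rather than on a background worker, and merge, rebalance, and reassignment are omitted.

```python
from itertools import count

MAX_POSTING = 4  # hypothetical split threshold, kept tiny for the demo


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


class ToyPostingIndex:
    """Toy centroid-partitioned index: each posting holds the vectors near one centroid."""

    def __init__(self, dim):
        self._ids = count()
        # posting id -> [centroid, list of member vectors]
        self.postings = {next(self._ids): [(0.0,) * dim, []]}

    def _nearest(self, v):
        return min(self.postings, key=lambda pid: dist2(self.postings[pid][0], v))

    def insert(self, v):
        v = tuple(v)
        pid = self._nearest(v)
        self.postings[pid][1].append(v)
        # A real system would queue this check for a background worker;
        # the toy runs the split inline to stay short.
        if len(self.postings[pid][1]) > MAX_POSTING:
            self.split(pid)

    def split(self, pid):
        centroid, members = self.postings[pid]
        # Pick two far-apart seeds, then assign every member to the closer seed.
        seed_a = max(members, key=lambda m: dist2(centroid, m))
        seed_b = max(members, key=lambda m: dist2(seed_a, m))
        left = [m for m in members if dist2(m, seed_a) <= dist2(m, seed_b)]
        right = [m for m in members if dist2(m, seed_a) > dist2(m, seed_b)]
        if not right:  # all members identical, so there is nothing to split
            return
        del self.postings[pid]
        for part in (left, right):
            self.postings[next(self._ids)] = [mean(part), part]

    def search(self, q, nprobe=2, k=1):
        """Scan the nprobe postings with the closest centroids, return the k nearest vectors."""
        q = tuple(q)
        probe = sorted(self.postings, key=lambda pid: dist2(self.postings[pid][0], q))[:nprobe]
        candidates = [m for pid in probe for m in self.postings[pid][1]]
        return sorted(candidates, key=lambda m: dist2(m, q))[:k]


idx = ToyPostingIndex(dim=2)
for vec in [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]:
    idx.insert(vec)
print(len(idx.postings), idx.search((9, 9), nprobe=1))
```

The point of the sketch is only that the index reorganises itself as a side effect of writes, so a query issued after the inserts probes a posting that already reflects the new data and no offline rebuild step is involved.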
Why it matters
Stock SPANN is a read-mostly index: its posting-list reorganisations happen offline. For any application with a continuous write load (recommender systems, search indices, OLTP tables with vector columns), a read-mostly index eventually falls out of sync with the data it serves. SPFresh's contribution is making SPANN incrementally updatable, with background operations that keep recall and latency stable as the data changes.
PlanetScale's public-beta announcement describes SPFresh as "extend[ing] the design of SPANN with a set of concurrent background maintenance operations that allow the index to be continuously updated without losing recall or query performance." (Source: sources/2024-10-22-planetscale-planetscale-vectors-public-beta.)
PlanetScale's extension: transactional SPFresh
PlanetScale's production implementation goes further: it adds transactional support to all SPFresh operations and integrates SPFresh inside InnoDB (MySQL's default storage engine). Verbatim: "inserts, updates, and deletes of vector data are immediately reflected in the vector index as part of committing your SQL transaction, and follow the same transactional semantics, including support for batch commits and rollbacks."
Architectural consequences:
- The index is a first-class durable InnoDB structure: stored on disk as InnoDB pages, cached in the buffer pool, protected by the InnoDB redo log, and needing no rebuild after a crash.
- It is always in sync with the table because its mutations ride the SQL commit path.
- It survives process crashes with strong consistency guarantees, the same guarantees the row data has.
- It does not need periodic rebuilds — the continuous background ops handle reorganisation.
- It scales into terabytes — bounded only by what InnoDB can host.
- It composes with sharding via Vitess, so the full PlanetScale stack supports sharded transactional vector indexes.
This is the canonical wiki instance of patterns/vector-index-inside-storage-engine — move the ANN index inside the storage engine instead of running it as a sidecar.
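The transactional semantics quoted above (index changes visible at commit, discarded on rollback) can be pictured with a toy model. This is a sketch under loud assumptions: `ToyTable`, `ToyTxn`, and the buffer-then-apply strategy are hypothetical names and choices made for illustration, the "index" is a plain dictionary searched by brute force, and nothing here reflects how PlanetScale or InnoDB actually implement the commit path.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyTable:
    """Row store plus a stand-in vector index that only ever changes at commit."""

    def __init__(self):
        self.rows = {}   # primary key -> vector (committed row data)
        self.index = {}  # primary key -> vector (committed index entries)

    def begin(self):
        return ToyTxn(self)

    def nearest(self, q):
        """Primary key of the committed vector closest to q, or None if empty."""
        if not self.index:
            return None
        return min(self.index, key=lambda pk: dist2(self.index[pk], q))


class ToyTxn:
    """Hypothetical transaction: buffers writes, applies rows and index together."""

    def __init__(self, table):
        self.table = table
        self.pending = []  # list of (op, primary key, vector or None)

    def insert(self, pk, vec):
        self.pending.append(("put", pk, tuple(vec)))

    def delete(self, pk):
        self.pending.append(("del", pk, None))

    def commit(self):
        # Row data and index entries are updated in the same step,
        # so the two can never disagree after a commit.
        for op, pk, vec in self.pending:
            if op == "put":
                self.table.rows[pk] = vec
                self.table.index[pk] = vec
            else:
                self.table.rows.pop(pk, None)
                self.table.index.pop(pk, None)
        self.pending.clear()

    def rollback(self):
        # Nothing was applied yet, so discarding the buffer is enough.
        self.pending.clear()


table = ToyTable()
txn = table.begin()
txn.insert(1, (0, 0))
txn.insert(2, (5, 5))
txn.commit()              # batch commit: both rows and both index entries appear together

undo = table.begin()
undo.delete(2)
undo.rollback()           # the delete never reaches the rows or the index
print(table.nearest((4, 4)))
```

A sidecar index would instead be updated by a separate process after the commit, which is exactly the window of inconsistency that the in-engine design removes.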
Trade-offs not yet disclosed
PlanetScale's extension means SPFresh's background maintenance operations run concurrently with user SQL transactions. The public-beta announcement does not say how the two are serialised, how rollbacks of in-flight partial SPFresh updates are handled, or how buffer-pool eviction interacts with SPFresh splits and merges.
Seen in
- sources/2024-10-22-planetscale-planetscale-vectors-public-beta — canonical wiki source. Names SPFresh as the algorithmic foundation for PlanetScale's transactional vector index and introduces PlanetScale's transactional extension integrated inside InnoDB.
Related
- systems/spann — the index SPFresh extends.
- systems/planetscale — consumer; embeds extended SPFresh inside InnoDB with transactional semantics.
- systems/innodb — hosting storage engine.
- concepts/spfresh-index
- concepts/incremental-vector-index
- concepts/transactional-vector-index
- concepts/ann-index
- patterns/vector-index-inside-storage-engine