SPFresh
SPFresh is a follow-up to SPANN from Microsoft Research — published at SOSP 2023 (ACM DL 10.1145/3600006.3613166) — that extends SPANN with concurrent background maintenance operations. These maintenance ops (split, merge, rebalance, centroid reassignment) run continuously in the background, letting the index absorb inserts, updates, and deletes without periodic full rebuilds and without losing recall or query performance.
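To make "posting lists maintained on every write" concrete, here is a minimal in-memory Python sketch of a SPANN-style clustered index that splits oversized posting lists, reassigns vectors after a split, and merges undersized lists. It is not SPFresh's algorithm. The class name `ToySPFresh`, the `MAX_POSTING`/`MIN_POSTING` thresholds, the farthest-point-seeded 2-means split, and the reinsert-based merge are illustrative assumptions. The toy also runs these steps synchronously inside `insert` and `delete`, whereas SPFresh runs them as concurrent background operations over on-disk posting lists.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration; real posting lists are far larger.
MAX_POSTING = 4  # split a posting list once it holds more vectors than this
MIN_POSTING = 2  # merge away a posting list once it holds fewer vectors than this


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


@dataclass
class Posting:
    centroid: list
    vectors: dict = field(default_factory=dict)  # vector_id -> vector


class ToySPFresh:
    """Centroids + posting lists, rebalanced synchronously on every write."""

    def __init__(self):
        self.postings = []

    def _nearest(self, vec):
        return min(range(len(self.postings)),
                   key=lambda i: dist2(self.postings[i].centroid, vec))

    def insert(self, vid, vec):
        if not self.postings:
            self.postings.append(Posting(list(vec)))
        i = self._nearest(vec)
        self.postings[i].vectors[vid] = vec
        if len(self.postings[i].vectors) > MAX_POSTING:
            self._split(i)

    def delete(self, vid):
        for i, p in enumerate(self.postings):
            if vid in p.vectors:
                del p.vectors[vid]
                if len(p.vectors) < MIN_POSTING and len(self.postings) > 1:
                    self._merge(i)
                return

    def _split(self, i):
        """Replace posting i with two postings found by a few rounds of 2-means."""
        items = list(self.postings[i].vectors.items())
        a = items[0][1]
        b = max((v for _, v in items), key=lambda v: dist2(a, v))  # farthest-point seed
        for _ in range(5):
            left = [kv for kv in items if dist2(kv[1], a) <= dist2(kv[1], b)]
            right = [kv for kv in items if dist2(kv[1], a) > dist2(kv[1], b)]
            if not left or not right:
                return  # degenerate (e.g. identical vectors): leave the posting as is
            a, b = mean([v for _, v in left]), mean([v for _, v in right])
        self.postings[i] = Posting(a, dict(left))
        self.postings.append(Posting(b, dict(right)))
        self._reassign([i, len(self.postings) - 1])

    def _reassign(self, idxs):
        """Move vectors that are now strictly closer to another posting's centroid."""
        for i in idxs:
            for vid, vec in list(self.postings[i].vectors.items()):
                j = self._nearest(vec)
                if dist2(self.postings[j].centroid, vec) < dist2(self.postings[i].centroid, vec):
                    del self.postings[i].vectors[vid]
                    self.postings[j].vectors[vid] = vec

    def _merge(self, i):
        """Drop an undersized posting and reinsert its vectors into their nearest postings."""
        orphan = self.postings.pop(i)
        for vid, vec in orphan.vectors.items():
            self.insert(vid, vec)

    def search(self, query, k=3, nprobe=2):
        """Scan the nprobe closest postings and return the ids of the k nearest vectors."""
        order = sorted(range(len(self.postings)),
                       key=lambda i: dist2(self.postings[i].centroid, query))
        cands = [(dist2(vec, query), vid)
                 for i in order[:nprobe]
                 for vid, vec in self.postings[i].vectors.items()]
        return [vid for _, vid in sorted(cands)[:k]]


idx = ToySPFresh()
for n in range(12):
    idx.insert(n, [float(n), float(n % 3)])
print(len(idx.postings), idx.search([5.0, 2.0], k=1, nprobe=len(idx.postings)))
idx.delete(5)
print(idx.search([5.0, 2.0], k=1, nprobe=len(idx.postings)))
```

Running split, reassign, and merge inline, as this toy does, puts their cost on the write path; running them continuously in the background is what lets SPFresh keep foreground inserts and queries fast while the index reorganises itself.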
Why it matters
Stock SPANN is a read-mostly index: its posting-list reorganisations happen offline. For any application with a continuous write load (recommender systems, search indices, OLTP tables with vector columns), such an index gradually drifts out of sync with the data it indexes, or has to be rebuilt periodically to catch up. SPFresh's contribution is making SPANN incrementally updatable through background ops that keep recall and query latency stable as the data changes.
PlanetScale's public-beta announcement describes SPFresh as "extend[ing] the design of SPANN with a set of concurrent background maintenance operations that allow the index to be continuously updated without losing recall or query performance." (Source: .)
PlanetScale's extension: transactional SPFresh
PlanetScale's production implementation goes further: it adds transactional support to all SPFresh operations and integrates SPFresh inside InnoDB (MySQL's default storage engine). In PlanetScale's words: "inserts, updates, and deletes of vector data are immediately reflected in the vector index as part of committing your SQL transaction, and follow the same transactional semantics, including support for batch commits and rollbacks."
Architectural consequences:
- The index is a first-class durable InnoDB structure: stored on disk as InnoDB pages, cached in the buffer pool, protected by the InnoDB redo log, and usable after a crash without a rebuild.
- It is always in sync with the table, because its mutations ride the SQL commit path.
- It survives process crashes with strong consistency guarantees, the same ones the row data gets.
- It does not need periodic rebuilds — the continuous background ops handle reorganisation.
- It scales into terabytes — bounded only by what InnoDB can host.
- It composes with sharding via Vitess, so the full PlanetScale stack supports sharded transactional vector indexes.
This is the canonical wiki instance of patterns/vector-index-inside-storage-engine — move the ANN index inside the storage engine instead of running it as a sidecar.
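To illustrate what "mutations ride the SQL commit path" buys, here is a minimal single-threaded Python sketch in which a table's rows and a vector index are written together inside one transaction, with an undo log (loosely modeled on how InnoDB rolls back uncommitted changes) restoring both on rollback. Everything here is hypothetical: `VectorIndex` is a brute-force stand-in rather than SPFresh, and the sketch has no concurrent transactions, locking, MVCC, or redo logging. It only shows why an index that shares the row data's commit and rollback path cannot drift out of sync with it.

```python
from contextlib import contextmanager


class VectorIndex:
    """Brute-force stand-in for the ANN index (hypothetical, not SPFresh)."""

    def __init__(self):
        self.entries = {}  # primary key -> vector

    def put(self, pk, vec):
        self.entries[pk] = vec

    def remove(self, pk):
        self.entries.pop(pk, None)

    def nearest(self, query):
        return min(self.entries, default=None,
                   key=lambda pk: sum((a - b) ** 2 for a, b in zip(self.entries[pk], query)))


class Table:
    """Rows and the vector index are only ever written together."""

    def __init__(self):
        self.rows = {}  # primary key -> vector column value
        self.index = VectorIndex()

    def apply_write(self, pk, vec):
        """Write the row and its index entry together; vec=None deletes both."""
        if vec is None:
            self.rows.pop(pk, None)
            self.index.remove(pk)
        else:
            self.rows[pk] = vec
            self.index.put(pk, vec)


class Transaction:
    def __init__(self, table):
        self.table = table
        self.undo = []  # (pk, previous value or None), in write order

    def write(self, pk, vec):
        self.undo.append((pk, self.table.rows.get(pk)))
        self.table.apply_write(pk, vec)

    def commit(self):
        self.undo.clear()  # nothing to replay: the writes stand

    def rollback(self):
        for pk, prev in reversed(self.undo):  # undo newest writes first
            self.table.apply_write(pk, prev)
        self.undo.clear()


@contextmanager
def transaction(table):
    """Commit on normal exit; roll back and re-raise on an exception."""
    tx = Transaction(table)
    try:
        yield tx
    except Exception:
        tx.rollback()
        raise
    tx.commit()


t = Table()
with transaction(t) as tx:
    tx.write(1, [0.0, 0.0])
    tx.write(2, [5.0, 5.0])

try:
    with transaction(t) as tx:
        tx.write(3, [1.0, 1.0])
        tx.write(2, None)  # delete row 2
        raise RuntimeError("client aborted")
except RuntimeError:
    pass

print(sorted(t.rows), sorted(t.index.entries), t.index.nearest([4.0, 4.0]))  # → [1, 2] [1, 2] 2
```

After the aborted transaction both the rows and the index are back to `{1, 2}`, so a nearest-neighbour query still sees row 2. A sidecar index updated outside the transaction would need its own compensation logic to reach the same state.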
Trade-offs not yet disclosed
PlanetScale's extension adds concurrency between SPFresh's background maintenance ops and user SQL transactions. The public-beta announcement does not cover how these are serialised, how rollbacks of in-flight partial SPFresh updates are handled, or how buffer-pool eviction interacts with SPFresh splits and merges.
Seen in
-
— canonical wiki source. Names SPFresh as the algorithmic foundation for PlanetScale's transactional vector index and introduces PlanetScale's transactional extension integrated inside InnoDB.
-
— GA announcement. Confirms that the SPFresh-in-InnoDB architecture ships in production. Operational disclosures added at GA: 2× query performance and 8× memory efficiency improvements since beta, a 6× larger-than-RAM working ceiling (concepts/larger-than-ram-vector-index), fixed and product quantization down to 1 bit/field, Euclidean, inner-product, and cosine distance metrics, and a 16,383-dimension ceiling. SPFresh posting lists are disclosed as "hidden InnoDB tables", the concrete integration mechanism. SPFresh posting-list I/O is named as the specific workload PlanetScale Metal optimises.
- sources/2026-04-21-planetscale-larger-than-ram-vector-indexes-for-relational-databases: engineering deep-dive (Vicent Martí, 2025-10-01). Canonicalises PlanetScale's four background ops, three from SPFresh's LIRE protocol plus one of its own:
  - Split: K-way (PlanetScale extends LIRE's 2-way split for robustness under high insert load).
  - Reassign: moves vectors whose nearest centroid changed after a split, without rewriting posting lists. Implemented via a 1-byte version counter per vector (concepts/vector-versioning-for-deletion) plus a tiny in-memory versions table.
  - Merge: consolidates small or stale posting lists, recomputes the centroid, and rewrites them as one compact list.
  - Defragment: a PlanetScale-added op that compacts the underlying B-tree rows under heavy load without merging postings or removing stale vectors.
Three non-LIRE novelties PlanetScale contributes on top of the SPFresh paper: (a) LSM emulation via a composite index on a B-tree, which makes posting-list appends cheap inside InnoDB (documented at patterns/lsm-emulation-on-btree-via-composite-index); (b) transactional semantics for all SPFresh ops, tied to InnoDB transactions; (c) a crash-recovery scheme for the HNSW head index that uses a WAL committed alongside InnoDB's redo log, with pausable compaction.
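The versioned reassign and the composite-key storage described above can be sketched together in Python. The sketch rests on our own reading of those descriptions, not on PlanetScale's schema: a sorted list stands in for an InnoDB B-tree whose composite key is `(posting_id, seq)`, so appending to a posting list is a single new-row insert rather than a rewrite of the whole list. Each row carries a `vector_id` and a 1-byte version, and an in-memory versions table decides which copy is live. `VersionedPostingStore`, its methods, the row layout, and the integer posting ids are all hypothetical.

```python
import bisect
import itertools


class VersionedPostingStore:
    """Posting-list entries as rows in an ordered key space (a B-tree stand-in).

    Row layout (hypothetical): (posting_id, seq, vector_id, version, vector).
    Sorting by (posting_id, seq) keeps each posting's rows contiguous, and a new
    append always gets a larger seq than any earlier row.
    """

    def __init__(self):
        self.rows = []                # kept sorted, like a clustered B-tree
        self.seq = itertools.count()  # global, monotonically increasing append counter
        self.versions = {}            # in-memory table: vector_id -> live version (0..255)

    def _bump(self, vector_id):
        # 1-byte counter: wraps at 256, so stale rows must be cleaned up
        # before the same version value comes round again.
        self.versions[vector_id] = (self.versions.get(vector_id, -1) + 1) % 256
        return self.versions[vector_id]

    def _range(self, posting_id):
        lo = bisect.bisect_left(self.rows, (posting_id,))
        hi = bisect.bisect_left(self.rows, (posting_id + 1,))
        return lo, hi

    def _append(self, posting_id, vector_id, version, vector):
        bisect.insort(self.rows, (posting_id, next(self.seq), vector_id, version, tuple(vector)))

    def insert(self, posting_id, vector_id, vector):
        self._append(posting_id, vector_id, self._bump(vector_id), vector)

    def reassign(self, vector_id, vector, new_posting_id):
        # The old copy stays in place; the version bump alone makes it stale.
        self._append(new_posting_id, vector_id, self._bump(vector_id), vector)

    def delete(self, vector_id):
        # No row is written or removed: every existing copy simply becomes stale.
        self._bump(vector_id)

    def scan(self, posting_id):
        """Range-scan one posting and keep only rows whose version is live."""
        lo, hi = self._range(posting_id)
        return [(vid, vec) for _, _, vid, ver, vec in self.rows[lo:hi]
                if self.versions.get(vid) == ver]

    def compact(self, posting_id):
        """Physically drop stale rows of one posting; return how many were dropped."""
        lo, hi = self._range(posting_id)
        live = [r for r in self.rows[lo:hi] if self.versions.get(r[2]) == r[3]]
        self.rows[lo:hi] = live
        return (hi - lo) - len(live)


store = VersionedPostingStore()
store.insert(posting_id=0, vector_id=7, vector=[1.0, 2.0])
store.insert(posting_id=0, vector_id=8, vector=[1.5, 2.5])
store.reassign(vector_id=7, vector=[1.0, 2.0], new_posting_id=1)  # e.g. after a split
store.delete(vector_id=8)
print(store.scan(0), store.scan(1), len(store.rows))  # → [] [(7, (1.0, 2.0))] 3
print(store.compact(0), len(store.rows))              # → 2 1
```

The design choice this illustrates is that reassign and delete cost one small write (or none) plus an in-memory counter update, with stale rows filtered at read time and reclaimed later by a separate cleanup pass.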
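The GA disclosures name three distance metrics and quantization down to 1 bit/field. The Python sketch below shows the textbook form of each metric, plus a generic sign-bit binarization compared by Hamming distance. The binarization is our assumption of what one bit per dimension can look like, not PlanetScale's disclosed fixed-quantization scheme, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """L2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product_distance(a, b):
    """Negated dot product, so that smaller still means closer."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def quantize_1bit(v):
    """Keep one sign bit per dimension, packed into an int (bit i set iff v[i] > 0)."""
    code = 0
    for i, x in enumerate(v):
        if x > 0:
            code |= 1 << i
    return code


def hamming(code_a, code_b):
    """Number of differing bits between two packed codes."""
    return bin(code_a ^ code_b).count("1")


a = [1.0, 2.0, -1.0]
b = [2.0, 1.0, 1.0]
print(euclidean(a, b), inner_product_distance(a, b), cosine_distance(a, b))
print(quantize_1bit(a), quantize_1bit(b), hamming(quantize_1bit(a), quantize_1bit(b)))  # → 3 7 1
```

For scale, and assuming 4-byte floats: a 16,383-dimension vector occupies 65,532 bytes unquantized, while one sign bit per dimension packs into 2,048 bytes, roughly a 32× reduction before any per-vector metadata.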
Related
- systems/spann — the index SPFresh extends.
- systems/planetscale — consumer; embeds extended SPFresh inside InnoDB with transactional semantics.
- systems/innodb — hosting storage engine.
- concepts/spfresh-index
- concepts/incremental-vector-index
- concepts/transactional-vector-index
- concepts/ann-index
- concepts/vector-versioning-for-deletion
- patterns/vector-index-inside-storage-engine
- patterns/lsm-emulation-on-btree-via-composite-index
- patterns/wal-tied-in-memory-index-mutation