Skip to content

CONCEPT Cited by 1 source

Vector versioning for deletion

Definition

Vector versioning is a technique for making deletion and reassignment append-only in a posting-list-structured ANN index. Each stored vector carries a small version counter; mutations increment the counter rather than rewriting the posting list; a tiny in-memory side table holds the current version per vector. Queries read the posting list and discard any entry whose version doesn't match the in-memory counter, as if it didn't exist.

Mechanism

The data inside each posting list is a tuple

(vector_id, version, vector_data)

where version is a 1-byte counter that keeps increasing until it wraps around (Source: sources/2026-04-21-planetscale-larger-than-ram-vector-indexes-for-relational-databases).

An in-memory versions table stores the current authoritative version for each vector id — in practice tiny because each entry is 1 byte.

Mutation semantics:

  • Insert = append (id, v, data) to the target posting list. Register v in the versions table.
  • Reassignment (move vector from posting list A → posting list B, because its nearest centroid changed after a split) = increment the version counter to v+1, append (id, v+1, data) to posting list B. The stale (id, v, data) in list A is never rewritten. Queries loading A's posting list see the v entry, consult the versions table, see v+1, discard it.
  • Delete = flip a delete flag in the versions table for that vector id. No posting-list mutation at all.

Stale vectors are cleaned up asynchronously by SPFresh background ops — Merge, Defragment, or during a subsequent Split.

Why it matters

Posting-list ANN indexes face a structural asymmetry:

Vector versioning converts removal into an in-memory flag flip, eliminating the write cost on the hot path. The subsequent cleanup is deferred to background ops, which can batch compactions across many posting lists and many versions at once.

This mechanism is what makes SPFresh's Reassign operation cheap — a split that breaks NPA invariants (some vectors now belong to different posting lists) would be prohibitively expensive if each reassignment required rewriting two posting lists. With versioning, each reassignment is append(new) + increment(version).

Trade-offs

  • Wraparound. A 1-byte counter wraps at 256 — a heavily-reassigned vector could cycle through versions and alias a stale entry as fresh. Wraparound-handling protocol not documented in the canonical source.
  • Stale-data bloat. Posting lists accumulate stale versions between background merge/defragment passes. "This is not only wasteful, but it also lowers the recall of the index because during a query, we can end up looking up a posting list which is large as stored in InnoDB but contains very few actually useful vectors." Read-path bloat is the signal that triggers Merge.
  • In-memory versions table required. Deletion is only cheap because the authoritative current version is in RAM. A crash must restore this table — typically from the same WAL that tracks the head-index mutations.

Canonical wiki instance

PlanetScale's SPFresh-based vector index inside InnoDB:

"the data inside the posting lists is actually a tuple of (vector_id, version, vector_data), where version is a 1-byte version counter that keeps increasing until it wraps around. … Any subsequent queries that load the posting list for the old location will see vectors whose version numbers are stale, and these vectors won't be considered when performing distance calculations for the query."

"Deletes are performed by marking a delete flag in the versions table — the same table that is used to mark a vector stale because it's been moved to another posting list. It's really that simple!"

— Vicent Martí, PlanetScale, 2025-10-01.

Seen in

Last updated · 550 distilled / 1,221 read