
PATTERN

WAL-tied in-memory index mutation

Problem

Hybrid ANN indexes keep a small in-memory structure (e.g. an HNSW graph over centroids — the head index in SPFresh) for fast query navigation, while the bulk of vector data lives on disk in posting lists. The in-memory structure must stay in sync with on-disk mutations across crashes:

  • If background maintenance splits a posting list and inserts two new centroids into the head index, and a crash happens before the head index is serialised, recovery must replay those centroid inserts.
  • If the durable posting lists advance but the in-memory head index lags, queries after recovery can route to the wrong posting list and miss results — the index loses recall without any indication.

Naively serialising the head index on every mutation is unworkable — the head index is continuously modified by background ops (Split / Merge / Reassign), and re-serialising after each one would dominate the write cost.
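
A minimal sketch of this layout in Go (the type and function names are illustrative, not PlanetScale's): a small in-memory head index over centroids routes each query to the on-disk posting list that should be scanned, which is exactly the structure that must not fall out of sync with the posting lists.

    package headindex

    import "math"

    type Vector []float32

    // Centroid is one entry in the in-memory head index; PostingID names the
    // on-disk posting list the centroid represents.
    type Centroid struct {
        PostingID uint64
        Value     Vector
    }

    // HeadIndex stands in for the HNSW graph over centroids. Only background
    // maintenance jobs (Split / Merge / Reassign) mutate it.
    type HeadIndex struct {
        Centroids []Centroid
    }

    func sqDist(a, b Vector) float64 {
        var d float64
        for i := range a {
            diff := float64(a[i] - b[i])
            d += diff * diff
        }
        return d
    }

    // Route picks the posting list to scan for a query vector. If a split
    // reached disk but the matching centroid inserts were lost in a crash,
    // Route sends the query to a stale posting list and results are missed.
    func (h *HeadIndex) Route(q Vector) uint64 {
        best, bestDist := uint64(0), math.Inf(1)
        for _, c := range h.Centroids {
            if d := sqDist(q, c.Value); d < bestDist {
                best, bestDist = c.PostingID, d
            }
        }
        return best
    }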

Solution

Tie in-memory index mutations to a write-ahead log committed with the same transaction as the on-disk mutations.

Concretely:

  • Background maintenance jobs run inside host-engine transactions.
  • Changes to the in-memory structure (centroid inserts / deletes on the head index) are recorded in a WAL, not applied directly.
  • The WAL is committed together with the transaction's on-disk changes (posting-list mutations in InnoDB pages).
  • On recovery: load the last-serialised snapshot of the in-memory structure from disk; replay the WAL on top of it to reach the state of the last committed transaction.

Because the WAL commit is atomic with the on-disk commit, the in-memory structure and on-disk data cannot diverge across crashes.
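
A hedged sketch of the mechanism, continuing the package from the sketch above and assuming a hypothetical Txn interface that stands in for the host-engine transaction (this is not PlanetScale's actual API): the background job writes the new posting lists and appends the matching head-index mutations to the WAL in one transaction, touches the in-memory head index only after the commit succeeds, and recovery replays the WAL over the last snapshot.

    package headindex

    // WALRecord describes one head-index mutation; it becomes durable in the
    // same commit as the posting-list changes made by the job.
    type WALRecord struct {
        Op        string // "insert-centroid" or "delete-centroid"
        PostingID uint64
        Value     Vector
    }

    // Txn abstracts the host-engine transaction (hypothetical interface).
    type Txn interface {
        WritePosting(id uint64, vectors []Vector) error // on-disk mutation
        AppendWAL(rec WALRecord) error                  // logged in the same commit
        Commit() error
    }

    // apply replays one WAL record against the in-memory head index.
    func (h *HeadIndex) apply(rec WALRecord) {
        switch rec.Op {
        case "insert-centroid":
            h.Centroids = append(h.Centroids, Centroid{PostingID: rec.PostingID, Value: rec.Value})
        case "delete-centroid":
            kept := h.Centroids[:0]
            for _, c := range h.Centroids {
                if c.PostingID != rec.PostingID {
                    kept = append(kept, c)
                }
            }
            h.Centroids = kept
        }
    }

    // splitPosting is a background job: it rewrites one posting list as two
    // and logs the matching head-index mutations in the same transaction.
    // The in-memory head index is modified only after Commit succeeds, so a
    // crash at any point leaves memory reconcilable with disk via the WAL.
    func splitPosting(tx Txn, h *HeadIndex, old Centroid, left, right []Vector, c1, c2 Centroid) error {
        if err := tx.WritePosting(c1.PostingID, left); err != nil {
            return err
        }
        if err := tx.WritePosting(c2.PostingID, right); err != nil {
            return err
        }
        recs := []WALRecord{
            {Op: "delete-centroid", PostingID: old.PostingID},
            {Op: "insert-centroid", PostingID: c1.PostingID, Value: c1.Value},
            {Op: "insert-centroid", PostingID: c2.PostingID, Value: c2.Value},
        }
        for _, rec := range recs {
            if err := tx.AppendWAL(rec); err != nil {
                return err
            }
        }
        if err := tx.Commit(); err != nil {
            return err
        }
        for _, rec := range recs {
            h.apply(rec)
        }
        return nil
    }

    // Recover rebuilds the in-memory head index after a crash: start from the
    // last serialised snapshot and replay every WAL record committed since it.
    func Recover(snapshot *HeadIndex, wal []WALRecord) *HeadIndex {
        for _, rec := range wal {
            snapshot.apply(rec)
        }
        return snapshot
    }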

Periodic snapshot + WAL compaction

The WAL grows unbounded without compaction. The snapshot mechanism:

  • Periodically serialise the current in-memory structure to an on-disk blob (the new snapshot).
  • Once the snapshot is verified durable, truncate the WAL.

The subtlety is concurrency: what if background ops append to the WAL while the snapshot is being serialised? PlanetScale's answer is to pause all background jobs for the duration of compaction, which is tolerable because user-facing traffic never modifies the in-memory index.
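
A sketch of that compaction loop under the same assumptions (SnapshotStore, WALStore and the Serialize helper are hypothetical): background jobs and compaction share one mutex, so jobs triggered during a compaction simply queue behind it, while user-facing queries never touch the lock at all.

    package headindex

    import (
        "encoding/json"
        "sync"
    )

    // Serialize is a stand-in for the head index's on-disk blob encoding.
    func (h *HeadIndex) Serialize() ([]byte, error) {
        return json.Marshal(h)
    }

    // SnapshotStore persists the serialised head index (hypothetical interface).
    type SnapshotStore interface {
        WriteDurable(blob []byte) error // write + fsync the snapshot blob
    }

    // WALStore owns the head-index WAL (hypothetical interface).
    type WALStore interface {
        Truncate() error // discard records already captured by the snapshot
    }

    type Maintainer struct {
        mu   sync.Mutex // shared by background jobs and compaction only
        head *HeadIndex
        snap SnapshotStore
        wal  WALStore
    }

    // RunJob wraps every background Split / Merge / Reassign job. Jobs that
    // arrive while a compaction holds the lock queue until it finishes.
    func (m *Maintainer) RunJob(job func(h *HeadIndex) error) error {
        m.mu.Lock()
        defer m.mu.Unlock()
        return job(m.head)
    }

    // Compact pauses background jobs, serialises the head index to an on-disk
    // blob, and truncates the WAL only once the snapshot is durable.
    func (m *Maintainer) Compact() error {
        m.mu.Lock()
        defer m.mu.Unlock()
        blob, err := m.head.Serialize()
        if err != nil {
            return err
        }
        if err := m.snap.WriteDurable(blob); err != nil {
            return err
        }
        return m.wal.Truncate()
    }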

Canonical wiki instance

PlanetScale's SPFresh-inside-InnoDB vector index implements this pattern directly:

"To keep the memory index in an always-consistent state, we use a Write Ahead Log tied to the InnoDB transaction for each job. Changes to the HNSW index are stored in a WAL that will eventually be committed together with the changes to the posting lists. This allows us to keep our Head index with a very efficient in-memory representation that is always in sync with the posting data on disk. If MySQL crashes at any point, during the recovery process we load the last serialized form of the HNSW index (stored in an on-disk blob) and re-apply all the changes from the InnoDB WAL."

— Vicent Martí, PlanetScale, 2025-10-01. (Source: sources/2026-04-21-planetscale-larger-than-ram-vector-indexes-for-relational-databases)

On compaction:

"WAL compaction can be performed by pausing all background jobs in the system while the compaction is running. This allows all user-facing operations to continue without contention while the head index is being serialized, and any background jobs triggered during the compaction are just paused and queued until they're ready to run."

The critical enabling property: user-facing SELECT / INSERT / UPDATE / DELETE never mutate the head index. Only background maintenance jobs do. This is what makes the pause trivial — user latency is not affected.

Why the read-only-on-hot-path property matters

The pattern composes only when user traffic can be served against a consistent read-only view of the in-memory structure. If user writes mutated the head index directly, WAL compaction would have to coordinate with every write path — a much harder concurrency problem.

In SPFresh, the user-level Insert appends to a posting list and does not modify the head index; the head index's random-sample centroids are promoted / demoted only by background Split / Merge / Reassign ops. This architectural property is what makes pause-based WAL compaction simple.
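
A final sketch of the hot path under the same assumptions (PostingStore is hypothetical): the user-level insert reads the head index to route and appends to the chosen posting list; because it never mutates the head index, it never contends with the compaction lock above.

    package headindex

    // PostingStore stands in for the on-disk posting lists (hypothetical).
    type PostingStore interface {
        Append(postingID uint64, v Vector) error
    }

    // UserInsert is the user-facing write path: route via the read-only head
    // index, then append to the chosen posting list. Any split the append
    // makes necessary is picked up later by a background job, not done here.
    func UserInsert(h *HeadIndex, postings PostingStore, v Vector) error {
        return postings.Append(h.Route(v), v)
    }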

Trade-offs

  • The compaction pause is not user-visible but does halt background maintenance — if splits / merges accumulate faster than the post-pause backlog drains, index quality degrades. Pause frequency and duration need tuning.
  • Snapshot frequency vs WAL growth is a classic space-time trade-off: frequent snapshots keep the WAL short and recovery replay fast but spend serialisation bandwidth; infrequent snapshots mean a longer WAL and a slower recovery.
  • The in-memory structure must fit in RAM — the pattern only applies when the in-memory component is small relative to total dataset size (head index is ~20% in PlanetScale's case). A fully in-memory HNSW cannot use it: the entire graph is the "in-memory structure", so the snapshot cost is not bounded by a small head.
  • Coupling the WAL commit to the host-engine transaction is implementation-specific. PlanetScale piggybacks on InnoDB's redo log via its transaction manager; a different engine might need a separate log with its own durability guarantees.

Alternatives

  • Pure in-memory HNSW + periodic snapshots — no WAL; snapshots are the only crash-recovery mechanism. Bounded data loss between snapshots. Simpler but not transactional.
  • Fully durable B-tree vector index — everything on disk, no in-memory structure. No sync problem, but no performance either (DiskANN-class).
  • patterns/vector-index-inside-storage-engine — the meta-pattern; WAL-tied head-index mutation is the crash-safe mechanism that makes a hybrid variant of this pattern work without dual-log complexity.

Seen in

  • PlanetScale's SPFresh-inside-InnoDB vector index (sources/2026-04-21-planetscale-larger-than-ram-vector-indexes-for-relational-databases)