CONCEPT Cited by 1 source

Larger-than-RAM vector index¶

Definition¶

A larger-than-RAM vector index is an ANN index whose total on-disk footprint exceeds the process's available memory, yet still serves queries at acceptable latency because only a small working set (a centroid-level tree or graph plus a bounded number of hot partitions) needs to be in RAM at any one time.

This is the operational opposite of RAM-resident ANN indexes such as stock HNSW, which require the whole index (corpus vectors plus graph) to fit in memory for query performance to hold.

Why it matters¶

Embedding corpora have grown faster than server RAM. A billion 1,024-dim float32 vectors is about 4.1 TB of raw vector payload (10^9 × 1,024 × 4 bytes), before any index overhead. Keeping production vector indexes RAM-resident at that scale requires fleets of extremely expensive high-memory nodes. Larger-than-RAM indexes move the bulk of the corpus to SSD while keeping query latency workable. This is the same architectural shift that HDD-era databases made when buffer-pool caching let them serve workloads much larger than DRAM.
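The payload figure above is plain arithmetic, and a short sketch makes it easy to rerun for other corpus shapes. The function name `raw_vector_bytes` is illustrative only, and the result covers raw vectors, not index structures.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_component: int = 4) -> int:
    """Raw payload size in bytes, excluding any index overhead.

    bytes_per_component defaults to 4 because float32 uses 4 bytes.
    """
    return num_vectors * dims * bytes_per_component


size = raw_vector_bytes(1_000_000_000, 1024)
print(f"{size / 1e12:.2f} TB")    # decimal terabytes: 4.10 TB
print(f"{size / 2**40:.2f} TiB")  # binary tebibytes: 3.73 TiB
```

Server memory is usually quoted in binary units, so the TiB figure is the one to compare against a node's RAM.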

The enabling mechanism: tree/graph on centroids + SSD-resident partitions¶

The canonical algorithmic family that enables this is SPANN-style hybrid indexes:

A centroid tree (or graph) lives in RAM — small, ~20% of total index size in PlanetScale's implementation (Source: ).
Posting lists — the vectors belonging to each partition — live on SSD.
At query time, the tree identifies a bounded number of relevant partitions, which are read from SSD on demand. Hot partitions stay cached in the buffer pool (page cache), so repeated queries against them avoid the SSD read.
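The three steps above can be sketched as a toy in-memory model. This is a minimal illustration under stated assumptions, not PlanetScale's or SPANN's implementation: the class `HybridIndex`, the `probes` parameter, and the `ssd_reads` counter are hypothetical names, a brute-force scan over centroids stands in for the centroid tree or graph, and a plain dictionary stands in for SSD-resident posting lists.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class HybridIndex:
    def __init__(self, centroids, posting_lists, probes=2):
        self.centroids = centroids          # RAM: partition_id -> centroid vector
        self.posting_lists = posting_lists  # stand-in for SSD: partition_id -> [(vec_id, vector)]
        self.probes = probes                # bounded number of partitions read per query
        self.cache = {}                     # stand-in for the buffer pool (unbounded here)
        self.ssd_reads = 0                  # counts simulated SSD fetches

    def _load_partition(self, pid):
        # Only a cache miss costs a (simulated) SSD read.
        if pid not in self.cache:
            self.ssd_reads += 1
            self.cache[pid] = self.posting_lists[pid]
        return self.cache[pid]

    def search(self, query, k=1):
        # Step 1: rank centroids in RAM (a real index walks a tree or graph instead).
        nearest = sorted(self.centroids, key=lambda pid: l2(query, self.centroids[pid]))
        # Step 2: read only the closest `probes` partitions.
        candidates = []
        for pid in nearest[: self.probes]:
            for vec_id, vec in self._load_partition(pid):
                candidates.append((l2(query, vec), vec_id))
        # Step 3: exact re-rank over the loaded candidates.
        candidates.sort()
        return [vec_id for _, vec_id in candidates[:k]]


centroids = {0: (0.0, 0.0), 1: (10.0, 10.0), 2: (-10.0, 10.0)}
posting_lists = {
    0: [("a", (0.5, 0.2)), ("b", (-0.3, 0.9))],
    1: [("c", (9.5, 10.1))],
    2: [("d", (-9.0, 11.0))],
}
index = HybridIndex(centroids, posting_lists, probes=2)
print(index.search((9.0, 9.0), k=1), index.ssd_reads)
```

The per-query I/O cost is governed by `probes`, not by corpus size, which is why query latency can stay bounded as the on-disk portion grows.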

The operational ceiling: 6× RAM¶

PlanetScale's 2026-03-25 GA announcement gives the first concrete wiki datum on how far beyond RAM this architecture scales in practice:

"vector indexes … now perform well even when they are 6× larger than available memory."


The beta announcement had asserted larger-than-RAM qualitatively ("designed to work well for larger-than-RAM indexes that require SSD usage"); the GA post makes the claim concrete. 6× is not a hard ceiling — it's a working operational claim on where query performance remains acceptable.

Why 6× and not 60×¶

The ceiling isn't arbitrary. As the index grows relative to RAM:

The centroid tree grows proportionally — ~20% of index size in PlanetScale's implementation. At 6× RAM, the tree alone is ~1.2× RAM, already spilling.
The number of partitions loaded per query remains roughly fixed, so query latency stays bounded until cache misses on the centroid tree itself start dominating.
SSD round-trip latency (~50 μs local NVMe, ~250 μs EBS) becomes the p99 floor.

PlanetScale Metal's direct-attached NVMe is the substrate that "ensures that loading vector partitions from InnoDB to answer queries will be as fast as possible" (Source: ); the lower SSD round-trip is what makes 6× work at all.
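The scaling argument in this section reduces to two small calculations, sketched here. The 20% head fraction and the round-trip figures come from the text above. The function names and the count of 8 partitions per query are hypothetical, and the I/O model assumes strictly serial reads, which a real engine would likely overlap.

```python
def head_index_vs_ram(index_to_ram_ratio: float, head_fraction: float = 0.20) -> float:
    """Size of the in-memory head index, expressed as a multiple of RAM."""
    return index_to_ram_ratio * head_fraction


def serial_io_floor_us(partitions_per_query: int, round_trip_us: float) -> float:
    """Lower bound on query latency if every probed partition misses cache
    and the reads are issued one after another (a pessimistic assumption)."""
    return partitions_per_query * round_trip_us


for ratio in (1, 3, 5, 6, 60):
    head = head_index_vs_ram(ratio)
    status = "fits in RAM" if head <= 1.0 else "spills"
    print(f"index = {ratio}x RAM -> head index = {head:.1f}x RAM ({status})")

# Hypothetical probe count of 8 partitions per query.
for name, rtt_us in (("local NVMe", 50), ("EBS", 250)):
    print(f"{name}: {serial_io_floor_us(8, rtt_us):.0f} us I/O floor")
```

With the default 20% head fraction the head index fills RAM at a 5× ratio, so the 6× figure sits just past the point where the centroid structure itself begins to rely on caching.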

Contrast¶

RAM-resident (HNSW) — best latency when corpus fits; falls off a cliff when it doesn't. Not larger-than-RAM.
Pure SSD graph (DiskANN) — does scale beyond RAM via in-memory product-quantized vectors + SSD graph, but PlanetScale names its incremental updates "not particularly efficient".
Hybrid tree/graph on SSD (SPANN + SPFresh) — the shape that composes with a page-oriented storage engine and sustains OLTP write cadence; the canonical larger-than-RAM vector index family.

Seen in¶

— canonical wiki source. GA announcement asserts "6× larger than available memory" as the working operational ceiling for PlanetScale's SPANN+SPFresh implementation inside InnoDB. First concrete wiki datum.
— beta predecessor; asserts larger-than-RAM qualitatively without a concrete multiple.
sources/2026-04-21-planetscale-larger-than-ram-vector-indexes-for-relational-databases — engineering-deep-dive companion to the beta/GA announcements. Vicent Martí's 2025-10-01 PlanetScale post exposes the quantitative 30% memory ceiling: "by sampling randomly from the original dataset of vectors, which are stored directly in the user's table in InnoDB, we can construct larger-than-RAM indexes requiring only up to 30% of the memory size used by the dataset." The 20% / 80% head-index / posting-list split is confirmed and the head index's role is detailed. The 20% is tuneable: "you can tune that to be smaller at index construction, trading off reduced memory usage for worse recall." Mechanism: the bulk on-disk posting lists sit inside InnoDB via a composite-index LSM emulation so appends don't rewrite blobs; the in-memory head index stays crash-safe via a WAL tied to InnoDB transactions.
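The "appends don't rewrite blobs" point can be illustrated with a composite-keyed table, where each posting-list entry is its own row rather than part of one large per-partition blob. This is a rough sketch under loud assumptions: SQLite stands in for InnoDB, the table name `posting_entries` and its columns are hypothetical, and neither the LSM emulation details nor the head index's WAL are modelled.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
# Hypothetical schema: rows are clustered by (partition_id, vector_id),
# so one partition's entries are stored contiguously in the primary-key B-tree.
conn.execute(
    """
    CREATE TABLE posting_entries (
        partition_id INTEGER NOT NULL,
        vector_id    INTEGER NOT NULL,
        vector       BLOB    NOT NULL,
        PRIMARY KEY (partition_id, vector_id)
    ) WITHOUT ROWID
    """
)


def pack(vec):
    """Encode a vector as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Decode little-endian float32 bytes back into a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def append_vector(conn, partition_id, vector_id, vec):
    # An append is a single-row insert; existing rows in the partition are untouched.
    conn.execute(
        "INSERT INTO posting_entries VALUES (?, ?, ?)",
        (partition_id, vector_id, pack(vec)),
    )


def read_partition(conn, partition_id):
    # Loading a partition is a range scan over the composite key prefix.
    rows = conn.execute(
        "SELECT vector_id, vector FROM posting_entries "
        "WHERE partition_id = ? ORDER BY vector_id",
        (partition_id,),
    )
    return [(vid, unpack(blob)) for vid, blob in rows]


append_vector(conn, 7, 1, [0.5, 1.5])
append_vector(conn, 7, 2, [2.0, -1.0])
append_vector(conn, 9, 3, [4.0, 4.0])
print(read_partition(conn, 7))
```

Because each entry is a separate row inside the storage engine, appends can share the engine's transactional guarantees with ordinary table writes.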