

Larger-than-RAM vector index

Definition

A larger-than-RAM vector index is an ANN index whose total on-disk footprint exceeds the process's available memory, yet still serves queries at acceptable latency because only a small working set (a centroid-level tree or graph plus a bounded number of hot partitions) needs to be in RAM at any one time.

This is the operational opposite of RAM-resident ANN indices (stock HNSW), which require the whole index — corpus vectors + graph — to fit in memory for query performance to hold.

Why it matters

Embedding corpora have grown faster than server RAM. A billion 1,024-dim float32 vectors is 4 TB of raw vector payload, before any index overhead. Keeping production vector indexes RAM-resident at that scale requires fleets of extremely expensive high-memory nodes. Larger-than-RAM indexes move the bulk of the corpus to SSD while keeping query latency workable — the same architectural shift that HDD-era databases made when buffer-pool caching let them serve workloads much larger than DRAM.
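The arithmetic behind that figure is worth making explicit (this is just the raw payload; graph edges, centroids, and storage-engine overhead come on top):

```python
# Back-of-envelope: raw vector payload for a billion-vector corpus.
n_vectors = 1_000_000_000
dims = 1024
bytes_per_float32 = 4

payload_bytes = n_vectors * dims * bytes_per_float32
payload_tb = payload_bytes / 1e12  # decimal terabytes

print(f"{payload_tb:.1f} TB")  # ~4.1 TB of raw vectors, before any index overhead
```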

The enabling mechanism: tree/graph on centroids + SSD-resident partitions

The canonical algorithmic family that enables this is SPANN-style hybrid indexes:

  • A centroid tree (or graph) lives in RAM — small, ~20% of total index size in PlanetScale's implementation (Source: sources/2026-04-21-planetscale-planetscale-vectors-is-now-ga).
  • Posting lists — the vectors belonging to each partition — live on SSD.
  • At query time, the tree identifies a bounded number of relevant partitions, which are read from SSD on demand. Hot partitions cache in the page-cache (buffer pool).
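The two-step query path above can be sketched as a toy in-memory model. Everything here is an illustrative assumption, not PlanetScale's implementation: real systems keep posting lists on SSD (here they are just a dict), cluster centroids properly rather than sampling them, and overlap partition reads.

```python
import numpy as np

def build_toy_index(vectors, n_partitions, rng):
    # Pick random corpus vectors as stand-in "centroids" and assign every
    # vector to the posting list of its nearest centroid (dot product on
    # unit-normalized vectors).
    centroids = vectors[rng.choice(len(vectors), n_partitions, replace=False)]
    assignments = np.argmax(vectors @ centroids.T, axis=1)
    postings = {p: np.where(assignments == p)[0] for p in range(n_partitions)}
    return centroids, postings

def query(q, centroids, postings, vectors, n_probe=4, k=5):
    # Step 1 (RAM): rank centroids and keep a bounded number of partitions.
    probe = np.argsort(-(centroids @ q))[:n_probe]
    # Step 2 (SSD in a real system): load those posting lists and scan them.
    candidates = np.concatenate([postings[p] for p in probe])
    scores = vectors[candidates] @ q
    return candidates[np.argsort(-scores)[:k]]

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 32)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

centroids, postings = build_toy_index(vecs, n_partitions=16, rng=rng)
top = query(vecs[0], centroids, postings, vecs)
print(top[0])  # the query vector itself comes back first
```

The key property the sketch illustrates: RAM usage scales with the number of centroids, not the corpus, while per-query I/O is capped by `n_probe` regardless of total index size.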

The operational ceiling: 6× RAM

PlanetScale's 2026-03-25 GA announcement gives the first concrete wiki datum on how far beyond RAM this architecture scales in practice:

"vector indexes … now perform well even when they are 6× larger than available memory."

sources/2026-04-21-planetscale-planetscale-vectors-is-now-ga

The beta announcement had asserted larger-than-RAM qualitatively ("designed to work well for larger-than-RAM indexes that require SSD usage"); the GA post makes the claim concrete. 6× is not a hard ceiling — it's a working operational claim on where query performance remains acceptable.

Why 6× and not 60×

The ceiling isn't arbitrary. As the index grows relative to RAM:

  • The centroid tree grows proportionally — ~20% of index size in PlanetScale's implementation. At 6× RAM, the tree alone is ~1.2× RAM, already spilling.
  • The bounded number of partitions loaded per query remains roughly fixed, so query latency stays bounded — until centroid misses start dominating.
  • SSD round-trip latency (~50 μs local NVMe, ~250 μs EBS) becomes the p99 floor.
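A crude model makes the interaction of these factors concrete. The probe count and cache-hit rate below are hypothetical; only the per-read latencies come from the figures above:

```python
# Illustrative p99 floor: partitions that miss the page cache each pay
# one SSD round trip. All workload numbers here are assumptions.
partitions_per_query = 8      # bounded probe count (assumed)
cache_hit_rate = 0.5          # fraction of probes served from page cache (assumed)
nvme_us, ebs_us = 50, 250     # per-read round trips cited in the text

def ssd_floor_us(per_read_us):
    # Worst case: misses are serviced serially. Real NVMe queue depth
    # lets reads overlap, which is part of why local NVMe matters.
    misses = partitions_per_query * (1 - cache_hit_rate)
    return misses * per_read_us

print(ssd_floor_us(nvme_us))  # 200.0 μs floor on local NVMe
print(ssd_floor_us(ebs_us))   # 1000.0 μs floor on EBS
```

As the index-to-RAM ratio grows, `cache_hit_rate` falls and the spilling centroid tree adds misses of its own, which is why the floor degrades nonlinearly past some multiple of RAM.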

PlanetScale Metal's direct-attached NVMe is the substrate that "ensures that loading vector partitions from InnoDB to answer queries will be as fast as possible" (Source: sources/2026-04-21-planetscale-planetscale-vectors-is-now-ga) — the lower SSD round-trip is what makes 6× work at all.

Contrast

  • RAM-resident (HNSW) — best latency when corpus fits; falls off a cliff when it doesn't. Not larger-than-RAM.
  • Pure SSD graph (DiskANN) — does scale beyond RAM via in-memory product-quantized vectors + SSD graph, but PlanetScale names its incremental updates "not particularly efficient".
  • Hybrid tree/graph on SSD (SPANN + SPFresh) — the shape that composes with a page-oriented storage engine and sustains OLTP write cadence; the canonical larger-than-RAM vector index family.
