
Virtual sharding

Definition

Virtual sharding is the pattern of splitting a large search index into multiple independent clusters (each a full primary-replica Lucene deployment), with a coordinator in front that routes writes to the correct shard cluster and scatter-gathers reads across the shard clusters. The "virtual" qualifier signals that the shard boundary is a deployment-time choice rather than an intrinsic property of the on-disk index: the shard count can be grown by rebuilding, and each shard remains an ordinary Lucene index.
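
As a minimal sketch of the write path, assuming documents carry a stable string ID and the shard count is fixed at deployment time (ShardRouter, ShardClient, and routeWrite are illustrative names, not Nrtsearch APIs):

```java
import java.util.List;

// Illustrative placeholder for a client to one shard cluster's primary.
interface ShardClient {
    void index(String docId, String document);
}

final class ShardRouter {
    private final List<ShardClient> shards; // one client per shard cluster

    ShardRouter(List<ShardClient> shards) {
        this.shards = shards;
    }

    // Writes go to exactly one shard cluster, chosen by a stable hash of the
    // document ID; floorMod keeps the index non-negative for negative hashes.
    void routeWrite(String docId, String document) {
        int shard = Math.floorMod(docId.hashCode(), shards.size());
        shards.get(shard).index(docId, document);
    }
}
```

Because the boundary lives only in this routing function, growing from N to N+1 shards changes where every hash lands, which is why adding a shard typically forces a full reindex (see Tradeoffs below).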

Why it's needed

A single Lucene index cluster has practical limits:

  • Single-primary write throughput ceiling — all writes funnel through one primary.
  • Memory ceiling for vector indices + hot term dictionaries — even with replicas, each replica needs the full working set in RAM.
  • Historical limitation: single-threaded search per segment — pre-Lucene 10, a single search query could not parallelise across CPU cores within a segment, so even a small number of large segments could bottleneck.

Virtual sharding was a common workaround: split the index into N clusters, and let scatter-gather parallelism across clusters give you what intra-segment parallelism would have given inside one cluster.
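
A hedged sketch of the read path under the same assumptions: the coordinator fans the query out to every shard cluster in parallel and reconstructs a global top-K from the per-shard results (Hit and SearchClient are again illustrative placeholders):

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

record Hit(String docId, float score) {}

// Illustrative placeholder for a search client to one shard cluster.
interface SearchClient {
    List<Hit> search(String query, int k);
}

final class ScatterGather {
    // Fan out to every shard, wait for all, merge the local top-Ks by score.
    static List<Hit> searchTopK(List<SearchClient> shards, String query, int k) {
        List<CompletableFuture<List<Hit>>> futures = shards.stream()
                .map(s -> CompletableFuture.supplyAsync(() -> s.search(query, k)))
                .toList();
        // The request completes only when the slowest shard answers, which is
        // the scatter-gather tail-latency cost noted under Tradeoffs.
        return futures.stream()
                .flatMap(f -> f.join().stream())
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(k)
                .toList();
    }
}
```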

Canonical instance: Yelp Nrtsearch

Yelp Nrtsearch uses virtual sharding for large indices with the Nrtsearch Coordinator as the gateway. The [[sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10|2025-05 1.0.0 post]] frames virtual sharding as a workaround for the pre-Lucene-10 lack of intra-single-segment parallel search:

"We are also planning to replace virtual sharding with parallel search in a single segment, a feature added in Lucene 10."

The roadmap statement is revealing: Lucene 10's intra-single-segment parallel search removes one of the main motivations for virtual sharding (CPU parallelism), leaving only data-size-driven sharding as the remaining driver.
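
For contrast, a sketch of the single-cluster alternative. Since Lucene 8, an IndexSearcher can be constructed with an executor so that leaf slices are searched concurrently; Lucene 10 additionally allows a slice to cover a partition of a single segment, so one query can occupy many cores even when the index has only a few large segments. The index path and pool size here are placeholders:

```java
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SingleClusterParallelSearch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8); // placeholder size
        try (DirectoryReader reader =
                DirectoryReader.open(FSDirectory.open(Path.of("/path/to/index")))) {
            // With an executor, the searcher evaluates leaf slices in parallel;
            // under Lucene 10's intra-segment concurrency, a slice may be a
            // partition of one large segment rather than a whole segment.
            IndexSearcher searcher = new IndexSearcher(reader, pool);
            TopDocs top = searcher.search(new MatchAllDocsQuery(), 10);
            System.out.println("total hits: " + top.totalHits);
        } finally {
            pool.shutdown();
        }
    }
}
```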

Tradeoffs

  • Operational overhead — N clusters is N× the deploy / backup / monitoring cost.
  • Cross-shard aggregations — can be approximate: each shard returns only its local top-K, and the global top-K is reconstructed from the union, so a bucket that ranks just below the cutoff on every shard can be missed even if it ranks highly globally.
  • Rebalancing is expensive — adding a shard usually requires a full reindex into the new shard layout.
  • Scatter-gather tail latency — slowest shard sets request latency.
