# Virtual sharding

## Definition
Virtual sharding is the pattern of splitting a large search index into multiple independent clusters (each a full primary-replica Lucene deployment), with a coordinator in front that routes writes to the correct shard cluster and scatter-gathers reads across shard clusters. The qualifier "virtual" means the shard boundary is a deployment-time choice rather than an intrinsic property of the on-disk index: each shard is just a regular Lucene index, so you can grow the shard count by rebuilding into a new layout.
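A minimal sketch of the coordinator's two duties, assuming a hypothetical per-shard client interface (`index`, `search`) rather than the actual Nrtsearch Coordinator API: writes hash a shard key to pick exactly one cluster; reads fan out to all clusters and merge the per-shard results.

```python
import hashlib

class Coordinator:
    """Routes writes to one shard cluster; scatter-gathers reads across all."""

    def __init__(self, shard_clusters):
        # Each entry is a client for one independent primary-replica cluster.
        self.shards = shard_clusters

    def _shard_for(self, doc_id):
        # Deployment-time routing choice: hash the shard key mod shard count.
        digest = hashlib.sha1(doc_id.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.shards)

    def index(self, doc_id, doc):
        # Write side: exactly one shard cluster owns each document.
        self.shards[self._shard_for(doc_id)].index(doc_id, doc)

    def search(self, query, k):
        # Read side: scatter to every shard, gather, merge into a global top-k.
        hits = []
        for shard in self.shards:
            hits.extend(shard.search(query, k))
        return sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
```

Note that growing the shard count changes `_shard_for` for almost every document, which is why rebalancing requires a reindex (see tradeoffs below).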
## Why it's needed
A single Lucene index cluster has practical limits:
- Single-primary write throughput ceiling — all writes funnel through one primary.
- Memory ceiling for vector indices + hot term dictionaries — even with replicas, each replica needs the full working set in RAM.
- Historical limitation: single-threaded search per segment — pre-Lucene 10, a single search query could not parallelise across CPU cores within a segment, so even a small number of large segments could bottleneck.
Virtual sharding was a common workaround: split the index into N clusters, and let scatter-gather parallelism across clusters give you what intra-segment parallelism would have given inside one cluster.
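The cross-cluster parallelism this buys can be sketched as a concurrent fan-out, assuming the same kind of hypothetical per-shard client: even if each shard searches single-threaded, N in-flight searches give roughly N-way CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, query, k):
    """Fan one query out to every shard cluster in parallel, merge top-k."""
    # Scatter: one concurrent search per shard cluster, so N clusters give
    # ~N-way parallelism even when each per-shard search is single-threaded.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s.search(query, k), shards))
    # Gather: the global top-k is rebuilt from the union of per-shard top-k.
    hits = [h for partial in partials for h in partial]
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
```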
## Canonical instance: Yelp Nrtsearch
Yelp Nrtsearch uses virtual sharding for large indices with the Nrtsearch Coordinator as the gateway. The [[sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10|2025-05 1.0.0 post]] frames virtual sharding as a workaround for the pre-Lucene-10 lack of intra-single-segment parallel search:
"We are also planning to replace virtual sharding with parallel search in a single segment, a feature added in Lucene 10."
The roadmap statement is revealing: Lucene 10's intra-single-segment parallel search removes the CPU-parallelism motivation for virtual sharding, leaving data size as the remaining driver.
## Tradeoffs
- Operational overhead — N clusters is N× the deploy / backup / monitoring cost.
- Cross-shard aggregations — can be approximate (each shard returns its top-K; global top-K is reconstructed from the union).
- Rebalancing is expensive — adding a shard usually requires a full reindex into the new shard layout.
- Scatter-gather tail latency — slowest shard sets request latency.
## Related
- concepts/scatter-gather-query — the read-side primitive.
- concepts/shard-key — the write-side routing primitive.
- systems/nrtsearch
- patterns/coordinator-fronted-sharded-search
## Seen in
- sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10 — canonical wiki instance. Yelp Nrtsearch uses virtual sharding for large indices fronted by the Coordinator; 1.0.0 roadmap names Lucene 10 intra-single-segment parallel search as the planned in-process replacement.