
Virtual sharding

Definition

Virtual sharding is the pattern of splitting a large search index into multiple independent clusters (each a full primary-replica Lucene deployment), with a coordinator in front that routes writes to the correct shard cluster and scatter-gathers reads across the shard clusters. The "virtual" qualifier signals that the shard boundary is a deployment-time choice rather than an intrinsic property of the on-disk index: the shard count can be grown by rebuilding, and each shard remains an ordinary Lucene index.
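
As a minimal sketch of the write path, assuming documents carry a stable string ID and the shard count is fixed at deployment time (ShardRouter, ShardClient, and routeWrite are illustrative names, not Nrtsearch APIs):

```java
import java.util.List;

// Illustrative placeholder for a client to one shard cluster's primary.
interface ShardClient {
    void index(String docId, String document);
}

final class ShardRouter {
    private final List<ShardClient> shards; // one client per shard cluster

    ShardRouter(List<ShardClient> shards) {
        this.shards = shards;
    }

    // Writes go to exactly one shard cluster, chosen by a stable hash of the
    // document ID; floorMod keeps the index non-negative for negative hashes.
    void routeWrite(String docId, String document) {
        int shard = Math.floorMod(docId.hashCode(), shards.size());
        shards.get(shard).index(docId, document);
    }
}
```

Because the boundary lives only in this routing function, growing from N to N+1 shards changes where every hash lands, which is why adding a shard typically forces a full reindex (see Tradeoffs below).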

Why it's needed

A single Lucene index cluster has practical limits:

  • Single-primary write throughput ceiling — all writes funnel through one primary.
  • Memory ceiling for vector indices + hot term dictionaries — even with replicas, each replica needs the full working set in RAM.
  • Historical limitation: single-threaded search per segment — pre-Lucene 10, a single search query could not parallelise across CPU cores within a segment, so even a small number of large segments could bottleneck.

Virtual sharding was a common workaround: split the index into N clusters, and let scatter-gather parallelism across clusters give you what intra-segment parallelism would have given inside one cluster.
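
A hedged sketch of the read path under the same assumptions: the coordinator fans the query out to every shard cluster in parallel and reconstructs a global top-K from the per-shard results (Hit and SearchClient are again illustrative placeholders):

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

record Hit(String docId, float score) {}

// Illustrative placeholder for a search client to one shard cluster.
interface SearchClient {
    List<Hit> search(String query, int k);
}

final class ScatterGather {
    // Fan out to every shard, wait for all, merge the local top-Ks by score.
    static List<Hit> searchTopK(List<SearchClient> shards, String query, int k) {
        List<CompletableFuture<List<Hit>>> futures = shards.stream()
                .map(s -> CompletableFuture.supplyAsync(() -> s.search(query, k)))
                .toList();
        // The request completes only when the slowest shard answers, which is
        // the scatter-gather tail-latency cost noted under Tradeoffs.
        return futures.stream()
                .flatMap(f -> f.join().stream())
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(k)
                .toList();
    }
}
```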

Canonical instance: Yelp Nrtsearch

Yelp Nrtsearch uses virtual sharding for large indices with the Nrtsearch Coordinator as the gateway. The [[sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10|2025-05 1.0.0 post]] frames virtual sharding as a workaround for the pre-Lucene-10 lack of intra-single-segment parallel search:

"We are also planning to replace virtual sharding with parallel search in a single segment, a feature added in Lucene 10."

The roadmap statement is revealing: Lucene 10's intra-single-segment parallel search removes one of the main motivations for virtual sharding (CPU parallelism), leaving only data-size-driven sharding as the remaining driver.
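
For contrast, a sketch of the single-cluster alternative. Since Lucene 8, an IndexSearcher can be constructed with an executor so that leaf slices are searched concurrently; Lucene 10 additionally allows a slice to cover a partition of a single segment, so one query can occupy many cores even when the index has only a few large segments. The index path and pool size here are placeholders:

```java
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SingleClusterParallelSearch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8); // placeholder size
        try (DirectoryReader reader =
                DirectoryReader.open(FSDirectory.open(Path.of("/path/to/index")))) {
            // With an executor, the searcher evaluates leaf slices in parallel;
            // under Lucene 10's intra-segment concurrency, a slice may be a
            // partition of one large segment rather than a whole segment.
            IndexSearcher searcher = new IndexSearcher(reader, pool);
            TopDocs top = searcher.search(new MatchAllDocsQuery(), 10);
            System.out.println("total hits: " + top.totalHits);
        } finally {
            pool.shutdown();
        }
    }
}
```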

Tradeoffs

  • Operational overhead — N clusters is N× the deploy / backup / monitoring cost.
  • Cross-shard aggregations — can be approximate: each shard returns only its local top-K, and the global top-K is reconstructed from the union, so a bucket that ranks just below the cutoff on every shard can be missed even if it ranks highly globally.
  • Rebalancing is expensive — adding a shard usually requires a full reindex into the new shard layout.
  • Scatter-gather tail latency — slowest shard sets request latency.
