PATTERN
Coordinator-fronted sharded search¶
Intent¶
For large search indices that don't fit in a single primary-replica cluster, run a coordinator service in front of multiple per-shard clusters. The coordinator owns both the ingest-routing side (route index / commit / delete requests to the right shard's primary) and the query fan-out side (scatter-gather a search request across shard clusters and merge the responses). Clients talk to one endpoint and never see the sharding.
Shape¶
```
indexing clients                     search clients
       │                                   │
       │    ┌─────────────────────────┐    │
       └───►│       Coordinator       │◄───┘
            │ - route write: shard_k  │
            │ - scatter-gather read   │
            │ - merge results         │
            └──┬─────────┬────────┬───┘
               │         │        │
               ▼         ▼        ▼
            Shard-1   Shard-2  Shard-N
          (primary + (primary + (primary +
           replicas)  replicas)  replicas)
```
The coordinator does two fundamentally different jobs (sketched in code after this list):
- Write-side routing — deterministic pick of which shard handles a given document ID (hash-based, range-based, or per-tenant). Must match the index layout so writes land on the same shard every time.
- Read-side fan-out — broadcast the query to all shards, gather top-K from each, merge into final top-K. Correctness requires per-shard scoring to be comparable (usually: same analyzer, same scoring function, same query).
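A minimal sketch of both jobs in Java, assuming a hash-based layout and one client per shard cluster; `Coordinator`, `ShardClient`, and `Hit` are illustrative names, not Nrtsearch's actual API:

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative coordinator sketch: deterministic write routing plus
// scatter-gather reads. Not Nrtsearch's real API.
class Coordinator {
    record Hit(String docId, float score) {}

    interface ShardClient {                    // one per shard cluster
        List<Hit> search(String query, int topK);
    }

    private final List<ShardClient> shards;
    Coordinator(List<ShardClient> shards) { this.shards = shards; }

    // Write-side routing: a deterministic hash of the document ID picks the
    // shard, so index / commit / delete for a doc always hit the same primary.
    ShardClient routeWrite(String docId) {
        int k = Math.floorMod(docId.hashCode(), shards.size());
        return shards.get(k);
    }

    // Read-side fan-out: broadcast the query, gather top-K from every shard,
    // merge into a global top-K. Only correct if per-shard scores are
    // comparable (same analyzer, same scoring function, same query).
    List<Hit> search(String query, int topK) {
        List<CompletableFuture<List<Hit>>> futures = shards.stream()
            .map(s -> CompletableFuture.supplyAsync(() -> s.search(query, topK)))
            .toList();
        return futures.stream()
            .flatMap(f -> f.join().stream())
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .limit(topK)
            .toList();
    }
}
```

Note that the routing function is part of the index layout: changing the hash or the shard count reroutes document IDs, so existing documents would have to be reindexed or moved to match.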
Canonical instance: Yelp Nrtsearch¶
Yelp puts the Nrtsearch Coordinator in front of per-shard clusters for large indices. Per the 2025-05 1.0.0 post:
"Large Nrtsearch indices are split into multiple clusters, and Nrtsearch coordinator directs all requests to the correct Nrtsearch primaries and replicas. It also does scatter-gather for search requests if needed. On the ingestion side, Nrtsearch coordinator receives index, commit, and delete requests from indexing clients, and forwards the requests to the correct Nrtsearch primaries."
Yelp frames this as their virtual sharding mechanism — the shard count is a deployment choice, not a property of the Lucene index itself. The 2023 "Coordinator — The Gateway for Nrtsearch" post on the Yelp engineering blog has the canonical architectural exposition (not yet ingested as of this turn).
The 1.0.0 post also names Lucene 10's intra-single-segment parallel search as a future replacement for virtual sharding: for an index that fits in a single Lucene index, scatter-gather across clusters can be replaced by parallel search within one cluster, which is operationally simpler.
When to use¶
- Index size exceeds the memory / disk capacity of a single primary-replica cluster.
- Ingestion throughput exceeds a single primary's write rate.
- Per-shard scoring is comparable (same analyzer, same scoring formula).
Pitfalls¶
- Cross-shard aggregations are lossy. Scatter-gather works for top-K retrieval; term / bucket aggregations need careful merge logic and may be approximate: each shard returns its top-K terms, and the global top-K recomputed from the union can miss terms that were common in aggregate but never top-K locally (with three shards and K=2, a term ranked 3rd on every shard never reaches the merge, even if its global count is the highest).
- Tail latency is set by the slowest shard. A single slow or unhealthy shard sets the latency of the whole request. Hedged requests, partial-results-on-timeout, and shard-health circuit breakers are the usual mitigations; a sketch of the partial-results variant follows this list.
- Operational overhead of per-shard clusters. Each shard is a full Nrtsearch cluster with its own primary, replicas, and backup config. For N-shard deployments this is N times the cluster ops.
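A minimal sketch of the partial-results-on-timeout mitigation, assuming per-shard calls wrapped as `Callable`s and a single shared deadline. All names (`PartialResultSearch`, `Hit`, `Result`) are illustrative, not Nrtsearch's actual API:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical partial-results-on-timeout fan-out. Each shard call shares
// one deadline; shards that miss it are dropped and the response is
// flagged as partial instead of failing the whole request.
class PartialResultSearch {
    record Hit(String docId, float score) {}
    record Result(List<Hit> hits, boolean partial) {}

    static Result search(List<Callable<List<Hit>>> shardCalls,
                         ExecutorService pool, int topK, long timeoutMs) {
        List<Future<List<Hit>>> futures =
            shardCalls.stream().map(pool::submit).toList();

        List<Hit> merged = new ArrayList<>();
        boolean partial = false;
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        for (Future<List<Hit>> f : futures) {
            try {
                long left = Math.max(0, deadline - System.nanoTime());
                merged.addAll(f.get(left, TimeUnit.NANOSECONDS));
            } catch (TimeoutException | ExecutionException e) {
                f.cancel(true);   // slow or failed shard: drop its results
                partial = true;   // caller sees an explicitly partial view
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                partial = true;
                break;
            }
        }
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new Result(merged.subList(0, Math.min(topK, merged.size())), partial);
    }
}
```

A hedged variant would additionally send a duplicate request to a replica once a shard exceeds, say, its p95 latency and take whichever response arrives first; a circuit breaker drops a consistently unhealthy shard from the fan-out entirely.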
Adjacent patterns¶
- concepts/scatter-gather-query — the query-side primitive.
- systems/elasticsearch — Elasticsearch does the equivalent natively (shards are Lucene indices, coordinator is a cluster node role); Nrtsearch extracts the coordinator out so each shard is its own full Nrtsearch cluster.
- patterns/split-cluster-by-market-for-load-isolation — Zalando's tenant-based cluster split, a conceptual sibling (isolation rather than scale).
- patterns/shard-parallel-backup-and-restore — how shard-level operations compose for bulk admin tasks.
Seen in¶
- sources/2025-05-08-yelp-nrtsearch-100-incremental-backups-lucene-10 — canonical wiki instance. Yelp's Nrtsearch Coordinator fronts per-shard clusters for large indices, routing writes and scatter-gathering reads. Named as virtual sharding; future roadmap replaces it with Lucene 10 intra-single-segment parallel search where single-cluster capacity suffices.