
Nrtsearch

Definition

Nrtsearch is Yelp's open-source (github.com/Yelp/nrtsearch) Lucene-based search engine, built as a cost-effective, scalable alternative to Elasticsearch. Nrtsearch has been in production at Yelp for 4+ years and, as of the 2025-05-08 Nrtsearch 1.0.0 release post, serves >90% of Yelp's former Elasticsearch traffic. Architecturally it is a primary-replica system over Apache Lucene, with an Nrtsearch Coordinator that routes write/read traffic to the right per-shard cluster and scatter-gathers search across shards.

Architecture (1.0.0)

  • Primary — accepts index / commit / delete writes; writes Lucene segments to local SSD; on every commit, diffs local vs. S3 and incrementally uploads only new segment files. Commit durability lives in S3, not on primary disk.
  • Replicas — bootstrap by downloading the current S3 segment set in parallel (5× faster than the previous EBS-volume-attach path), then follow the primary via gRPC-based NRT updates.
  • Coordinator — gateway that forwards index/commit/delete from indexing clients to the right primary and scatter-gathers search across per-shard clusters when a virtually-sharded index spans multiple clusters.
  • S3 — canonical store of committed segments and (1.0.0+) cluster/index state.
  • State store — cluster and index state is tracked in an immutable object in S3 (or EBS), atomically swapped on state change; replicas hot-reload state changes instead of restarting.

Evolution: the architectural unlock

The 1.0.0 release is organised around one load-bearing change and its follow-ons:

1. Incremental backup on commit → primary on local SSD

Before 1.0: primary wrote segments to an EBS volume that was the source of truth; backup to S3 ran periodically as an archive. Drawbacks: EBS loss = full reindex; EBS mount transitions during primary moves "not as smooth as expected"; ingestion-heavy clusters needed frequent backups so replicas didn't spend too long catching up after bootstrap.

The 1.0 change: on every commit, check S3 for which segment files exist, upload the missing ones. Added commit cost "a few ms to 20 seconds depending on the size of the data" — trivial. Because Lucene segments are immutable, individual files don't need rewriting. This is the canonical patterns/incremental-s3-backup-of-immutable-files instance on the wiki.

Consequence: S3 now has every committed segment, so the primary's disk can be ephemeral local SSD (local SSD vs EBS) instead of EBS. Primary restart no longer risks losing changes between last backup and last commit.
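
A minimal sketch of the commit-time diff-and-upload, assuming the AWS SDK for Java v2 and an illustrative bucket/key layout (not Nrtsearch's actual backup code): list what S3 already holds, upload only what it is missing.

```java
// Incremental backup on commit: Lucene segment files are immutable, so only files
// missing from S3 need to be uploaded. Bucket/prefix names, the flat key layout and
// this class are illustrative assumptions, not Nrtsearch's actual backup code.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IncrementalSegmentBackup {
  private final S3Client s3;
  private final String bucket;
  private final String prefix; // e.g. "indexes/my-index/segments/"

  IncrementalSegmentBackup(S3Client s3, String bucket, String prefix) {
    this.s3 = s3;
    this.bucket = bucket;
    this.prefix = prefix;
  }

  /** Called after every Lucene commit with the local index directory on SSD. */
  void backupCommit(Path indexDir) throws IOException {
    // 1. Which segment files does S3 already hold?
    Set<String> remote = s3.listObjectsV2Paginator(
            ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
        .contents().stream()
        .map(S3Object::key)
        .collect(Collectors.toSet());

    // 2. Upload only the files S3 is missing; existing ones never change.
    try (Stream<Path> files = Files.list(indexDir)) {
      files.filter(Files::isRegularFile)
          .filter(f -> !remote.contains(prefix + f.getFileName()))
          .forEach(f -> s3.putObject(
              PutObjectRequest.builder().bucket(bucket).key(prefix + f.getFileName()).build(),
              RequestBody.fromFile(f)));
    }
  }
}
```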

2. Parallel S3 download for bootstrap

Replicas previously serially downloaded an index archive; now they pull multiple segment files from S3 concurrently, landing on local SSD. Yelp reports a 5× download speedup. Canonical patterns/parallel-s3-download-for-bootstrap wiki instance.
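
A minimal sketch of the parallel pull (AWS SDK for Java v2 again; pool size and key layout are illustrative assumptions):

```java
// Parallel bootstrap: pull every committed segment file concurrently onto local SSD
// instead of downloading one archive serially. Pool size, key layout and class names
// are illustrative assumptions.
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBootstrap {
  private final S3Client s3;
  private final String bucket;

  ParallelBootstrap(S3Client s3, String bucket) {
    this.s3 = s3;
    this.bucket = bucket;
  }

  void download(List<String> segmentKeys, Path localIndexDir) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(16); // concurrency is a tuning knob
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String key : segmentKeys) {
        pending.add(pool.submit(() -> s3.getObject(
            GetObjectRequest.builder().bucket(bucket).key(key).build(),
            localIndexDir.resolve(Path.of(key).getFileName()))));
      }
      for (Future<?> f : pending) {
        f.get(); // surface any failed download before the replica starts serving
      }
    } finally {
      pool.shutdown();
    }
  }
}
```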

3. Full consistent snapshots relocated off primary

Full backups (used in disaster recovery) are now direct S3-to-S3 copies of the committed-segment set, not primary-originated. Because they're no longer on the replica-bootstrap path, they can run less frequently than before.
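
A minimal sketch of such a snapshot as server-side copyObject calls (AWS SDK for Java v2; bucket, key set and snapshot prefix are illustrative assumptions):

```java
// Full snapshot as a server-side S3-to-S3 copy: committed segment keys are copied
// under a snapshot prefix without the primary touching the data. Names and the
// simple loop are illustrative assumptions.
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;

import java.util.List;

final class SnapshotCopy {
  static void snapshot(S3Client s3, String bucket, List<String> committedKeys, String snapshotPrefix) {
    for (String key : committedKeys) {
      s3.copyObject(CopyObjectRequest.builder()
          .sourceBucket(bucket).sourceKey(key)
          .destinationBucket(bucket).destinationKey(snapshotPrefix + key)
          .build()); // copy happens inside S3; no bytes flow through this process
    }
  }
}
```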

4. State-management overhaul

Before 1.0, state changes were batched into the data-commit path: apply → commit → backup → restart replicas. Drawbacks were structural: state commit coupled to data commit (no reason to couple — index-field additions are backwards-compatible with existing data); state could be lost on primary restart between modification and commit; primary disk was state's source of truth, which made tooling hard when the primary was unavailable; state propagation required a replica restart.

Redesign:

  • Immutable state object — state changes merge into a fresh immutable representation, committed to the backend, then atomically swapped in as the current reference. Client requests see a consistent state for their whole lifetime.
  • Decoupled state commit — state changes commit per-request; data commit is separate. Schema updates are no longer held hostage by data-commit cadence.
  • State source of truth on S3 (configurable between EBS and S3).
  • Hot reload on replicas — state changes propagate without restart, dropping cluster-wide schema-change time dramatically.
  • Cluster state self-starts indices — an index's started state and a unique per-index identifier are tracked in cluster state, so servers know which indices to start at bootstrap without the deployment manager / Kubernetes operator telling them.
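
A minimal sketch of the immutable-state-object swap described above, using a plain AtomicReference; the types and the StateBackend interface are assumptions, not Nrtsearch's actual state classes:

```java
// Immutable state object with atomic swap: each change builds a fresh state, commits
// it to the backend (S3 in 1.0.0), then swaps the in-memory reference. In-flight
// requests keep reading the state they acquired. Types and method names are
// assumptions, not Nrtsearch's actual state classes.
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

record IndexState(Map<String, String> fields, boolean started, String indexId) {}

interface StateBackend {
  void commit(IndexState next); // e.g. write the new state object to S3
}

final class StateManager {
  private final AtomicReference<IndexState> current = new AtomicReference<>();
  private final StateBackend backend;

  StateManager(StateBackend backend, IndexState initial) {
    this.backend = backend;
    current.set(initial);
  }

  /** A request captures one consistent state for its whole lifetime. */
  IndexState acquire() {
    return current.get();
  }

  /** Merge a change into a new immutable state, commit it, then swap it in. */
  synchronized void update(UnaryOperator<IndexState> change) {
    IndexState next = change.apply(current.get());
    backend.commit(next); // durable first: state commit is decoupled from data commit
    current.set(next);    // then visible: replicas hot-reload the same object
  }
}
```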

5. Lucene 10 + Java 21

Upgrade from Lucene 8.4.0 → 10.1.0, plus Java 21. Unlocks:

  • HNSW vector search (systems/hnsw / concepts/hnsw-index) for float + byte vectors up to 4096 elements, with cosine (optional auto-normalization), dot product, euclidean, and maximum inner product similarities; sketched after this list.
  • Scalar quantization (concepts/scalar-quantization) — accuracy/memory tradeoff for float vectors.
  • Vector search in nested documents.
  • Intra-merge parallelism for faster HNSW-graph merging.
  • SIMD vector instructions (concepts/simd-vectorization) via Java 21's Vector API + foreign-memory API for kernel acceleration.
  • Intra-single-segment parallel search (Lucene 10 feature) — named as a future replacement for virtual sharding.
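
A minimal sketch of the HNSW capability at the plain Lucene level; Nrtsearch exposes this through its own field definitions and gRPC API, so the field name and toy vectors here are purely illustrative:

```java
// Plain-Lucene sketch of HNSW vector indexing and k-NN search (the capability
// unlocked by the Lucene 10 upgrade). Field name and vectors are illustrative.
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class VectorSearchSketch {
  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      // Index a few documents with a float vector field (an HNSW graph is built per segment).
      float[][] vectors = {{0.9f, 0.1f}, {0.1f, 0.9f}, {0.7f, 0.7f}};
      for (float[] v : vectors) {
        Document doc = new Document();
        doc.add(new KnnFloatVectorField("embedding", v, VectorSimilarityFunction.COSINE));
        writer.addDocument(doc);
      }
      writer.commit();

      // Approximate k-NN query against the per-segment HNSW graphs.
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        ScoreDoc[] hits = searcher.search(
            new KnnFloatVectorQuery("embedding", new float[] {1.0f, 0.0f}, 2), 2).scoreDocs;
        for (ScoreDoc hit : hits) {
          System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
      }
    }
  }
}
```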

6. Parallel aggregation framework

Lucene facets process results single-threaded after parallel collection finishes — a noticeable tail-latency hit for large indices with complex or nested aggregations. Yelp's replacement tracks aggregation state per index slice during collection, then merges slice states recursively at the reduce step — parallel the whole way down. Supported at 1.0: term (text + numeric), filter, top-hits, min, max. Nested aggregations merge recursively.
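
A minimal sketch of the per-slice-state idea for a single term aggregation; the types are illustrative, not Nrtsearch's aggregation framework:

```java
// Per-slice aggregation state: each index slice collects into its own state in
// parallel, and states are merged at reduce time (recursively for nested
// aggregations). Types and names are illustrative.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TermCountState {
  final Map<String, Long> counts = new HashMap<>();

  // Called from a slice's collector for every matching hit.
  void collect(String termValue) {
    counts.merge(termValue, 1L, Long::sum);
  }

  // Reduce step: fold another slice's state into this one.
  TermCountState merge(TermCountState other) {
    other.counts.forEach((term, c) -> counts.merge(term, c, Long::sum));
    return this;
  }
}

final class ParallelTermAgg {
  // One state per slice is filled during collection; because merge() is associative,
  // the reduce step can itself be organised as a (parallel-friendly) merge tree.
  static TermCountState reduce(List<TermCountState> sliceStates) {
    return sliceStates.stream().reduce(new TermCountState(), TermCountState::merge);
  }
}
```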

7. More plugin types exposed

Script, Rescorer, Highlight, Hits Logger (for training-data collection), Fetch Task, Aggregation. Pattern: Nrtsearch ships a compact core; application-specific custom scoring, result enrichment, hit-logging, etc. ship as user plugins rather than forks.
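
A purely hypothetical sketch of the plugin-over-fork shape for a custom rescorer; the interfaces below are invented for illustration and do not reflect Nrtsearch's actual plugin API:

```java
// Hypothetical only: custom result re-ranking shipped as a plugin instead of a fork.
// RescorerPlugin, Hit and RecencyBoostRescorer are invented names.
import java.util.List;

record Hit(int docId, double score) {}

interface RescorerPlugin {
  /** Re-rank the top hits produced by the first-pass query. */
  List<Hit> rescore(List<Hit> firstPassHits);
}

final class RecencyBoostRescorer implements RescorerPlugin {
  @Override
  public List<Hit> rescore(List<Hit> firstPassHits) {
    // Application-specific logic lives in the plugin; the core stays compact.
    return firstPassHits.stream()
        .map(h -> new Hit(h.docId(), h.score() * 1.1)) // placeholder boost
        .toList();
  }
}
```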

Coordinator & virtual sharding

For large indices, Nrtsearch splits the index into multiple clusters (virtual sharding) and puts the Nrtsearch Coordinator in front:

  • forwards index/commit/delete from indexing clients to the right primary(s);
  • scatter-gather over per-shard clusters at search time.
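
A minimal sketch of the scatter-gather step, assuming a per-shard client abstraction; the types are illustrative, not the Coordinator's gRPC surface:

```java
// Scatter-gather over per-shard clusters: fan the query out concurrently, then merge
// per-shard top hits into one global top-k. ShardClient and ShardHit are illustrative
// abstractions.
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

record ShardHit(String docId, double score) {}

interface ShardClient {
  List<ShardHit> searchShard(String query, int topK); // one per-shard Nrtsearch cluster
}

final class ScatterGather {
  static List<ShardHit> search(List<ShardClient> shards, String query, int topK) {
    // Scatter: query every shard cluster concurrently.
    List<CompletableFuture<List<ShardHit>>> futures = shards.stream()
        .map(shard -> CompletableFuture.supplyAsync(() -> shard.searchShard(query, topK)))
        .toList();

    // Gather: merge per-shard results into a single global top-k by score.
    return futures.stream()
        .map(CompletableFuture::join)
        .flatMap(List::stream)
        .sorted(Comparator.comparingDouble(ShardHit::score).reversed())
        .limit(topK)
        .toList();
  }
}
```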

Yelp documents the coordinator pattern in the 2023 "Coordinator — The Gateway For Nrtsearch" post (not yet ingested). The 1.0 post names virtual sharding as a workaround for Lucene's former lack of intra-segment parallelism, and points to Lucene 10's intra-single-segment parallel search as the planned in-process replacement.

Roadmap (named in the 1.0 post)

  • NRT replication via S3 instead of gRPC — current gRPC pull makes the primary a scale bottleneck for replica count. S3-based replication lets replicas scale independently.
  • Replace virtual sharding with intra-single-segment parallel search — Lucene 10 provides it; the multi-cluster coordinator scatter-gather becomes unnecessary for single-cluster-fit indices.

Caveats

  • No QPS / latency / fleet-size disclosure in the 1.0 post. The pre-existing 2021 launch post remains the canonical source for quantitative figures.
  • Vector-search refresh semantics unspecified — how Yelp handles HNSW's periodic rebuild needs isn't disclosed. Because Lucene segments are immutable, new docs land in new segments with new HNSW graphs, which composes naturally with incremental backup, but merge-time cost for large vector graphs isn't quantified.
  • gRPC replication still in use — the S3-based replication is roadmap, not shipped in 1.0.
