
SYSTEM Cited by 5 sources

Elasticsearch

Elasticsearch is the distributed search engine, built on Apache Lucene, that powers a large share of production full-text and filtered search at scale. It exposes a JSON-based Query DSL. For most structured search products the relevant query type is the bool query, whose must / should / must_not / filter clauses nest inside one another and map naturally onto AND / OR / NOT logic.

Within the wiki this page is a stub created for cross-referencing from sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean. Elasticsearch is a large product, and many of its capabilities are not covered here (search relevance, aggregations, k-NN vector search, ILM / snapshotting).

OpenSearch, the AWS-led fork of Elasticsearch, is offered as the managed Amazon OpenSearch Service and uses the same bool-query shape.

Role in the wiki

Backing store for GitHub Issues search (2025-05-13)

GitHub Issues search is backed by Elasticsearch. In the 2025 rewrite, the Query pipeline stage compiles the AST built from the user's search input into a nested Elasticsearch bool query:

AST node                              Elasticsearch bool clause
AND                                   must
OR                                    should
NOT                                   must_not (Elasticsearch has no should_not clause)
leaf filter term (author:monalisa)    term / terms / prefix

This recursive mapping is the natural target for an AST-driven search DSL, which makes patterns/ast-based-query-generation the structural fit. The source page has a worked before/after example. Subtrees that OR together several values of the same field are compacted into a single terms clause (an intra-leaf optimization).
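The same-field compaction can be sketched as follows. This is a minimal illustration, not GitHub's implementation: it assumes a hypothetical leaf representation of `(field, value)` pairs and emits Elasticsearch `term` / `terms` queries inside a `bool.should`.

```python
from collections import defaultdict
from typing import Any

Leaf = tuple[str, str]  # hypothetical (field, value) leaf, e.g. ("label", "bug")


def compact_or(leaves: list[Leaf]) -> dict[str, Any]:
    """Compile an OR over leaf qualifiers, merging same-field leaves
    into a single `terms` query (hypothetical helper)."""
    by_field: dict[str, list[str]] = defaultdict(list)
    for field, value in leaves:
        if value not in by_field[field]:  # drop duplicate values
            by_field[field].append(value)

    should: list[dict[str, Any]] = []
    for field, values in by_field.items():
        if len(values) == 1:
            should.append({"term": {field: values[0]}})
        else:
            # One `terms` query matches any of the listed values.
            should.append({"terms": {field: values}})

    if len(should) == 1:
        # A single clause needs no bool wrapper.
        return should[0]
    # Explicit minimum_should_match keeps OR semantics even if this bool
    # later gains must/filter clauses.
    return {"bool": {"should": should, "minimum_should_match": 1}}
```

For example, `label:bug OR label:feature OR author:monalisa` as `compact_or([("label", "bug"), ("label", "feature"), ("author", "monalisa")])` yields one `terms` clause for `label` and one `term` clause for `author` under a single `should`.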

Scale: GitHub Issues search runs at ~2,000 QPS (≈160 M queries/day) on this substrate. (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)

Bool query shape (the relevant API surface)

The canonical nested shape Elasticsearch exposes:

{
  "query": {
    "bool": {
      "must":     [ ... ],   // AND: every clause must match; contributes to _score
      "should":   [ ... ],   // OR: at least one must match when there is no must/filter; otherwise only boosts _score
      "must_not": [ ... ],   // NOT: no clause may match; runs without scoring
      "filter":   [ ... ]    // AND with no scoring contribution
    }
  }
}

A bool query can itself appear as a clause of another bool query, which is why any boolean-algebra AST can be emitted mechanically as a tree of bool objects with leaf queries at the bottom.
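A minimal sketch of that mechanical emission, assuming hypothetical AST classes (`Term`, `And`, `Or`, `Not`); it follows the AND → must, OR → should, NOT → must_not mapping from the table on this page and is not taken from GitHub's code.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Term:  # hypothetical leaf, e.g. author:monalisa
    field: str
    value: str


@dataclass(frozen=True)
class And:  # hypothetical AND node
    children: tuple["Node", ...]


@dataclass(frozen=True)
class Or:  # hypothetical OR node
    children: tuple["Node", ...]


@dataclass(frozen=True)
class Not:  # hypothetical NOT node
    child: "Node"


Node = Union[Term, And, Or, Not]


def compile_node(node: Node) -> dict[str, Any]:
    """Recursively map an AST node to an Elasticsearch query object."""
    if isinstance(node, Term):
        return {"term": {node.field: node.value}}
    if isinstance(node, And):
        return {"bool": {"must": [compile_node(c) for c in node.children]}}
    if isinstance(node, Or):
        return {"bool": {"should": [compile_node(c) for c in node.children],
                         "minimum_should_match": 1}}
    if isinstance(node, Not):
        return {"bool": {"must_not": [compile_node(node.child)]}}
    raise TypeError(f"unknown AST node: {node!r}")
```

Usage: `(author:monalisa OR author:hubot) AND NOT label:wontfix` becomes `{"query": compile_node(And((Or((Term("author", "monalisa"), Term("author", "hubot"))), Not(Term("label", "wontfix")))))}`, an outer `must` holding a nested `should` bool and a nested `must_not` bool. Each AST node becomes exactly one bool object, so the output depth tracks the AST depth.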

The filter vs must distinction matters for relevance scoring: filter clauses run in filter context, so Elasticsearch skips score computation for them and can cache frequently used filters. That is the right choice for structured equality predicates (state:open, author_id:X). Full-text queries (for example match) typically go in must so that they contribute to _score.
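A sketch of that placement rule, assuming a hypothetical request builder and hypothetical analyzed fields `title` and `body`; `term`, `multi_match`, `filter`, and `must` are standard Query DSL constructs.

```python
from typing import Any, Optional


def build_issue_query(free_text: Optional[str],
                      qualifiers: dict[str, str]) -> dict[str, Any]:
    # Exact-match qualifiers (state:open, author:X) go in `filter`:
    # filter context skips scoring and lets Elasticsearch cache results.
    bool_query: dict[str, Any] = {
        "filter": [{"term": {field: value}} for field, value in qualifiers.items()]
    }
    if free_text:
        # Free text goes in `must` so it contributes to _score.
        # The "title"/"body" field names are hypothetical.
        bool_query["must"] = [
            {"multi_match": {"query": free_text, "fields": ["title", "body"]}}
        ]
    return {"query": {"bool": bool_query}}
```

With `build_issue_query("crash on login", {"state": "open", "author": "monalisa"})`, every hit must match both qualifiers, but only the text match influences ranking.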

See the upstream reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html

Cross Cluster Replication (CCR) — replication between clusters

Elasticsearch supports two distinct replication mechanisms:

  • Intra-cluster primary/replica shard replication: primary and replica shards of the same index live in one cluster, and ES is free to relocate and rebalance shards across that cluster's nodes to keep the cluster healthy.
  • Cross Cluster Replication (CCR): one-way leader→follower replication between otherwise-independent ES clusters. Follower shards pull and replay operations from their leader shards; only the initial bootstrap (remote recovery) copies Lucene segment files. Covered fully at concepts/cross-cluster-replication.

CCR's structural win is that it lets you align the storage topology with the application's primary/replica topology (see concepts/primary-replica-topology-alignment). The canonical wiki instance is the 2026-03-03 GHES (GitHub Enterprise Server) search rewrite. GHES replaced a multi-node ES cluster spanning its HA pair with one single-node cluster per node, joined by CCR. This removed ES's freedom to rebalance primary shards onto the read-only replica host, the old failure mode behind mutual-dependency deadlocks. See patterns/single-node-cluster-per-app-replica.

CCR's auto-follow policy (Elasticsearch calls it an auto-follow pattern) is new-only: it matches only indexes created after the policy is installed and does not retroactively attach pre-existing ones. Applying CCR to a long-lived deployment therefore requires an imperative bootstrap step that explicitly follows each pre-existing index, followed by the declarative auto-follow policy for future ones.
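The two-step rollout can be sketched as a list of requests against the follower cluster. This assumes hypothetical names (the remote-cluster alias, index list, and pattern name) and only builds request descriptions; the endpoints `PUT /<follower_index>/_ccr/follow` and `PUT /_ccr/auto_follow/<name>` are the documented CCR APIs.

```python
from typing import Any

Request = tuple[str, str, dict[str, Any]]  # (HTTP method, path, JSON body)


def ccr_bootstrap_plan(remote_cluster: str,
                       existing_leader_indices: list[str],
                       pattern_name: str,
                       leader_patterns: list[str]) -> list[Request]:
    """Imperatively follow pre-existing indexes, then install an
    auto-follow pattern for indexes created afterwards (hypothetical helper)."""
    plan: list[Request] = []
    for index in existing_leader_indices:
        # Imperative step: one explicit follow per existing index. The
        # follower reuses the leader's name, since it lives in another cluster.
        plan.append(("PUT", f"/{index}/_ccr/follow",
                     {"remote_cluster": remote_cluster, "leader_index": index}))
    # Declarative step: only matches indexes created from now on.
    plan.append(("PUT", f"/_ccr/auto_follow/{pattern_name}",
                 {"remote_cluster": remote_cluster,
                  "leader_index_patterns": leader_patterns,
                  "follow_index_pattern": "{{leader_index}}"}))
    return plan
```

One design caveat: an index created between listing the existing indexes and installing the pattern would be missed by both steps, so a real rollout may prefer to install the pattern first and then follow any pre-existing index not already followed.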

CCR only covers document replication. Failover, index deletion, and upgrades are the consumer's responsibility — "Elasticsearch only handles the document replication, and we're responsible for the rest of the index's lifecycle" (GitHub, 2026-03-03).
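As one example of that consumer-side lifecycle work, failing over means turning a follower index into an ordinary writable index. A sketch assuming Elasticsearch's documented pause / close / unfollow / open sequence; the helper name is hypothetical, and how GHES actually orchestrates failover is not described in the source.

```python
def promote_follower_plan(follower_index: str) -> list[tuple[str, str]]:
    """Request sequence converting a CCR follower into a regular index
    (hypothetical helper)."""
    return [
        ("POST", f"/{follower_index}/_ccr/pause_follow"),  # stop pulling from the leader
        ("POST", f"/{follower_index}/_close"),             # unfollow requires a closed index
        ("POST", f"/{follower_index}/_ccr/unfollow"),      # drop the follower metadata
        ("POST", f"/{follower_index}/_open"),              # reopen as a normal, writable index
    ]
```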

Stub caveats

  • This page covers only what the ingested sources touch. Not covered: relevance/_score tuning, aggregations, ILM, k-NN vector search, Elasticsearch SQL, cross-cluster replication beyond the summary on this page, snapshot lifecycle, or operational runbooks (shard sizing, circuit breakers, mapping explosion).
  • The open-source / licensing split between Elasticsearch (Elastic) and OpenSearch (AWS fork) is not modelled here; the bool-query DSL is common to both.

Seen in

Seen in (legacy / archetypal side)

  • sources/2025-10-12-mongodb-cars24-improves-search-for-300-million-users-with-atlas: the post names "bolt-on search engine (such as Elasticsearch)" as the canonical example of the legacy search shape Cars24 left to consolidate on Atlas + Atlas Search. Multiple Cars24 engineering teams piped data into a single search index, which led to race conditions and inefficient real-time dashboard updates. The class is archetypal, not Cars24-specific; the wiki treats this as one instance of the synchronization-tax shape.