SYSTEM Cited by 5 sources
Elasticsearch
Elasticsearch is the distributed search engine built on Apache Lucene
that powers a large share of production full-text and filtered
search at scale. It exposes a JSON-based Query DSL; the query
type most relevant to structured search products is the
bool query, which takes nested must / should / must_not /
filter clauses that map naturally onto
AND / OR / NOT logic.
Within the wiki this page is a stub created for cross-referencing from sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean; Elasticsearch is a large product with many capabilities not covered here (search relevance, aggregations, k-NN vector search, ILM / snapshotting).
AWS's fork is OpenSearch, offered in managed form as Amazon OpenSearch Service; it uses the same bool-query shape.
Role in the wiki
Backing store for GitHub Issues search (2025-05-13)
GitHub Issues search is backed by Elasticsearch. The 2025 rewrite's Query pipeline stage compiles an AST from user search input into a nested Elasticsearch bool query:
| AST node | Elasticsearch bool clause |
|---|---|
| AND | must |
| OR | should |
| NOT | must_not (listed as should_not in the source mapping; Elasticsearch's clause name is must_not) |
| leaf filter-term (author:monalisa) | term / terms / prefix |
Because bool clauses nest, this recursive mapping makes the bool
query the natural codomain for an AST-driven search DSL;
patterns/ast-based-query-generation describes the structural fit. A
worked before-and-after example is in the source page. Subtrees that
OR together several values of the same field are compacted into a
single terms clause as an intra-leaf optimization.
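The mapping in the table can be sketched as a small recursive compiler. This is a minimal illustration, not GitHub's implementation: the node classes (`Term`, `And`, `Or`, `Not`), the function name `compile_node`, and the field names in the usage example are all hypothetical, and every leaf is emitted as a `term` clause for simplicity.

```python
import json
from dataclasses import dataclass
from typing import Union


# Hypothetical AST node types; the source post does not publish its own.
@dataclass
class Term:
    field: str
    value: str


@dataclass
class And:
    children: list


@dataclass
class Or:
    children: list


@dataclass
class Not:
    child: "Node"


Node = Union[Term, And, Or, Not]


def compile_node(node):
    """Recursively emit an Elasticsearch query dict for one AST node."""
    if isinstance(node, Term):
        return {"term": {node.field: node.value}}
    if isinstance(node, And):
        return {"bool": {"must": [compile_node(c) for c in node.children]}}
    if isinstance(node, Not):
        return {"bool": {"must_not": [compile_node(node.child)]}}
    if isinstance(node, Or):
        kids = node.children
        same_field = len({c.field for c in kids if isinstance(c, Term)}) == 1
        # Intra-leaf optimization: an OR of terms on one field becomes `terms`.
        if kids and all(isinstance(c, Term) for c in kids) and same_field:
            return {"terms": {kids[0].field: [c.value for c in kids]}}
        return {
            "bool": {
                "should": [compile_node(c) for c in kids],
                "minimum_should_match": 1,
            }
        }
    raise TypeError(f"unknown AST node: {node!r}")


if __name__ == "__main__":
    ast = And([
        Term("state", "open"),
        Or([Term("author", "monalisa"), Term("author", "octocat")]),
        Not(Term("label", "wontfix")),
    ])
    print(json.dumps({"query": compile_node(ast)}, indent=2))
```

Each AST node produces exactly one query object, so the depth of the emitted query follows the depth of the user's expression; the `terms` compaction only changes the shape of same-field OR leaves.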
Scale: GitHub Issues search runs at ~2,000 QPS (≈160 M queries/day) on this substrate. (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)
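The two scale figures agree only as round numbers. A quick arithmetic check, assuming a uniform rate over a 24-hour day (an assumption of this sketch, not something the source states):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def queries_per_day(qps):
    """Daily query volume implied by a constant per-second rate."""
    return qps * SECONDS_PER_DAY


def average_qps(per_day):
    """Average per-second rate implied by a daily query volume."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(queries_per_day(2_000))              # 172,800,000 per day at a flat 2,000 QPS
    print(round(average_qps(160_000_000)))     # about 1,852 QPS for 160 M per day
```

So ≈160 M queries/day corresponds to an average somewhat under 2,000 QPS, which fits the "~" on both numbers.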
Bool query shape (the relevant API surface)
The canonical nested shape Elasticsearch exposes:
```jsonc
{
  "query": {
    "bool": {
      "must": [ ... ],      // AND: all must match; contributes to _score
      "should": [ ... ],    // OR: at least one must match when there is no must or filter; otherwise optional, adds to _score
      "must_not": [ ... ],  // NOT: none may match
      "filter": [ ... ]     // AND with no scoring contribution
    }
  }
}
```
bool clauses nest inside each other, which is why any
boolean-algebra AST can be emitted mechanically as a tree of bool
objects with leaf clauses at the bottom.
The filter vs must distinction matters for relevance scoring:
clauses under filter run in filter context, so they restrict the
result set without contributing to _score and are eligible for
caching. That is the right choice for structured-equality predicates
(state:open, author_id:X). Full-text clauses typically go in must so
that they participate in scoring.
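As a concrete illustration of that split, the sketch below builds one request body with a scored full-text clause in must and two unscored equality predicates in filter. The function name `build_search` and the `title` field are hypothetical; `state` and `author_id` follow the predicates named above.

```python
import json


def build_search(text, state, author_id):
    """Build a bool query: scored full-text match plus unscored equality filters."""
    return {
        "query": {
            "bool": {
                # Query context: the match clause contributes to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no matching only, no score contribution.
                "filter": [
                    {"term": {"state": state}},
                    {"term": {"author_id": author_id}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search("nested queries", "open", 42), indent=2))
```

Moving the two term clauses from filter into must would return the same documents but let the equality predicates influence ranking, which is rarely what a structured filter intends.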
See the upstream reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
Cross Cluster Replication (CCR) — replication between clusters
Elasticsearch supports two distinct replication mechanisms:
- Intra-cluster primary/replica shard replication — primary and replica shards of the same index live in one cluster; ES rebalances shards across the cluster's nodes as a health action.
- Cross Cluster Replication (CCR) — one-way leader→follower replication between otherwise-independent ES clusters, at the Lucene segment granularity. Covered fully at concepts/cross-cluster-replication.
CCR's structural win is that it lets you align the storage topology to the application's primary/replica topology (see concepts/primary-replica-topology-alignment). The canonical wiki instance is the 2026-03-03 GHES search rewrite — GHES collapsed a multi-node ES cluster spanning its HA pair into per-node single-node clusters and joined them with CCR, removing ES's freedom to rebalance primary shards onto the read-only replica host (the old failure mode that caused mutual-dependency deadlocks). See patterns/single-node-cluster-per-app-replica.
CCR's auto-follow policy is new-only — it matches indexes created after the policy is installed and doesn't retroactively attach pre-existing indexes. Applying CCR to a long-lived deployment therefore requires an imperative bootstrap step for pre-existing indexes followed by the declarative auto-follow policy for future ones.
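The bootstrap-then-auto-follow sequence can be sketched as an ordered list of REST requests. This is an illustration under stated assumptions, not GitHub's procedure: the function name, the remote-cluster alias, the index names, and the policy name are hypothetical. The two endpoints used are Elasticsearch's create-follower API (`PUT /<follower_index>/_ccr/follow`) and auto-follow pattern API (`PUT /_ccr/auto_follow/<name>`).

```python
def bootstrap_then_auto_follow(remote_cluster, existing_indexes, pattern, policy_name):
    """Return (method, path, body) tuples, in the order they should be sent."""
    requests = []

    # Imperative step: explicitly follow every index that already exists,
    # because an auto-follow policy will not pick these up.
    for index in existing_indexes:
        requests.append((
            "PUT",
            f"/{index}/_ccr/follow",
            {"remote_cluster": remote_cluster, "leader_index": index},
        ))

    # Declarative step: install the policy so that indexes created later
    # on the leader are followed automatically.
    requests.append((
        "PUT",
        f"/_ccr/auto_follow/{policy_name}",
        {
            "remote_cluster": remote_cluster,
            "leader_index_patterns": [pattern],
            "follow_index_pattern": "{{leader_index}}",
        },
    ))
    return requests


if __name__ == "__main__":
    for method, path, body in bootstrap_then_auto_follow(
        "leader", ["issues-1", "issues-2"], "issues-*", "issues-policy"
    ):
        print(method, path, body)
```

The sketch only constructs the requests; sending them to the follower cluster, and handling indexes created between the two steps, is left to the caller.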
CCR only covers document replication. Failover, index deletion, and upgrades are the consumer's responsibility — "Elasticsearch only handles the document replication, and we're responsible for the rest of the index's lifecycle" (GitHub, 2026-03-03).
Stub caveats
- This page covers only what the ingested sources touch. Not covered: relevance/_score tuning, aggregations, ILM, k-NN vector search, Elasticsearch SQL, snapshot lifecycle, or operational runbooks (shard sizing, circuit breakers, mapping explosion). Cross-cluster replication is covered only as far as the CCR section above goes.
- The open-source / licensing split between Elasticsearch (Elastic) and OpenSearch (AWS fork) is not modelled here; the bool-query DSL is common to both.
Seen in
- sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean — backing engine for GitHub Issues search at ~2 kQPS; bool-query shape as codomain for the AST-based query builder.
- sources/2026-04-21-figma-the-search-for-speed-in-figma — Figma ran on Elasticsearch until late 2023, then migrated to managed OpenSearch. The post notes that while the two remain "mostly compatible, small differences have accumulated over the last three years, making the migration more challenging than expected". This is a concrete data point on the cost of drift after a fork; the post mentions the differences but does not enumerate them.
- sources/2026-03-03-github-how-we-rebuilt-the-search-architecture-for-high-availability — GHES 3.19.1 uses Elasticsearch CCR to replicate Lucene segments between per-node single-node clusters, replacing a failure-prone cross-node-cluster topology. Canonical wiki instance of the cross-cluster-replication primitive and of patterns/single-node-cluster-per-app-replica. Also the wiki's first real engagement with CCR's auto-follow new-only limitation and the patterns/bootstrap-then-auto-follow workaround.
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Elasticsearch is the original destination system of Datadog's managed multi-tenant CDC replication platform. Datadog's Metrics Summary page was joining 82K × 817K rows on a shared Postgres, hitting p90 ~7 s. Rerouting search + faceted filtering to Elasticsearch, populated by a Debezium → Kafka → sink-connector pipeline with dynamic denormalisation at replication time, dropped page load ~30 s → ~1 s (up to 97%) at ~500 ms replication lag. Canonical wiki instance of the split direction of the database-and-search problem (Elasticsearch beside Postgres fed by CDC), as distinct from Cars24's Atlas-Search consolidation.
Related
- systems/lucene — the underlying storage/indexing engine; CCR replicates at the Lucene-segment grain.
- systems/github-enterprise-server — canonical CCR consumer.
- systems/mongodb-atlas — competing consolidated primary-plus-embedded-search substrate (Atlas + Atlas Search); Cars24's target of the bolt-on-search elimination.
- systems/atlas-hybrid-search — MongoDB's BM25-on-Lucene + vector peer; the consolidation destination Cars24 chose to replace an Elasticsearch-class bolt-on.
- systems/amazon-opensearch-service — AWS's managed fork, same bool-query shape.
- concepts/boolean-query-dsl — the AND/OR/NOT nested-clause API shape Elasticsearch popularised.
- concepts/query-shape — flat vs nested vs recursive as distinct backend load profiles; Elasticsearch exposes all three.
- patterns/ast-based-query-generation — recursive-AST-to-bool query emission.
- concepts/cross-cluster-replication — CCR concept page.
- concepts/primary-replica-topology-alignment — structural principle the GHES rewrite exemplifies on top of CCR.
- patterns/single-node-cluster-per-app-replica — deployment pattern that collapses a multi-node ES cluster into per-host single-node clusters joined by CCR.
- patterns/bootstrap-then-auto-follow — imperative-then-declarative pattern for new-only CCR auto-follow policies.
- concepts/synchronization-tax — cost class Elasticsearch-as-bolt-on-search beside a primary RDBMS exemplifies; Cars24 (2025-10-12) is the canonical wiki instance of leaving that shape.
- patterns/consolidate-database-and-search — the remediation pattern that obsoletes the Elasticsearch-beside-Postgres shape.
Seen in (legacy / archetypal side)
- sources/2025-10-12-mongodb-cars24-improves-search-for-300-million-users-with-atlas — post names "bolt-on search engine (such as Elasticsearch)" as the canonical example of the legacy search shape Cars24 left to consolidate on Atlas + Atlas Search. Cars24 had multiple engineering teams piping data into a single search index with race-logic + real-time-dashboard-update inefficiencies. The class is archetypal, not Cars24-specific; the wiki treats this as one instance of the synchronization-tax shape.