CONCEPT Cited by 1 source

Boolean query DSL

A boolean query DSL is a structured query-document shape in which clauses combine via explicit AND / OR / NOT operators — usually rendered as nested tree objects — so that an upstream parser/compiler can emit arbitrarily-deep logical combinations of filter terms.

The canonical example is Elasticsearch's bool query, but the shape is generic and appears across most production search and query engines (OpenSearch, Solr, Vespa, MongoDB's $and / $or, SQL WHERE ASTs internally, etc.).

The canonical Elasticsearch shape

{
  "query": {
    "bool": {
      "must":     [ ... ],   // AND (participates in scoring)
      "should":   [ ... ],   // OR (or scoring boost)
      "must_not": [ ... ],   // NOT (eliminates matches)
      "filter":   [ ... ]    // AND without scoring contribution
    }
  }
}

Nested bool clauses inside any of must/should/must_not/filter give you recursive boolean combinations of any depth. This is the property that makes the DSL a natural emission target for an AST traversal.
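To make the recursion concrete, here is a minimal sketch of a hand-built nested query for the expression state:open AND (label:bug OR NOT assignee:bot). The field names and values are hypothetical, chosen only for illustration; the point is that each level of the expression becomes one level of bool.

```python
import json

# Hypothetical fields and values, for illustration only.
# Expression: state:open AND (label:bug OR NOT assignee:bot)
nested_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"state": "open"}},
                {
                    "bool": {
                        "should": [
                            {"term": {"label": "bug"}},
                            # NOT is itself a bool node nested one level deeper.
                            {"bool": {"must_not": [{"term": {"assignee": "bot"}}]}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
            ]
        }
    }
}

print(json.dumps(nested_query, indent=2))
```

The same pattern repeats at every level: any entry in a clause list may be either a leaf or another bool object.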

must vs filter

Both AND-join their clauses. The difference is relevance scoring:

  • must clauses contribute to _score (used to rank results).
  • filter clauses skip the scoring computation entirely; use them for structured-equality predicates (state:open, author_id:X) where a ranking contribution is not wanted.

For UI-driven search like GitHub Issues, filter predicates dominate, so the AST-to-bool-query compiler can default to filter / must_not and only use must when a full-text term is present in the input.
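The default described above can be sketched as a small routing function. This is an illustrative sketch, not GitHub's implementation: the Leaf type, the build_flat_query helper, and the "title" text field are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Leaf:
    """Hypothetical parsed term; field=None marks a free-text term."""
    field: Optional[str]
    value: str
    negated: bool = False


def build_flat_query(leaves, text_field="title"):
    """Route each leaf to filter, must_not, or must (full-text only)."""
    bool_body = {}
    for leaf in leaves:
        if leaf.field is None:
            # Full-text term: the only case that should influence ranking.
            clause, target = {"match": {text_field: leaf.value}}, "must"
        else:
            # Structured predicate: non-scoring by default.
            clause, target = {"term": {leaf.field: leaf.value}}, "filter"
        if leaf.negated:
            target = "must_not"
        bool_body.setdefault(target, []).append(clause)
    return {"query": {"bool": bool_body}}


query = build_flat_query([
    Leaf("state", "open"),
    Leaf(None, "crash"),
    Leaf("author", "bot", negated=True),
])
```

Here state:open lands in filter, the free-text term "crash" lands in must, and the negated author term lands in must_not.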

The AST-to-bool-query mapping

The rewrite of a boolean-DSL-compatible AST into an Elasticsearch bool-query document is essentially one-to-one on operators:

AST node               | Elasticsearch bool clause
AND(a, b)              | {bool: {must: [<a>, <b>]}} (or filter if non-scoring)
OR(a, b)               | {bool: {should: [<a>, <b>], minimum_should_match: 1}}
NOT(a)                 | {bool: {must_not: [<a>]}}
leaf filter term K:V   | backend-specific term/terms/prefix clause
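The mapping lends itself to a short recursive traversal. The sketch below assumes a hypothetical AST made of Term, And, Or, and Not nodes and always emits a term clause for leaves; a real compiler would choose the leaf clause per field and backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str
    value: str


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


@dataclass(frozen=True)
class Not:
    child: object


def compile_node(node, scoring=False):
    """Translate one AST node into a bool-query fragment, recursively."""
    if isinstance(node, Term):
        return {"term": {node.field: node.value}}
    if isinstance(node, And):
        key = "must" if scoring else "filter"
        return {"bool": {key: [compile_node(c, scoring) for c in node.children]}}
    if isinstance(node, Or):
        return {
            "bool": {
                "should": [compile_node(c, scoring) for c in node.children],
                "minimum_should_match": 1,
            }
        }
    if isinstance(node, Not):
        return {"bool": {"must_not": [compile_node(node.child, scoring)]}}
    raise TypeError(f"unsupported AST node: {node!r}")


ast = And((Term("state", "open"), Or((Term("label", "bug"), Not(Term("author", "bot"))))))
query = {"query": compile_node(ast)}
```

Each operator produces exactly one bool object, so the emitted document has the same depth as the AST.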

One intra-leaf optimisation that real compilers make: same-field OR-of-values (e.g. author:A OR author:B OR author:C) collapses to a single terms clause:

{ "terms": { "author_id": ["A_ID", "B_ID", "C_ID"] } }

instead of three nested should clauses. GitHub's rewrite ships this optimisation. (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)
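One way such a collapse could be written is to group the OR'd leaves by field before emitting clauses. The compile_or helper below is a hypothetical sketch under that assumption and says nothing about how GitHub's code is actually structured.

```python
def compile_or(leaves):
    """Compile OR-joined (field, value) leaves, merging same-field values."""
    by_field = {}
    for field, value in leaves:
        by_field.setdefault(field, []).append(value)

    clauses = []
    for field, values in by_field.items():
        if len(values) == 1:
            clauses.append({"term": {field: values[0]}})
        else:
            # Several values for one field become a single terms clause.
            clauses.append({"terms": {field: values}})

    if len(clauses) == 1:
        return clauses[0]  # no bool wrapper needed
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


same_field = compile_or([("author_id", "A_ID"), ("author_id", "B_ID"), ("author_id", "C_ID")])
mixed = compile_or([("author_id", "A_ID"), ("author_id", "B_ID"), ("state", "open")])
```

When every leaf shares a field, the result is a bare terms clause; with mixed fields, the grouped clauses still sit under a single should.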

Why the boolean DSL is the "right" codomain

  • Composability: any boolean expression, however deeply nested, has a bool query emission of the same depth.
  • Scoring-aware: must vs filter separation maps cleanly to "does this term influence ranking?" — a question every search product has to answer.
  • Backend-neutral enough: the shape is idiomatic in Elasticsearch, OpenSearch, and other derivatives; porting an AST-emitter across them is mostly leaf-clause adjustments.

Caveats and limits

  • Implicit scoring contribution: when all clauses are must, the Elasticsearch scorer combines sub-scores in ways that can surprise product owners. Force filter on equality predicates unless scoring is explicitly desired.
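  • Size and depth limits: Elasticsearch caps the total number of clauses in a query with indices.query.bool.max_clause_count (1,024 by default in older releases; recent releases derive the limit automatically and deprecate the setting). That setting counts clauses, not nesting levels; nesting depth is bounded separately (indices.query.bool.max_nested_depth in recent releases). Product-level caps (GitHub Issues' 5 levels) usually bite first, but for machine-generated queries the backend limits are real.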
  • should and minimum_should_match require care: a bool query that has only should clauses (no must or filter) defaults to OR-any (minimum_should_match of 1); once a must or filter clause is present, the default drops to 0 and should clauses contribute to scoring but not matching. AST compilers should emit minimum_should_match: 1 explicitly whenever the AST says OR.
  • The DSL is not SQL. Correlated subqueries, joins across indices, aggregations as filters — none of these map cleanly through bool. Once a query language wants those, a different codomain (or a layer above Elasticsearch) is needed.
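For machine-generated queries, a compiler can check nesting depth and leaf count before sending anything to the backend. The sketch below is an assumption-laden illustration: measure and check_limits are hypothetical helpers, the defaults simply mirror the figures quoted above, and the leaf count treats every non-bool clause as one leaf, so the backend's own accounting may differ.

```python
BOOL_KEYS = ("must", "should", "must_not", "filter")


def measure(clause):
    """Return (bool nesting depth, leaf count) for a compiled clause."""
    if "bool" not in clause:
        return 0, 1  # any non-bool clause counts as a single leaf
    depth, leaves = 0, 0
    for key in BOOL_KEYS:
        children = clause["bool"].get(key, [])
        if isinstance(children, dict):
            children = [children]  # a lone clause may be given without a list
        for child in children:
            child_depth, child_leaves = measure(child)
            depth = max(depth, child_depth)
            leaves += child_leaves
    return depth + 1, leaves


def check_limits(clause, max_depth=5, max_leaves=1024):
    """Raise ValueError if the clause exceeds the configured caps."""
    depth, leaves = measure(clause)
    if depth > max_depth:
        raise ValueError(f"bool nesting depth {depth} exceeds {max_depth}")
    if leaves > max_leaves:
        raise ValueError(f"leaf count {leaves} exceeds {max_leaves}")
    return depth, leaves


sample = {
    "bool": {
        "filter": [
            {"term": {"a": "1"}},
            {"bool": {"should": [
                {"term": {"b": "2"}},
                {"bool": {"must_not": [{"term": {"c": "3"}}]}},
            ]}},
        ]
    }
}
```

Rejecting an oversized query at compile time gives the caller a product-level error message instead of a backend exception.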
