Boolean query DSL

A boolean query DSL is a structured query-document shape in which clauses combine via explicit AND / OR / NOT operators — usually rendered as nested tree objects — so that an upstream parser/compiler can emit arbitrarily-deep logical combinations of filter terms.

The canonical example is Elasticsearch's bool query, but the shape is generic and appears across most production search and query engines (OpenSearch, Solr, Vespa, MongoDB's $and / $or, SQL WHERE ASTs internally, etc.).

The canonical Elasticsearch shape

{
  "query": {
    "bool": {
      "must":     [ ... ],   // AND (participates in scoring)
      "should":   [ ... ],   // OR (or scoring boost)
      "must_not": [ ... ],   // NOT (eliminates matches)
      "filter":   [ ... ]    // AND without scoring contribution
    }
  }
}

Nested bool clauses inside any of must/should/must_not/filter give you recursive boolean combinations of any depth. This is the property that makes the DSL a natural emission target for an AST traversal.
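
As a concrete illustration of that recursion, the sketch below writes a two-level query as a plain Python dict and serialises it to JSON. The field names and values (state, label, author) are hypothetical and only serve to show one bool nested inside another.

```python
import json

# Hypothetical field names and values, for illustration only.
# Expresses: state:open AND (label:bug OR label:regression) AND NOT author:bot
nested_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"state": "open"}},
                {
                    # A bool nested inside the outer filter list.
                    "bool": {
                        "should": [
                            {"term": {"label": "bug"}},
                            {"term": {"label": "regression"}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
            ],
            "must_not": [{"term": {"author": "bot"}}],
        }
    }
}

print(json.dumps(nested_query, indent=2))
```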

`must` vs `filter`

Both AND-join their clauses. The difference is relevance scoring:

- must clauses contribute to _score (used to rank results).
- filter clauses run in filter context: they do not contribute to _score, and their results can be cached. Use them for structured-equality predicates (state:open, author_id:X) where no ranking contribution is wanted.

For UI-driven search like GitHub Issues, filter predicates dominate, so the AST-to-bool-query compiler can default to filter / must_not and only use must when a full-text term is present in the input.
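
A minimal sketch of that default, assuming leaves arrive as (field, value) pairs and that the compiler knows which fields are structured. The STRUCTURED_FIELDS set and the place_leaves helper are hypothetical names, not taken from GitHub's implementation.

```python
from typing import Any

# Hypothetical set of structured (exact-match) fields; any other field is
# treated as free text in this sketch.
STRUCTURED_FIELDS = {"state", "author_id", "label"}


def place_leaves(leaves: list[tuple[str, str]]) -> dict[str, Any]:
    """Build a flat bool query from (field, value) leaves.

    Structured leaves go to `filter` (no scoring); free-text leaves go to
    `must` so that they influence `_score`.
    """
    bool_body: dict[str, list] = {}
    for field, value in leaves:
        if field in STRUCTURED_FIELDS:
            bool_body.setdefault("filter", []).append({"term": {field: value}})
        else:
            bool_body.setdefault("must", []).append({"match": {field: value}})
    return {"bool": bool_body}


query = place_leaves([("state", "open"), ("title", "crash on login")])
```

With only structured leaves in the input, the resulting query has no must key at all, so no clause contributes to ranking.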

The AST-to-bool-query mapping

The rewrite of a boolean-DSL-compatible AST into an Elasticsearch bool-query document is essentially one-to-one on operators:

AST node	Elasticsearch bool clause
`AND(a, b)`	`{bool: {must: [<a>, <b>]}}` (or `filter` if non-scoring)
`OR(a, b)`	`{bool: {should: [<a>, <b>], minimum_should_match: 1}}`
`NOT(a)`	`{bool: {must_not: [<a>]}}`
leaf filter term `K:V`	backend-specific term/terms/prefix clause
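
The table can be read as a recursive function over the AST. The sketch below is one way to write it, under stated assumptions: the node classes (Term, And, Or, Not) and the compile_node function are hypothetical, and every leaf is emitted as a plain term clause even though a real backend would pick term, terms, or prefix per field.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Term:
    field: str
    value: str


@dataclass
class And:
    children: list


@dataclass
class Or:
    children: list


@dataclass
class Not:
    child: "Node"


Node = Union[Term, And, Or, Not]


def compile_node(node: Node, scoring: bool = False) -> dict:
    """Compile an AST node into an Elasticsearch-style query dict."""
    if isinstance(node, Term):
        # Assumption: every leaf is an exact-match term clause.
        return {"term": {node.field: node.value}}
    if isinstance(node, And):
        clause = "must" if scoring else "filter"
        return {"bool": {clause: [compile_node(c, scoring) for c in node.children]}}
    if isinstance(node, Or):
        return {
            "bool": {
                "should": [compile_node(c, scoring) for c in node.children],
                "minimum_should_match": 1,  # always explicit for OR
            }
        }
    if isinstance(node, Not):
        return {"bool": {"must_not": [compile_node(node.child, scoring)]}}
    raise TypeError(f"unsupported AST node: {node!r}")


ast = And([
    Term("state", "open"),
    Or([Term("label", "bug"), Term("label", "docs")]),
    Not(Term("author", "bot")),
])
compiled = compile_node(ast)
```

Because each branch only wraps the compiled children, the output nests exactly as deeply as the input AST.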

One intra-leaf optimisation that real compilers make: same-field OR-of-values (e.g. author:A OR author:B OR author:C) collapses to a single terms clause:

{ "terms": { "author_id": ["A_ID", "B_ID", "C_ID"] } }

instead of three nested should clauses. GitHub's rewrite ships this optimisation. (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)
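
A sketch of how such a collapse might look, assuming the OR's children are all leaves already resolved to (field, value) pairs. The compile_or helper is hypothetical, and the fallback for mixed fields is simply the should form from the table above.

```python
def compile_or(leaves: list[tuple[str, str]]) -> dict:
    """Compile an OR over (field, value) leaves.

    If every leaf targets the same field, emit a single `terms` clause;
    otherwise fall back to a `should` list with an explicit minimum.
    """
    if not leaves:
        raise ValueError("OR needs at least one leaf")
    fields = {field for field, _ in leaves}
    if len(fields) == 1:
        field = next(iter(fields))
        return {"terms": {field: [value for _, value in leaves]}}
    return {
        "bool": {
            "should": [{"term": {field: value}} for field, value in leaves],
            "minimum_should_match": 1,
        }
    }


collapsed = compile_or([
    ("author_id", "A_ID"),
    ("author_id", "B_ID"),
    ("author_id", "C_ID"),
])
```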

Why the boolean DSL is the "right" codomain

Composability: any boolean expression, however deeply nested, has a bool query emission of the same depth.
Scoring-aware: must vs filter separation maps cleanly to "does this term influence ranking?" — a question every search product has to answer.
Backend-neutral enough: the shape is idiomatic in Elasticsearch, OpenSearch, and other derivatives; porting an AST-emitter across them is mostly leaf-clause adjustments.

Caveats and limits

Implicit scoring contribution: the _score of a bool query adds up the scores of its matching must and should clauses, so a query built entirely from must lets equality predicates shift the ranking in ways that can surprise product owners. Force filter on equality predicates unless scoring is explicitly desired.
Clause and depth limits: Elasticsearch caps the number of clauses in a query at a configurable limit (indices.query.bool.max_clause_count, historically 1,024 by default). This is a clause-count cap rather than a nesting-depth cap; recent versions also expose a separate depth setting (indices.query.bool.max_nested_depth). Product-level caps (GitHub Issues' 5 levels) usually bite first, but for machine-generated queries the backend limits are real.
should with minimum_should_match requires care: a bool clause whose only children are should clauses defaults to OR-any (minimum 1), but once it also contains a must or filter clause the default drops to 0, so should affects scoring without being required for a match. AST compilers should emit minimum_should_match: 1 explicitly whenever the AST says OR.
The DSL is not SQL. Correlated subqueries, joins across indices, aggregations as filters — none of these map cleanly through bool. Once a query language wants those, a different codomain (or a layer above Elasticsearch) is needed.
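
For machine-generated queries, a compiler can check these limits itself before sending anything to the backend. The sketch below is an approximation under stated assumptions: measure and check_limits are hypothetical helpers, every non-bool clause counts as one leaf (which is not Elasticsearch's own accounting), and the default thresholds simply echo the figures mentioned above.

```python
BOOL_KEYS = ("must", "should", "must_not", "filter")


def measure(query: dict) -> tuple[int, int]:
    """Return (bool nesting depth, leaf clause count) for a query dict."""
    body = query.get("bool")
    if body is None:
        return 0, 1  # any non-bool clause counts as a single leaf
    depth, leaves = 0, 0
    for key in BOOL_KEYS:
        children = body.get(key, [])
        if isinstance(children, dict):  # a single clause object is also accepted
            children = [children]
        for child in children:
            child_depth, child_leaves = measure(child)
            depth = max(depth, child_depth)
            leaves += child_leaves
    return depth + 1, leaves


def check_limits(query: dict, max_depth: int = 5, max_leaves: int = 1024) -> None:
    """Raise ValueError if the query exceeds the assumed limits."""
    depth, leaves = measure(query)
    if depth > max_depth:
        raise ValueError(f"bool nesting depth {depth} exceeds {max_depth}")
    if leaves > max_leaves:
        raise ValueError(f"{leaves} leaf clauses exceed {max_leaves}")


example = {
    "bool": {
        "filter": [
            {"term": {"state": "open"}},
            {
                "bool": {
                    "should": [
                        {"term": {"label": "bug"}},
                        {"term": {"label": "docs"}},
                    ],
                    "minimum_should_match": 1,
                }
            },
        ]
    }
}
check_limits(example)  # passes: depth 2, three leaves
```

Rejecting an oversized query at compile time lets the product return a targeted error message instead of surfacing a backend failure.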

Seen in

sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean — GitHub's Issues-search rewrite compiles its AST into Elasticsearch bool-query documents, with AND→must, OR→should, NOT→must_not, and same-field OR-of-values compacting to terms. Worked example in the source page.

systems/elasticsearch — the canonical engine exposing this DSL.
systems/amazon-opensearch-service — AWS's fork; same shape.
concepts/abstract-syntax-tree — the upstream IR.
patterns/ast-based-query-generation — the end-to-end compiler shape.
concepts/query-shape — flat vs nested vs recursive query profiles at the backend.