Skip to content

CONCEPT Cited by 1 source

Abstract Syntax Tree

An Abstract Syntax Tree (AST) is an intermediate representation (IR) of a program, query, or expression, structured as a tree whose interior nodes are operators / productions and whose leaves are atomic tokens (literals, identifiers, filter terms). The tree abstracts the source — whitespace, punctuation, parse-stage-only scaffolding are dropped; only the semantic structure remains.

ASTs are the standard IR for any tool that must recurse over the input to emit something else: compilers, linters, refactoring tools, transpilers, and — the focus of this concept page — query languages that compile to a backend query document.

Why the AST is load-bearing for query-language rewrites

For a flat query language — e.g. assignee:@me label:support new-project where every term is implicitly AND-joined — a list is a sufficient IR. Each term maps independently to a backend clause; the backend combines them linearly.

Once the language grows nesting (parentheses, arbitrary operators across fields), the list breaks. You need a structure that can represent a tree of operators with subtrees as operands. The AST is that structure.

The sequence GitHub Issues went through is the standard story (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean):

  1. Original: flat parser emits a list → linear map to Elasticsearch snippets → concatenate.
  2. First-gen upgrade (2021): comma-OR on label: only, shoehorned onto the list representation — works for one field, can't generalize.
  3. 2025 rewrite: parse to AST → recursive traversal → emit nested Elasticsearch bool query. Both legacy flat queries and new nested queries become valid trees (legacy queries become left-leaning AND-spine trees) so the backward-compat story is "same IR, different depths" rather than "two code paths".

Worked example

Input:

is:issue AND (author:deborah-digges OR author:monalisa)

AST (simplified shape — captures and strips parens):

root
 └── and
      ├── filter_term(is, issue)          # left
      └── or                                # right
           ├── filter_term(author, deborah-digges)
           └── filter_term(author, monalisa)

A recursive walker visiting this AST emits one bool-query clause per node: - and{bool: {must: [<left>, <right>]}} - or{bool: {should: [<left>, <right>]}} - leaf filter_term(K, V) → backend-specific clause

See patterns/ast-based-query-generation for the pattern.

AST vs parse tree

  • Parse tree (concrete syntax tree, CST) contains every rule match the grammar makes, including punctuation nodes, whitespace nodes, list-of-one-element nodes, etc.
  • AST discards parse-only scaffolding and keeps only the semantically meaningful nodes.

In parslet, the raw grammar emits a parse tree (nested Hash of Slices); a Transform rewrites it to the AST. Some parsing libraries collapse both steps into one.

When an AST is not warranted

  • Truly flat DSLs (e.g., tag filters with no nesting, single-operator filters): a list/struct is enough.
  • Single-use query builders where the query text maps 1:1 to backend calls by string substitution: skipping the AST saves a layer of indirection.
  • Small vocabularies where the language can never grow — e.g. a URL-parameter-driven search where every recognised parameter is a top-level field. The moment a ( shows up in a feature request, this judgement has to be revisited.

The GitHub Issues 2021 comma-OR delivery is a case study in the second answer being correct at the time and incorrect four years later: the rewrite became inevitable when the syntax had to nest, but the interim delivery was not wasted.

Seen in

Last updated · 200 distilled / 1,178 read