CONCEPT Cited by 1 source
Abstract Syntax Tree¶
An Abstract Syntax Tree (AST) is an intermediate representation (IR) of a program, query, or expression, structured as a tree whose interior nodes are operators / productions and whose leaves are atomic tokens (literals, identifiers, filter terms). The tree abstracts the source — whitespace, punctuation, parse-stage-only scaffolding are dropped; only the semantic structure remains.
ASTs are the standard IR for any tool that must recurse over the input to emit something else: compilers, linters, refactoring tools, transpilers, and — the focus of this concept page — query languages that compile to a backend query document.
Why the AST is load-bearing for query-language rewrites¶
For a flat query language — e.g. assignee:@me label:support
new-project where every term is implicitly AND-joined — a list
is a sufficient IR. Each term maps independently to a backend
clause; the backend combines them linearly.
Once the language grows nesting (parentheses, arbitrary operators across fields), the list breaks. You need a structure that can represent a tree of operators with subtrees as operands. The AST is that structure.
The sequence GitHub Issues went through is the standard story (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean):
- Original: flat parser emits a list → linear map to Elasticsearch snippets → concatenate.
- First-gen upgrade (2021): comma-OR on
label:only, shoehorned onto the list representation — works for one field, can't generalize. - 2025 rewrite: parse to AST → recursive traversal → emit
nested Elasticsearch
boolquery. Both legacy flat queries and new nested queries become valid trees (legacy queries become left-leaning AND-spine trees) so the backward-compat story is "same IR, different depths" rather than "two code paths".
Worked example¶
Input:
AST (simplified shape — captures and strips parens):
root
└── and
├── filter_term(is, issue) # left
└── or # right
├── filter_term(author, deborah-digges)
└── filter_term(author, monalisa)
A recursive walker visiting this AST emits one bool-query clause
per node:
- and → {bool: {must: [<left>, <right>]}}
- or → {bool: {should: [<left>, <right>]}}
- leaf filter_term(K, V) → backend-specific clause
See patterns/ast-based-query-generation for the pattern.
AST vs parse tree¶
- Parse tree (concrete syntax tree, CST) contains every rule match the grammar makes, including punctuation nodes, whitespace nodes, list-of-one-element nodes, etc.
- AST discards parse-only scaffolding and keeps only the semantically meaningful nodes.
In parslet, the raw grammar emits a parse tree (nested Hash of
Slices); a Transform rewrites it to the AST. Some parsing
libraries collapse both steps into one.
When an AST is not warranted¶
- Truly flat DSLs (e.g., tag filters with no nesting, single-operator filters): a list/struct is enough.
- Single-use query builders where the query text maps 1:1 to backend calls by string substitution: skipping the AST saves a layer of indirection.
- Small vocabularies where the language can never grow — e.g.
a URL-parameter-driven search where every recognised parameter
is a top-level field. The moment a
(shows up in a feature request, this judgement has to be revisited.
The GitHub Issues 2021 comma-OR delivery is a case study in the second answer being correct at the time and incorrect four years later: the rewrite became inevitable when the syntax had to nest, but the interim delivery was not wasted.
Seen in¶
- sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean
— canonical wiki instance. GitHub's Issues-search rewrite
replaces a flat-list IR with an AST emitted by a
PEG grammar via systems/parslet,
then recursively compiles the AST to a nested Elasticsearch
boolquery.
Related¶
- concepts/peg-grammar — grammar class that emits ASTs well.
- patterns/ast-based-query-generation — the pattern that uses this IR in production query-language rewrites.
- systems/parslet — Ruby PEG parser that emits into transforms-to-AST.
- systems/github-issues — canonical user of this shape.