PATTERN Cited by 1 source
AST-based query generation¶
When a user-facing query language must support recursive or nested logical combinations (parenthesised AND/OR/NOT), the production-scale shape for compiling it into a backend query document is:
parse → AST → recursive-traversal emission → backend query
The AST is the load-bearing IR. A flat list IR supports only implicit-AND single-level queries; extending it with ad-hoc special cases (e.g. comma-OR on one field) buys short-term wins but cannot generalize to arbitrary nesting across all fields without structural re-representation.
This pattern is the structural refactoring move that makes nested query languages shippable in production.
Shape¶
- Parse the user input through a grammar that emits a tree IR.
- Grammar class choice: PEG is a common fit for Ruby / Python / JS codebases (parslet, TatSu, pegjs); LALR / earley are common for larger languages.
- Backward-compat trick: have the grammar accept both the legacy flat form and the new nested form as distinct entry rules. Every historical query must parse; tests should enforce this on the corpus of production queries.
- Lower the parse tree to an AST whose node types are aligned with the target backend's logical operators (typically AND / OR / NOT at interior, filter-terms at leaves).
- Recursively traverse the AST; each node type emits a matching piece of the backend query document.
- Interior AND/OR/NOT map to the backend's AND/OR/NOT
shape — for Elasticsearch:
must/should/should_not(bool query). - Leaves reuse the existing per-filter-term query builder snippets from the legacy system; this is where the old code pays for itself in the new architecture.
- Intra-leaf optimisations: same-field OR-of-values
subtrees collapse to a single
termsclause; redundant negations flatten; constant-folding on trivially-true/false subtrees. These are optional; they're the compiler- engineering polish stage of the rewrite. - Execute the emitted query document against the search engine and downstream-map results as before. The post- execution stages (result normalisation, result-set post-filtering for deleted records) usually don't need to change — they're a different layer.
Why the traversal is recursive¶
A flat or iterator-style emitter can't straightforwardly emit a nested query document: the emission at each node depends on the emitted form of its children. Recursion is the natural structural fit; in a stack-aware runtime you'd use a worklist + stack, but in most host languages (Ruby, Python, JS, Java, Kotlin, Go) recursive traversal is fine up to the grammar's depth cap.
Depth caps are a product + correctness concern¶
- Product cap: user-interviewed sweet spot. GitHub Issues: five levels of nesting (from customer interviews).
- Correctness cap: backend query-document size limits
(Elasticsearch default
indices.query.bool.max_clause_countis 1,024). - Host stack cap: recursive traversal depth; typically far larger than product caps, so rarely binds.
Any deep-nested user input should reject at the grammar level with a clean syntax error, not blow out downstream limits.
Shared dependencies¶
- A grammar / parser library (systems/parslet, TatSu, pegjs, lark, ANTLR, handwritten)
- A backend with a recursive query DSL
(systems/elasticsearch / systems/amazon-opensearch-service /
SQL / Cypher / MongoDB
$and/$or) - A validation harness that covers all three parity layers: grammar inclusion of legacy queries, unit-test re-run under both feature-flag states, and dark-ship diff on live traffic
When this is overkill¶
- Truly flat query surfaces with a closed set of top-level parameters and no foreseeable nesting — a struct / hashmap IR suffices.
- Small internal-only DSLs where the grammar + AST layer is more code than the direct emitter it replaces.
- Single-consumer one-off query builders (e.g., an internal dashboard with a stable, small filter-set).
Canonical instance¶
GitHub Issues search — 2025 rewrite from IssuesQuery
(flat list IR, linear filter-class mapping) to
ConditionalIssuesQuery (PEG grammar, AST, recursive
traversal → bool query). Preserved backward compatibility with
a decade of bookmarked URLs + ~2,000 QPS of live traffic
without a visible regression. Legacy flat-query syntax and new
nested syntax are both accepted by the same grammar as
different entry rules; the AST shape is the same modulo depth.
(Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)
Seen in¶
- sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean — canonical in-wiki instance. GitHub Issues rewrite's Parse+Query stages use systems/parslet PEG parser → AST → recursive traversal → systems/elasticsearch bool query. Worked example on AST and emitted query document in source page.
Related¶
- concepts/abstract-syntax-tree — the IR.
- concepts/peg-grammar — the grammar class the canonical instance uses.
- concepts/boolean-query-dsl — the emission codomain.
- concepts/backward-compatibility — the constraint that usually forces the layered validation harness around this pattern.
- patterns/dark-ship-for-behavior-parity — the live-traffic behaviour-parity check that pairs with this rewrite pattern.
- systems/github-issues — canonical user.
- systems/parslet — the grammar library used in the canonical instance.
- systems/elasticsearch — the backend.