Skip to content

PATTERN Cited by 1 source

AST-based query generation

When a user-facing query language must support recursive or nested logical combinations (parenthesised AND/OR/NOT), the production-scale shape for compiling it into a backend query document is:

parse → AST → recursive-traversal emission → backend query

The AST is the load-bearing IR. A flat list IR supports only implicit-AND single-level queries; extending it with ad-hoc special cases (e.g. comma-OR on one field) buys short-term wins but cannot generalize to arbitrary nesting across all fields without structural re-representation.

This pattern is the structural refactoring move that makes nested query languages shippable in production.

Shape

  1. Parse the user input through a grammar that emits a tree IR.
  2. Grammar class choice: PEG is a common fit for Ruby / Python / JS codebases (parslet, TatSu, pegjs); LALR / earley are common for larger languages.
  3. Backward-compat trick: have the grammar accept both the legacy flat form and the new nested form as distinct entry rules. Every historical query must parse; tests should enforce this on the corpus of production queries.
  4. Lower the parse tree to an AST whose node types are aligned with the target backend's logical operators (typically AND / OR / NOT at interior, filter-terms at leaves).
  5. Recursively traverse the AST; each node type emits a matching piece of the backend query document.
  6. Interior AND/OR/NOT map to the backend's AND/OR/NOT shape — for Elasticsearch: must / should / should_not (bool query).
  7. Leaves reuse the existing per-filter-term query builder snippets from the legacy system; this is where the old code pays for itself in the new architecture.
  8. Intra-leaf optimisations: same-field OR-of-values subtrees collapse to a single terms clause; redundant negations flatten; constant-folding on trivially-true/false subtrees. These are optional; they're the compiler- engineering polish stage of the rewrite.
  9. Execute the emitted query document against the search engine and downstream-map results as before. The post- execution stages (result normalisation, result-set post-filtering for deleted records) usually don't need to change — they're a different layer.

Why the traversal is recursive

A flat or iterator-style emitter can't straightforwardly emit a nested query document: the emission at each node depends on the emitted form of its children. Recursion is the natural structural fit; in a stack-aware runtime you'd use a worklist + stack, but in most host languages (Ruby, Python, JS, Java, Kotlin, Go) recursive traversal is fine up to the grammar's depth cap.

Depth caps are a product + correctness concern

  • Product cap: user-interviewed sweet spot. GitHub Issues: five levels of nesting (from customer interviews).
  • Correctness cap: backend query-document size limits (Elasticsearch default indices.query.bool.max_clause_count is 1,024).
  • Host stack cap: recursive traversal depth; typically far larger than product caps, so rarely binds.

Any deep-nested user input should reject at the grammar level with a clean syntax error, not blow out downstream limits.

Shared dependencies

When this is overkill

  • Truly flat query surfaces with a closed set of top-level parameters and no foreseeable nesting — a struct / hashmap IR suffices.
  • Small internal-only DSLs where the grammar + AST layer is more code than the direct emitter it replaces.
  • Single-consumer one-off query builders (e.g., an internal dashboard with a stable, small filter-set).

Canonical instance

GitHub Issues search — 2025 rewrite from IssuesQuery (flat list IR, linear filter-class mapping) to ConditionalIssuesQuery (PEG grammar, AST, recursive traversal → bool query). Preserved backward compatibility with a decade of bookmarked URLs + ~2,000 QPS of live traffic without a visible regression. Legacy flat-query syntax and new nested syntax are both accepted by the same grammar as different entry rules; the AST shape is the same modulo depth. (Source: sources/2025-05-13-github-github-issues-search-now-supports-nested-queries-and-boolean)

Seen in

Last updated · 200 distilled / 1,178 read