Skip to content

GITHUB 2025-05-13 Tier 2

Read original ↗

GitHub Issues search now supports nested queries and boolean operators: Here's how we (re)built it

Summary

GitHub rewrote Issues search to support logical AND/OR operators and nested parentheses across all fields (e.g. is:issue state:open author:rileybroughten (type:Bug OR type:Epic)), a decade-old community ask that a 2021 comma-OR-on-labels-only enhancement had only partially addressed. The rewrite swapped the existing IssuesQuery module for ConditionalIssuesQuery, split into the same three pipeline stages (Parse → Query → Normalize) but with two stages replaced:

  1. Parse: a flat-list parser became a PEG grammar (via the Ruby parslet library) that emits an Abstract Syntax Tree. The grammar accepts both legacy flat queries and the new nested syntax, so bookmarked URLs keep working.
  2. Query: linear filter-class-per-term mapping became a recursive AST walk that emits an Elasticsearch boolean query, with AND/OR/NOT mapping directly onto Elasticsearch must/should/should_not clauses.

Because Issues search runs at ~2,000 QPS (≈160 M queries/day) and issues URLs are routinely bookmarked, shared, and linked, the rollout emphasized correctness + performance parity over feature delivery. Validation used three layered techniques: (a) the existing unit + integration test suite re-run against the new module behind a feature flag, toggled both on and off to confirm GraphQL/REST API contract invariance; (b) dark-shipping — for 1% of production searches the query ran against both systems in a background job and result-count differences (within ≤1 s of each other) were logged and triaged; (c) performance comparison via systems/scientist, GitHub's open-source library for safely running old-vs-new critical paths side by side. Rollout minimized blast radius by integrating only with the GraphQL API and the per-repo Issues tab first, then extending to the Issues dashboard and REST API after confidence grew. A hard product cap of five nesting levels came out of customer interviews as the utility-vs-usability sweet spot.

Key takeaways

  1. Rewriting query languages into ASTs is the structural move that unlocks nesting. A flat list of filter:value terms implicitly AND-joined has no place to hold a recursive tree; you can bolt comma-OR onto one field (GitHub's 2021 stopgap for label:a,b) but you can't lift it to arbitrary fields without changing the intermediate representation. The 2025 rewrite's first enabling step was replacing the parsed representation with an AST. (Source: this article.) Generalization captured in patterns/ast-based-query-generation.

  2. PEG grammars (via parslet) are the pragmatic Ruby move for expressing a query syntax that has to cover both legacy flat queries and nested boolean queries. The post ships a simplified boolean-algebra example from parslet: and_operation is right-recursive on primary, or_operation is right-recursive on and_operation, primary recurses back into or_operation inside parentheses, and the root rule is the lowest-precedence (or_operation) — the standard operator-precedence encoding in a PEG. Backward compatibility is achieved by having the grammar accept legacy filter-lists as a sub-rule, not by maintaining two parsers. (Source: this article.)

  3. Elasticsearch's bool-query DSL is the natural codomain for a boolean AST. AND → must, OR → should, NOT → should_not — a recursive traversal of the AST builds a matching nested bool query tree, with filter-term leaves (author:monalisa) compiling to Elasticsearch term / terms / prefix clauses. The post shows the before/after: one AST node per boolean operator becomes one nested bool object in the query document. GitHub reuses the smaller-piece query builders from the old system as the leaf case of the recursion. (Source: this article.) Generalization: any system whose backend is systems/elasticsearch or an Elasticsearch-compatible engine (OpenSearch) gets this map essentially for free once it has an AST.

  4. Dark-shipping with result-count deltas is a viable behavior-parity harness at 2 kQPS. For 1% of Issues searches, GitHub ran the user's query through both the old and new modules in a background job (so the user sees only the old result) and logged the diff when the number of results disagreed, conditioned on both runs happening within ≤1 s. The post explicitly flags that picking result count was a first-iteration proxy — not a deep equality check — chosen because users are only "surprised" when the count moves, and because counts are cheap to log. Differences drove bug fixes before GA. (Source: this article.) Generalization: patterns/dark-ship-for-behavior-parity — distinct from data-pipeline shadow migration (which is batch, offline, reconciled as dataset stats) because this runs inside the live request path against a production workload stream.

  5. systems/scientist remains GitHub's load-bearing comparison library for critical-path refactors. The original 2016 open-source tool was used here to compare performance of equivalent queries on a 1% sample; it handles timing, error routing, and diff logging for "running experimental code alongside trusted code." The post names it explicitly: "We used scientist, GitHub's open source Ruby library, for carefully refactoring critical paths." (Source: this article.) patterns/performance-comparison-with-scientist is the generalized pattern.

  6. Blast radius was bounded by API surface, not traffic percent. The new search rolled out first to the GraphQL API and the per-repo Issues tab UI — two surfaces where bookmarked URLs are less common and users tend to re-author queries. Only after confidence built did it extend to the Issues dashboard (2025-04-02 changelog) and the REST API (2025-03-06 changelog). This is a surface-first rollout: limit the set of code paths that can see the new behaviour, not just the fraction of requests. Orthogonal to — and composed with — the feature-flag + percent-of-traffic rollout on each surface. (Source: this article + the linked GitHub changelog posts.)

  7. Five nesting levels is the product cap, derived from customer interviews. Not a technical ceiling (parslet + Elasticsearch can handle more); a product-UX ceiling. "From customer interviews, we found this to be a sweet spot for both utility and usability." Combined with highlighted AND/OR keywords and filter-term auto-complete carried over from the flat-query UI, the cap bounds cognitive load for users writing queries, not just the size of the Elasticsearch query document. (Source: this article.)

  8. Backward-compatibility guarantee was enforced by running the old test suite unchanged against the new module with the flag both on and off. GitHub ran the existing IssuesQuery unit + integration tests against the new ConditionalIssuesQuery module, and ran the search-endpoint API contract tests (GraphQL + REST) both with the feature flag enabled and disabled. A passing matrix in both flag states was the gate. (Source: this article.) This is weaker than a dark-ship diff (which tests live traffic), but stronger than a one-sided test run (which would miss regressions introduced by the flag-on path).

Architecture: the three-stage pipeline

The post draws the pre/post architecture as two block diagrams; the contract is identical, the middle two stages are rewritten.

Stage Before (IssuesQuery) After (ConditionalIssuesQuery)
Parse Flat list of terms+filters PEG grammar → AST (via parslet)
Query Linear map: one filter-class per term → Elasticsearch snippet, concatenated Recursive AST traversal → nested Elasticsearch bool query
Normalize JSON → Ruby; prune records removed from DB Unchanged

The Normalize stage being unchanged is the architectural lesson for future rewrites: find the stage that is genuinely about result-shaping rather than query semantics, and don't touch it.

Operational numbers

  • ~2,000 QPS — average Issues search traffic
  • ~160 M — queries per day (derived: 2,000 × 86,400)
  • 1% — dark-ship sampling rate for diff logging and for the scientist-based performance comparison
  • ≤1 s — max time-skew for a dark-shipped pair of results to be considered "same user intent" when logging count differences
  • 5 — maximum nesting depth supported (product cap, not a technical limit)
  • ~decade — elapsed time from the first community request (isaacs/github#660) to GA (2015-ish → 2025)
  • 2021 — intermediate partial delivery (comma-separated label values as implicit OR)

Worked example: AND/OR/parens → AST → bool query

The post walks one example end-to-end:

Input:

is:issue AND (author:deborah-digges OR author:monalisa)

AST (simplified shape):

root → and {
  left:  filter_term(is, issue)
  right: or {
    left:  filter_term(author, deborah-digges)
    right: filter_term(author, monalisa)
  }
}

Emitted Elasticsearch query (abridged):

{
  "query": { "bool": { "must": [
    { "bool": { "must": [
      { "bool": { "must": { "prefix": { "_index": "issues" } } } },
      { "bool": { "should": { "terms": {
        "author_id": ["<DEBORAH_DIGGES_AUTHOR_ID>", "<MONALISA_AUTHOR_ID>"]
      } } } }
    ]}}
  ]}}
}

Note that author:A OR author:B gets compacted by the query builder into a single terms clause (many-values short-circuit) rather than two nested should clauses — a small intra-leaf optimisation the recursive builder is allowed to make because it sees the AST subtree for "same-field OR of values" as a special case.

Caveats

  • This is a product-launch post, not a postmortem. No regression or rollback anecdote is shared; the narrative reads as "we used the right validation harnesses and nothing went wrong." That's plausible for a 1%-then-surface-by-surface rollout with scientist+dark-shipping diff logging, but the absence of reported incidents does not prove the harness catches everything — only that it caught what shipped.
  • "Number of results" is a weak equivalence metric. A query that returns the same count with a reordered or partially different result set would not flag. GitHub acknowledges this explicitly as a first-iteration choice; more rigorous diffs (top-K identity, set overlap) are not discussed, perhaps because they would cost more in the dark-ship log-and-compare step.
  • Performance baseline details are absent. The article says scientist was used and that more-complex nested queries were expected to use more backend resources, but publishes no QPS headroom, p50/p99 latency numbers, or Elasticsearch CPU/heap before-after. The "no regression" claim is stated rather than shown.
  • Dark-shipping works for read paths; the post doesn't generalize to mutation paths. The Parse→Query→Normalize pipeline is read-only (it doesn't mutate the issues index), so running both modules in a background job is side-effect-free. The same harness on a write-path rewrite would require idempotency or a patterns/dual-write-migration-style setup — not what's described here.
  • The boolean-search feature is partially orthogonal to the rewrite. GitHub could have shipped the boolean UI on top of the legacy flat parser (post-processing the comma-list approach), but at the cost of either quadratic query expansion or a permanently ceiling-capped expressivity. The rewrite bought the structural capability — worth naming so the takeaway isn't "always rewrite", it's "pick up the AST when the syntax has to be recursive".

Systems surfaced

Concepts surfaced

  • concepts/abstract-syntax-tree — tree IR produced by the parse stage; prerequisite for recursive query generation.
  • concepts/peg-grammar — Parsing Expression Grammar, the grammar class parslet implements; natural fit for operator- precedence query languages.
  • concepts/boolean-query-dsl — nested must/should/should_not shape that Elasticsearch exposes and that the AST compiles into.
  • concepts/backward-compatibility — bookmarked URLs as the specific failure mode the grammar must avoid breaking; "existing tests pass with flag on and off" as the gate.
  • concepts/query-shape — flat vs nested queries as structurally different load profiles at the backend; the motivation for performance comparison even when the feature set is additive.
  • concepts/blast-radius — bounded here by API surface + traffic percent, composed.

Patterns surfaced

Last updated · 200 distilled / 1,178 read