Query simplifier¶
Definition¶
A query simplifier is a tool that takes a SQL query that exhibits a specific error (wrong result, crash, planner panic, resource exhaustion) and automatically produces a smaller, ideally minimal, query that still exhibits the same error. The reduced query need not be semantically equivalent to the original; it only has to reproduce the failure. It is the SQL-specific application of delta debugging, Andreas Zeller's technique for minimising failing test cases by iteratively removing or modifying parts of the input and checking whether the failure persists.
The canonical Vitess implementation is Andrés Taylor's automatic query simplifier (2022), designed to turn noisy fuzzer-produced queries into minimal reproducers suitable for filing as bugs or adding to the regression test suite.
Mechanism¶
The simplifier operates on the query's AST:
- Input: an AST for a query `Q` that produces error `E` when executed against the target system.
- Outer loop: for each node `n` in the AST (bottom-up or top-down, doesn't matter much), attempt one of:
  - Remove `n` (and its subtree) if the parent node type allows it (e.g. drop a `JOIN` clause, drop a predicate from `WHERE`, drop a column from the `SELECT` list).
  - Replace `n` with a simpler equivalent (e.g. a complex expression with a literal, a subquery with a table reference).
- Inner check: run the modified query `Q'` against the target system. If `Q'` still produces error `E`, recurse on `Q'`. If not, revert and try the next candidate.
- Terminate when no further reduction preserves the error; this is the 1-minimal reproducer in the delta-debugging sense.
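The loop above can be sketched on a toy AST. This is a minimal illustration under stated assumptions, not the Vitess implementation: the `Node` type, the `MIN_CHILDREN` and `REPLACEABLE` tables, and the `still_fails` oracle are all hypothetical stand-ins. In a real simplifier the AST would come from the SQL parser and the oracle would execute `Q'` against the target system and compare the resulting error with `E`.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterator


@dataclass(frozen=True)
class Node:
    """Hypothetical toy AST node; not the Vitess AST."""
    kind: str                      # "select", "columns", "from", "where", "col", "table", "pred", "func", "lit"
    text: str = ""
    children: tuple["Node", ...] = ()


# List-like parents and the minimum number of children each must keep.
MIN_CHILDREN = {"columns": 1, "from": 1, "where": 0}
# Expression kinds that may be swapped for a literal.
REPLACEABLE = {"func"}


def candidates(node: Node) -> Iterator[Node]:
    """Yield every tree that differs from `node` by one removal or one replacement."""
    if node.kind in MIN_CHILDREN and len(node.children) > MIN_CHILDREN[node.kind]:
        for i in range(len(node.children)):
            yield replace(node, children=node.children[:i] + node.children[i + 1:])
    if node.kind in REPLACEABLE:
        yield Node("lit", "1")
    for i, child in enumerate(node.children):
        for new_child in candidates(child):
            yield replace(node, children=node.children[:i] + (new_child,) + node.children[i + 1:])


def simplify(query: Node, still_fails: Callable[[Node], bool]) -> Node:
    """Accept the first reduction that keeps the error and recurse; stop when none does."""
    for cand in candidates(query):
        if still_fails(cand):
            return simplify(cand, still_fails)
    return query


def to_sql(n: Node) -> str:
    if n.kind == "select":
        cols, frm, where = n.children
        sql = f"SELECT {to_sql(cols)} FROM {to_sql(frm)}"
        return sql + (f" WHERE {to_sql(where)}" if where.children else "")
    if n.kind in ("columns", "from"):
        return ", ".join(to_sql(c) for c in n.children)
    if n.kind == "where":
        return " AND ".join(to_sql(c) for c in n.children)
    if n.kind == "func":
        return f"{n.text}({', '.join(to_sql(c) for c in n.children)})"
    return n.text


def still_fails(q: Node) -> bool:
    """Stand-in for running Q' on the target: the fake bug needs table t2 and a `t2.x >` predicate."""
    _, frm, where = q.children
    return (any(t.text == "t2" for t in frm.children)
            and any(p.text.startswith("t2.x >") for p in where.children))


query = Node("select", children=(
    Node("columns", children=(Node("col", "t1.id"), Node("col", "t2.x"),
                              Node("func", "UPPER", (Node("col", "t1.name"),)))),
    Node("from", children=(Node("table", "t1"), Node("table", "t2"), Node("table", "t3"))),
    Node("where", children=(Node("pred", "t1.id = t2.id"), Node("pred", "t2.x > 5"),
                            Node("pred", "t3.y = 1"))),
))

minimal = simplify(query, still_fails)
print(to_sql(query))    # SELECT t1.id, t2.x, UPPER(t1.name) FROM t1, t2, t3 WHERE t1.id = t2.id AND t2.x > 5 AND t3.y = 1
print(to_sql(minimal))  # SELECT 1 FROM t2 WHERE t2.x > 5
```

The toy oracle never checks whether a reduced query is still valid SQL for the schema. Against a real target, a candidate that references a dropped table would most likely fail with a different error and so be rejected by the same-error check.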
Per Murty (2023 summer internship post):
"The query simplifier is a tool used to automatically simplify queries that produce errors. It uses a brute-force approach, removing or modifying nodes in the AST and checking if the new, simpler query still exhibits the same error. If it does, the simplifier is called on the new query." (Source: sources/2026-04-21-planetscale-summer-2023-fuzzing-vitess-at-planetscale)
Why it's load-bearing¶
A random SQL fuzzer emits queries with dozens of joins, nested subqueries, and incidental clauses — none of which are load-bearing on the bug. A 200-line fuzzer-output query that reproduces a planner bug is unactionable: the engineer has to bisect manually to find the minimal shape, which is slow and error-prone.
The simplifier closes the fuzzer's feedback loop. Without it, fuzzers produce noisy signal engineers can't use. With it, every fuzzer finding becomes a minimal repro suitable for adding to the regression suite.
VSchema threading for end-to-end tests¶
Taylor's original simplifier targeted unit-test-level queries — queries that don't depend on sharding, routing, or vindex metadata. Murty's 2023 contribution extended the simplifier for end-to-end tests: the minimal query must still execute against the fuzzer's known Vitess cluster, which means the simplifier must preserve the AST nodes that reference sharded-table columns and vindex keys, and must have access to the VSchema when deciding whether a given reduction is valid.
See vitessio/vitess #13636 for the VSchema-aware simplifier refactor.
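One way to picture a schema-aware validity check is a filter that consults schema metadata before a removal is attempted. The sketch below is purely illustrative: the `VSCHEMA` dictionary is a simplified, hypothetical stand-in for a real VSchema, and `protected_columns` and `may_remove` are invented names, not functions from the Vitess codebase or from the PR referenced above.

```python
import re

# Hypothetical, simplified schema metadata: table -> sharding flag and vindex columns.
VSCHEMA = {
    "users": {"sharded": True, "vindex_columns": {"id"}},
    "orders": {"sharded": True, "vindex_columns": {"user_id"}},
    "countries": {"sharded": False, "vindex_columns": set()},
}


def protected_columns(vschema: dict) -> set[str]:
    """Qualified names of vindex columns on sharded tables."""
    return {
        f"{table}.{column}"
        for table, info in vschema.items()
        if info["sharded"]
        for column in info["vindex_columns"]
    }


def may_remove(predicate: str, vschema: dict) -> bool:
    """Allow removal only if the predicate touches no protected column (assumed policy)."""
    referenced = set(re.findall(r"\b\w+\.\w+\b", predicate))
    return referenced.isdisjoint(protected_columns(vschema))


predicates = [
    "orders.user_id = users.id",   # references vindex columns, so it is kept
    "countries.name = 'NO'",       # unsharded table, safe to try removing
    "orders.total > 10",           # sharded table but not a vindex column
]
removable = [p for p in predicates if may_remove(p, VSCHEMA)]
print(removable)
```

A filter like this would sit in front of the inner check, so the simplifier never spends a query execution on a reduction that the schema already rules out.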
Relationship to delta debugging¶
Classical delta debugging operates on unstructured inputs (bytes of a file, lines of a source program). Query simplifiers apply the same idea to structured ASTs, respecting the grammar so that every intermediate state is a valid, parseable query. This grammar awareness lets the simplifier avoid spending checks on the many byte- or line-level candidates that would fail to parse.
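Classical delta debugging (ddmin) needs O(log n) tests in the best case, when a single element of the input causes the failure, and O(n²) tests in the worst case. AST-aware reduction has similar bounds in the number of nodes but tends to need fewer checks in practice, because every candidate is grammatically well-formed and each inner check is a single query execution.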
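For comparison, here is a compact sketch of the classical ddmin loop on an unstructured list. It follows the usual textbook formulation (try subsets, then complements, then increase granularity); the `ddmin` function name and the toy `fails` predicate are illustrative choices, not code from any of the sources cited here.

```python
from typing import Callable, Sequence


def ddmin(items: Sequence, fails: Callable[[list], bool]) -> list:
    """Reduce `items` to a 1-minimal sublist for which `fails` is still true."""
    items = list(items)
    assert fails(items), "the starting input must reproduce the failure"
    n = 2  # current granularity: number of chunks to split into
    while len(items) >= 2:
        chunk = -(-len(items) // n)  # ceiling division
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        # Step 1: does one chunk alone still fail?
        for subset in subsets:
            if fails(subset):
                items, n, reduced = subset, 2, True
                break
        # Step 2: does the input still fail with one chunk removed?
        if not reduced:
            for i in range(len(subsets)):
                complement = [x for j, s in enumerate(subsets) if j != i for x in s]
                if fails(complement):
                    items, n, reduced = complement, max(n - 1, 2), True
                    break
        # Step 3: otherwise split more finely, or stop at single-element chunks.
        if not reduced:
            if n >= len(items):
                break
            n = min(len(items), n * 2)
    return items


# Toy failure: the "bug" needs both element 3 and element 11 to be present.
result = ddmin(range(16), lambda s: 3 in s and 11 in s)
print(result)  # [3, 11]
```

When a single element triggers the failure, step 1 halves the input on each round, which is where the logarithmic best case comes from. When several scattered elements are needed, as in the toy example, the complement step does most of the work and more tests are required.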
Seen in¶
- sources/2026-04-21-planetscale-summer-2023-fuzzing-vitess-at-planetscale — first canonical wiki disclosure of Vitess's AST-based query simplifier. Arvind Murty's summer 2023 internship retrospective names the simplifier's mechanism verbatim ("removing or modifying nodes in the AST and checking if the new, simpler query still exhibits the same error") and links to Andrés Taylor's 2022 blog post (systay.github.io) as the canonical prior-art reference. Murty's contribution was extending the simplifier to work against end-to-end Vitess tests by threading VSchema information through — originally designed for unit tests with known-schema fixtures, the simplifier didn't know how to preserve sharding-relevant AST nodes when minimising. Shipped as vitessio/vitess #13636.