PATTERN Cited by 1 source
Fuzz AST-vs-VM oracle¶
Problem¶
You've rewritten an interpreter's execution path — AST interpreter → bytecode VM, or VM → JIT — to win performance. The new path is faster but you now have two implementations of the same language semantics. How do you keep them consistent across millions of SQL quirks, collation edge cases, overflow rules, and implicit casts?
Manual test-suite maintenance is insufficient — the surface area is too large. You need continuous, automated correctness checking at scale.
Solution¶
Use the two implementations as each other's oracles. Run a fuzzer that:
- Generates random expressions in the source language.
- Evaluates each expression with both interpreters (old AST + new VM).
- Compares outputs. Any disagreement is a bug in one of them.
Optionally, add a third reference implementation — e.g. the upstream language's canonical runtime — and compare all three. Discrepancies where the new and old agree but the reference disagrees are bugs in the reference.
Canonical instance: Vitess evalengine¶
The Vitess evalengine team runs this pattern in production:
"Our test suite and fuzzer are so comprehensive that we routinely find bugs in the original MySQL evaluation engine, which we have to fix upstream (like this collation bug, this issue in the
insertSQL function or this bug when searching substrings)." (Source: sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp)
After adding the VM:
"Lastly, when it comes to accuracy, being able to fuzz both the AST interpreter and the VM against each other has resulted in an invaluable tool for detecting bugs and corner cases."
Vitess fuzzes three implementations in practice:
- Old AST interpreter (Vitess, ground truth).
- New VM (Vitess, under test).
- Upstream MySQL C++ engine (external reference).
Three-way comparison lets them distinguish Vitess bugs from MySQL bugs. The PRs above are all cases where both Vitess interpreters agreed and MySQL disagreed — a bug in MySQL's own C++ evaluator that Vitess upstreams.
Why this pattern works¶
- Two independent implementations rarely have identical bugs. The AST interpreter and VM are written in different styles, by different engineers, with different control flow. A bug in one is unlikely to be a bug in the other.
- Fuzzing explores the long tail of input space. Manual test cases cover documented semantics; fuzzing finds the cases engineers didn't think of.
- The oracle is self-consistent. Unlike comparing against a human-written spec, the oracle is "what the other interpreter thinks" — which is concrete and automatable.
- Comparison is cheap. Both interpreters evaluate the same expression; comparison is an equality check on outputs.
Requirements¶
- Two (or more) independent implementations of the same language semantics.
- A generator for well-typed random expressions.
- A way to evaluate each expression with each implementation and compare outputs deterministically.
- Error-handling convention — usually "both error" or "both succeed with same result" is passing; "one errors, one succeeds" fails.
Related patterns¶
- patterns/parallel-rewrite-with-differential-testing — Meta's pattern for rewriting security-critical code with byte-identical output comparison. Same structural idea applied at the parser/decoder level.
- concepts/parser-differential — two parsers for the same format disagreeing; a related notion at the syntactic level.
- concepts/differential-fuzzing — the general technique.
- patterns/vm-ast-dual-interpreter-fallback — Vitess's architecture that keeps both interpreters alive; this pattern is what turns the retention from cost into asset.
Properties¶
| Property | Value |
|---|---|
| Correctness coverage | Very high (scales with fuzzer budget) |
| Bug discovery | Finds deep semantic bugs no unit test would catch |
| Implementation cost | Moderate — you were going to keep two anyway |
| Runtime cost | Continuous fuzzing cluster (Vitess CI) |
| Bug in reference implementation | Discoverable if you have 3 implementations |
When to use¶
- You have (or will have) two implementations of the same semantics. The pattern is almost free if you'd keep both anyway (e.g. for deoptimization fallback).
- Semantic correctness matters more than speed. Language runtimes, database engines, cryptographic libraries, security-critical parsers.
- You have budget for continuous fuzz infrastructure. The pattern delivers value proportional to fuzz iterations.
When not to use¶
- One implementation only. The pattern requires at least two independent oracles. With one, you need a different correctness strategy (formal spec, reference implementation).
- Non-deterministic semantics. If outputs can legitimately differ (random functions, wall-clock dependent operators), comparison becomes harder.
Caveats¶
- Requires well-typed generators. Random bytes won't produce well-typed SQL. Vitess's generator understands types and operator signatures.
- Both implementations can share a bug. If both import the same library for collation handling, a bug in the library is invisible to the oracle. Mitigation: a third independent reference (MySQL, in Vitess's case).
- Error-path semantics are tricky. If one interpreter errors and the other computes a value, the comparison logic must decide whether that's a bug or legal divergence.
- Continuous fuzzing is a recurring cost. Unlike a one-time audit, fuzzing infrastructure runs forever. Budget accordingly.
- Finding bugs in external references is a mixed blessing. Upstreaming MySQL bugs (Vitess's experience) is great for the ecosystem but requires patience — PRs take time, and you need workarounds in the meantime.
Seen in¶
- sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp
— canonical wiki instance. Vitess fuzzes its AST
interpreter, VM, and MySQL's C++ engine against each other;
the oracle has surfaced bugs in MySQL itself (collation,
insertfunction, substring search) that Vitess upstreams. First wiki canonicalisation of the pattern at the interpreter-semantics level.