PATTERN
Callback-slice VM in Go¶
Problem¶
You need to build a bytecode-VM-class fast interpreter for a dynamic expression language in Go. The mainstream designs (big-switch VM, tail-call continuation interpreter) don't translate well:
- A big-switch VM in Go is often compiled with binary-search dispatch instead of a jump table, and there is no reliable way to force a jump table. Register spilling in the giant switch function hurts further.
- Tail-call continuation loops depend on guaranteed tail-call optimization (LLVM `musttail`); Go's compiler doesn't guarantee tail calls, so the stack grows.
- A JIT is not worth the complexity if instruction dispatch is under ~20% of runtime.
Solution¶
Emit each instruction as a Go closure pushed onto a `[]func(*VirtualMachine) int` slice. The VM loop walks the slice, invoking each callback in turn; each callback returns an offset to advance the instruction pointer.
The pattern has two ingredients.
1. The VM is trivial¶
```go
func (vm *VirtualMachine) execute(p *Program) (eval, error) {
	code := p.code
	ip := 0
	for ip < len(code) {
		ip += code[ip](vm)
		if vm.err != nil {
			return nil, vm.err
		}
	}
	if vm.sp == 0 {
		return nil, nil
	}
	return vm.stack[vm.sp-1], nil
}
```
One for-loop, one indirect call per opcode, one error check. That's it. No switch, no opcode decode, no case explosion.
2. The compiler emits closures, not bytecode¶
```go
func (c *compiler) emitPushNull() {
	c.emit(func(vm *VirtualMachine) int {
		vm.stack[vm.sp] = nil
		vm.sp++
		return 1
	})
}

func (c *compiler) emitPushColumn_text(offset int, col collations.TypedCollation) {
	c.emit(func(vm *VirtualMachine) int {
		vm.stack[vm.sp] = newEvalText(vm.row[offset].Raw(), col)
		vm.sp++
		return 1
	})
}
```
Instruction arguments (`offset`, `col`) are captured in closure state by the Go compiler. No encoding, no decoding, no bytecode format to keep in sync with the VM.
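The two ingredients combine into a small runnable sketch. This is not Vitess's actual code: the `eval`, `Program`, and `compiler` types here are minimal stand-ins, and `emitPushInt64`/`emitAddInt64` are hypothetical opcodes invented for illustration.

```go
package main

import "fmt"

// eval is a stand-in for a real engine's richer value interface.
type eval = any

type VirtualMachine struct {
	stack []eval
	sp    int
	err   error
}

type Program struct {
	code []func(*VirtualMachine) int
}

// The VM loop from the pattern: one indirect call per opcode.
func (vm *VirtualMachine) execute(p *Program) (eval, error) {
	code := p.code
	ip := 0
	for ip < len(code) {
		ip += code[ip](vm)
		if vm.err != nil {
			return nil, vm.err
		}
	}
	if vm.sp == 0 {
		return nil, nil
	}
	return vm.stack[vm.sp-1], nil
}

// The compiler side: emit appends a closure to the program slice.
type compiler struct {
	p Program
}

func (c *compiler) emit(f func(*VirtualMachine) int) {
	c.p.code = append(c.p.code, f)
}

// emitPushInt64 captures its argument in closure state — no
// operand encoding, no decode step in the VM.
func (c *compiler) emitPushInt64(v int64) {
	c.emit(func(vm *VirtualMachine) int {
		vm.stack[vm.sp] = v
		vm.sp++
		return 1
	})
}

func (c *compiler) emitAddInt64() {
	c.emit(func(vm *VirtualMachine) int {
		a := vm.stack[vm.sp-2].(int64)
		b := vm.stack[vm.sp-1].(int64)
		vm.stack[vm.sp-2] = a + b
		vm.sp--
		return 1
	})
}

func main() {
	var c compiler
	c.emitPushInt64(2)
	c.emitPushInt64(3)
	c.emitAddInt64()

	vm := &VirtualMachine{stack: make([]eval, 8)}
	result, err := vm.execute(&c.p)
	fmt.Println(result, err) // 5 <nil>
}
```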
Control flow¶
Each callback returns an int offset:
- `return 1` — advance to the next instruction (sequential).
- `return N` — jump forward by N (forward branch).
- `return -N` — jump backward by N (loop).
- `return 0` with an error sentinel — halt or deoptimise.
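A backward jump (`return -N`) is enough to express a loop. The following is a hedged, self-contained sketch — the opcodes and stack layout are invented for illustration, not taken from Vitess — that sums 3 + 2 + 1 by jumping back two instructions while the counter is positive:

```go
package main

import "fmt"

// A stripped-down VM: just an int64 stack, no error slot.
type VirtualMachine struct {
	stack []int64
	sp    int
}

func (vm *VirtualMachine) execute(code []func(*VirtualMachine) int) int64 {
	ip := 0
	for ip < len(code) {
		ip += code[ip](vm) // the returned offset drives control flow
	}
	return vm.stack[vm.sp-1]
}

func main() {
	// Hypothetical program. Stack layout: [sum, i].
	code := []func(*VirtualMachine) int{
		// init: sum = 0, i = 3
		func(vm *VirtualMachine) int { vm.stack[vm.sp] = 0; vm.sp++; return 1 },
		func(vm *VirtualMachine) int { vm.stack[vm.sp] = 3; vm.sp++; return 1 },
		// loop body (index 2): sum += i
		func(vm *VirtualMachine) int { vm.stack[vm.sp-2] += vm.stack[vm.sp-1]; return 1 },
		// i--
		func(vm *VirtualMachine) int { vm.stack[vm.sp-1]--; return 1 },
		// while i > 0, jump back 2 instructions to the loop body
		func(vm *VirtualMachine) int {
			if vm.stack[vm.sp-1] > 0 {
				return -2
			}
			return 1
		},
		// pop i, leaving sum on top
		func(vm *VirtualMachine) int { vm.sp--; return 1 },
	}
	vm := &VirtualMachine{stack: make([]int64, 8)}
	fmt.Println(vm.execute(code)) // 6
}
```

Because Go closures capture variables by reference, a compiler can also emit a jump closure over a not-yet-known offset and backpatch it once the target instruction index is resolved.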
Properties¶
| Property | Value |
|---|---|
| Dispatch cost | One indirect call per opcode |
| Runtime memory | Zero on most opcodes (see static specialization) |
| Compile-time memory | One closure allocation per instruction |
| VM ↔ compiler sync | None — there's no bytecode encoding |
| Instruction argument encoding | Free (closure capture) |
| Control flow | Integer offsets returned by callbacks |
| Maintenance cost | Low; each opcode is a self-contained function |
Canonical example: Vitess evalengine¶
Vicent Martí's 2025 Vitess evalengine rewrite is the canonical wiki instance. The VM is at `go/vt/vtgate/evalengine/vm.go`; the entire execution engine is "hardly more complicated than this" (Source: sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp).
Composed with:
- patterns/static-type-specialized-bytecode — every closure is a type-specialised opcode emitted by Vitess's semantic analyzer based on schema types.
- patterns/vm-ast-dual-interpreter-fallback — when a specialized closure hits a value-dependent type promotion, it sets `vm.err = errDeoptimize` and execution falls back to the AST interpreter.
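The deoptimization handoff can be sketched as follows. This is an assumption-laden illustration, not Vitess's actual code: the `errDeoptimize` sentinel, the `evaluate` wrapper, and the stub AST fallback are all hypothetical shapes for the mechanism the pattern describes.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel; the real evalengine's shape may differ.
var errDeoptimize = errors.New("deoptimize")

type VirtualMachine struct {
	stack []any
	sp    int
	err   error
}

func (vm *VirtualMachine) execute(code []func(*VirtualMachine) int) (any, error) {
	ip := 0
	for ip < len(code) {
		ip += code[ip](vm)
		if vm.err != nil {
			return nil, vm.err
		}
	}
	return vm.stack[vm.sp-1], nil
}

// evaluate runs the specialized VM first and falls back to a
// (stubbed) AST interpreter when a closure signals deoptimization.
func evaluate(code []func(*VirtualMachine) int, astFallback func() any) (any, error) {
	vm := &VirtualMachine{stack: make([]any, 8)}
	result, err := vm.execute(code)
	if errors.Is(err, errDeoptimize) {
		return astFallback(), nil
	}
	return result, err
}

func main() {
	// A specialized int closure that bails out on an unexpected type:
	// it sets the sentinel and returns 0, halting the VM loop.
	code := []func(*VirtualMachine) int{
		func(vm *VirtualMachine) int {
			v := any("not an int") // value-dependent surprise at runtime
			if _, ok := v.(int64); !ok {
				vm.err = errDeoptimize
				return 0
			}
			vm.stack[vm.sp] = v
			vm.sp++
			return 1
		},
	}
	result, _ := evaluate(code, func() any { return "handled by AST interpreter" })
	fmt.Println(result) // handled by AST interpreter
}
```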
Benchmark result: VM geomean −48.60% sec/op vs the original AST interpreter; faster than MySQL C++ on 4 of 5 benchmarks; zero memory allocations on 4 of 5 benchmarks.
When to use¶
- You're building a performance-critical interpreter in Go. C/C++/Rust have better compiler support for the alternatives.
- Instructions are high-level (each opcode does substantial work). If opcodes are trivial (native `ADD`), a JIT becomes worthwhile.
- The source language has strong static type information, so you can compose with static specialization and avoid runtime type dispatch.
- Program size is moderate. Each instruction has a closure allocation; very large programs may stress compile-time memory.
When not to use¶
- Languages where types can only be observed at runtime. A dynamic-typing-heavy VM benefits more from quickening or JIT type speculation than from static specialization.
- Non-Go languages. C/C++/Rust should use tail-call continuation interpreters (musttail) for lower dispatch cost.
- VMs with extremely hot, trivial opcodes. JIT wins when dispatch is >~30% of runtime.
Caveats¶
- Closure-allocation count scales with program size. Each instruction is a closure object; a 10k-instruction query plan is 10k closures. Acceptable when compile-time cost is amortised across many executions; expensive for one-shot queries.
- Not debuggable as bytecode. You can't dump the program to a `.pyc`-like byte stream. Debugging means reading Go source + flamegraphs.
- Ties implementation to Go. Porting a callback-slice VM to C would require a manual closure layout; to Python, different calling conventions. This is a Go-specific sweet spot.
Seen in¶
- sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp — canonical wiki instance. Vitess evalengine VM. Geomean −48.60% sec/op vs AST baseline, catches up with MySQL's C++ implementation on 4/5 benchmarks. First wiki instance of a production bytecode-less Go VM.
Related¶
- concepts/callback-slice-interpreter
- concepts/bytecode-virtual-machine
- concepts/tail-call-continuation-interpreter
- concepts/go-compiler-optimization-gap
- concepts/static-type-specialization
- concepts/instruction-dispatch-cost
- patterns/static-type-specialized-bytecode
- patterns/vm-ast-dual-interpreter-fallback
- systems/vitess-evalengine