Jump table vs binary search dispatch¶
When a compiler lowers a switch statement with many integer
cases, it has two main strategies:
- Jump table — a dense array of code-addresses indexed by the switch value; one indirect jump goes to the right case.
- Binary search — a balanced tree of compare-and-branch instructions that narrows to the right case in O(log N) comparisons.
Which one the compiler picks makes a large difference to the performance of bytecode VMs and other dispatch-heavy code.
Jump table¶
- Fixed cost: one indirect jump.
- Needs contiguous (or nearly contiguous) case values — gaps waste table entries.
- Branch predictor has one indirect-jump site with per-target predictions; hot cases are predicted accurately once warm.
- Table lookup is a data-cache load; in tight loops the table stays in L1.
This is the fast path. C/C++ compilers reliably emit jump tables for switches with dense integer cases.
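As a concrete illustration, here is a minimal big-switch dispatch loop with dense, contiguous opcodes — the textbook shape for a jump table. The opcode names and stack-machine semantics are invented for this sketch; whether the Go compiler actually lowers the switch to a jump table is up to its heuristics and cannot be forced from source.

```go
package main

import "fmt"

// A dense, contiguous opcode set: the textbook candidate for a jump table.
const (
	opPush1 = iota // push constant 1
	opPush2        // push constant 2
	opAdd
	opSub
	opMul
	opNeg
	opDup
	opHalt // 8 contiguous cases, no gaps
)

// run interprets bytecode with a big-switch dispatch loop.
func run(code []byte) int {
	var stack []int
	push := func(v int) { stack = append(stack, v) }
	pop := func() int { v := stack[len(stack)-1]; stack = stack[:len(stack)-1]; return v }
	for _, op := range code {
		switch op {
		case opPush1:
			push(1)
		case opPush2:
			push(2)
		case opAdd:
			b, a := pop(), pop()
			push(a + b)
		case opSub:
			b, a := pop(), pop()
			push(a - b)
		case opMul:
			b, a := pop(), pop()
			push(a * b)
		case opNeg:
			push(-pop())
		case opDup:
			v := pop()
			push(v)
			push(v)
		case opHalt:
			return pop()
		}
	}
	return pop()
}

func main() {
	// (1 + 2) * 2
	fmt.Println(run([]byte{opPush1, opPush2, opAdd, opPush2, opMul, opHalt})) // prints 6
}
```

With a jump table, the switch above costs one indirect jump per instruction regardless of which opcode executes.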
Binary search¶
- Cost: log₂(N) compare-and-branches per dispatch.
- Handles sparse case values, unusual ranges, non-contiguous opcodes.
- Branch predictor has to predict every conditional in the tree; for a randomly-dispatched workload, some branches are harder to predict than a single indirect jump.
- No cache lookup — instructions all live in icache.
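To make the compare tree concrete, here is a hand-written equivalent of what a compiler emits for a sparse switch over the (invented) opcodes {3, 17, 42, 99}: a first compare splits the range, then one equality test per leaf, so any case is reached in about log₂(N) branches.

```go
package main

import "fmt"

// dispatch mimics the binary-search lowering of a sparse switch.
// Opcode values and names are hypothetical.
func dispatch(op int) string {
	if op <= 17 { // split the sorted case values in half
		if op == 3 {
			return "load"
		}
		if op == 17 {
			return "store"
		}
	} else {
		if op == 42 {
			return "add"
		}
		if op == 99 {
			return "halt"
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(dispatch(42)) // prints "add"
}
```

Each of these conditionals is a separately predicted branch, which is why a randomly dispatched workload pays more here than at a single indirect-jump site.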
When each is chosen¶
Compilers typically switch strategies based on case density and count. Some signals:
- Dense contiguous cases (e.g. opcodes 0–255 with no gaps) → jump table.
- Sparse or clustered cases → binary search, or hybrid.
- Small case count (say < 4) → linear compare chain.
- Unusual case values (big ints, mixed sign) → binary search by default.
Why it matters for interpreters¶
A big-switch VM dispatch loop is called once per bytecode instruction. If the loop compiles to a jump table, dispatch cost is one indirect jump — comparable to a tail-call interpreter. If it compiles to binary search over 200+ opcodes, dispatch cost balloons to ~7–8 conditional branches per opcode — substantially slower and harder to predict.
For dispatch-heavy workloads like interpreters, reliably getting a jump table matters.
The Go-specific problem¶
In Go specifically, switch jump-table optimization:
- Was implemented surprisingly late in the compiler.
- Is "very fiddly, without any way to enforce it."
- Often falls through to binary search even on switches that look textbook-suitable for jump tables.
- Cannot be verified from source code — only by reading generated assembly.
Vicent Martí (author of Vitess's evalengine VM rewrite):
"One key issue for Go is that often the different branches of the switch statement are jumped to via binary search instead of a jump table. … You have to tweak the way the VM's instructions are encoded carefully to ensure that you're jumping in the VM's main loop, and you have no way to reliably check whether your virtual machine's dispatch code has been properly optimized besides reviewing the generated assembly yourself." (Source: sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp)
This is one of the Go compiler optimization gaps that drove Martí to reject the big-switch VM design in favour of the callback-slice interpreter — where dispatch is a slice indirection, not a switch.
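A simplified sketch of the callback-slice idea (handler bodies and signatures here are invented, not Vitess's actual code): bytecode is compiled once into a slice of closures, and the run loop just indexes and calls — no switch anywhere on the hot path, so the compiler's jump-table heuristics are irrelevant.

```go
package main

import "fmt"

type state struct{ stack []int }

// instr is one pre-compiled VM instruction (hypothetical signature).
type instr func(s *state)

// compile turns bytecode into a slice of closures up front. The switch
// here runs once per instruction at compile time, never during dispatch.
func compile(code []byte) []instr {
	prog := make([]instr, 0, len(code))
	for _, op := range code {
		switch op {
		case 0: // push 1
			prog = append(prog, func(s *state) { s.stack = append(s.stack, 1) })
		case 1: // push 2
			prog = append(prog, func(s *state) { s.stack = append(s.stack, 2) })
		case 2: // add
			prog = append(prog, func(s *state) {
				n := len(s.stack)
				s.stack = append(s.stack[:n-2], s.stack[n-2]+s.stack[n-1])
			})
		}
	}
	return prog
}

// run's dispatch is an indexed slice load plus an indirect call.
func run(prog []instr) int {
	s := &state{}
	for _, f := range prog {
		f(s)
	}
	return s.stack[len(s.stack)-1]
}

func main() {
	fmt.Println(run(compile([]byte{0, 1, 2}))) // push 1, push 2, add → prints 3
}
```

The trade-off is paying compilation cost once so that the per-instruction dispatch cost is fixed and verifiable from the source, rather than dependent on an unenforceable compiler optimization.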
Verifying dispatch strategy¶
The only reliable way in Go is to read the generated assembly: look for an indirect JMP AX preceded by a table-address load (jump table) versus a chain of CMP / JLE / JGE pairs (binary search).
Seen in¶
- sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp — canonical wiki description of the Go-specific reliability problem. Motivates Vitess's choice of callback-slice over big-switch VM design.