Jump table vs binary search dispatch

When a compiler lowers a switch statement with many integer cases, it has two main strategies:

  1. Jump table — a dense array of code-addresses indexed by the switch value; one indirect jump goes to the right case.
  2. Binary search — a balanced tree of compare-and-branch instructions that narrows to the right case in O(log N) comparisons.

Which one the compiler picks makes a large difference to the performance of bytecode VMs and other dispatch-heavy code.

Jump table

target = jumptable[switch_value]
jmp target
  • Fixed cost: one indirect jump.
  • Needs contiguous (or nearly contiguous) case values — gaps waste table entries.
  • Branch predictor has one indirect-jump site with per-target predictions; hot cases are predicted accurately once warm.
  • Table lookup is a data-cache load; in tight loops the table stays in L1.

This is the fast path. C/C++ compilers reliably emit jump tables for switches with dense integer cases.
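As a concrete sketch (opcode names are hypothetical, not from the source): a Go switch over a small, dense, contiguous set of constants — the shape that qualifies for a jump-table lowering. Note that real compilers also require a minimum case count before emitting a table; a set this small would typically get a compare chain instead.

```go
package main

import "fmt"

// Hypothetical dense opcode set: contiguous values 0..3 with no gaps.
// Dense, contiguous cases are the shape a compiler can lower to a
// jump table (given enough cases to justify the table).
const (
	OpAdd = iota // 0
	OpSub        // 1
	OpMul        // 2
	OpHalt       // 3
)

// execute dispatches on a dense switch; under a jump-table lowering,
// dispatch is one table load plus one indirect jump.
func execute(op byte, a, b int) int {
	switch op {
	case OpAdd:
		return a + b
	case OpSub:
		return a - b
	case OpMul:
		return a * b
	default: // OpHalt and anything unrecognized
		return 0
	}
}

func main() {
	fmt.Println(execute(OpAdd, 6, 7)) // 13
	fmt.Println(execute(OpMul, 6, 7)) // 42
}
```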

Binary search

if (v < 32) {
    if (v < 16) { ... }
    else        { ... }
} else {
    if (v < 48) { ... }
    else        { ... }
}
  • Cost: log₂(N) compare-and-branches per dispatch.
  • Handles sparse case values, unusual ranges, non-contiguous opcodes.
  • Branch predictor must track every conditional in the tree; under uniformly random dispatch the interior comparisons are close to 50/50, so they mispredict far more often than a single indirect jump with per-target history.
  • No cache lookup — instructions all live in icache.

When each is chosen

Compilers typically switch strategies based on case density and count. Some signals:

  • Dense contiguous cases (e.g. opcodes 0–255 with no gaps) → jump table.
  • Sparse or clustered cases → binary search, or hybrid.
  • Small case count (say < 4) → linear compare chain.
  • Unusual case values (big ints, mixed sign) → binary search by default.
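To illustrate the density signal (hypothetical values, not from the source): a switch over widely spaced integers would leave a value-indexed table mostly empty, so a compare tree is the usual lowering.

```go
package main

import "fmt"

// Sparse, widely spaced case values: a jump table indexed by the
// value would waste hundreds of empty entries, so compilers
// typically emit a binary-search compare tree here instead.
func kind(code int) string {
	switch code {
	case 100:
		return "continue"
	case 301:
		return "redirect"
	case 404:
		return "not found"
	case 500:
		return "server error"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(kind(404)) // not found
}
```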

Why it matters for interpreters

A big-switch VM dispatch loop is called once per bytecode instruction. If the loop compiles to a jump table, dispatch cost is one indirect jump — comparable to a tail-call interpreter. If it compiles to binary search over 200+ opcodes, dispatch cost balloons to ~7–8 conditional branches per opcode — substantially slower and harder to predict.

For dispatch-heavy workloads like interpreters, reliably getting a jump table matters.
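A minimal sketch of such a dispatch loop, for a toy stack machine (the opcode names and encoding are illustrative; a real VM has far more opcodes, which is exactly when the lowering choice matters):

```go
package main

import "fmt"

// Toy stack-machine bytecode. Opcodes are dense and contiguous, the
// layout that gives the big switch its best chance of a jump table.
const (
	opPush = iota // push the following operand
	opAdd         // pop two, push sum
	opMul         // pop two, push product
	opHalt        // stop; result is top of stack
)

// run is the classic big-switch interpreter loop: the switch executes
// once per bytecode instruction, so its lowering IS the dispatch cost.
func run(code []int) int {
	var stack []int
	pc := 0
	for {
		switch code[pc] {
		case opPush:
			stack = append(stack, code[pc+1])
			pc += 2
		case opAdd:
			n := len(stack)
			stack[n-2] += stack[n-1]
			stack = stack[:n-1]
			pc++
		case opMul:
			n := len(stack)
			stack[n-2] *= stack[n-1]
			stack = stack[:n-1]
			pc++
		case opHalt:
			return stack[len(stack)-1]
		}
	}
}

func main() {
	// (2 + 3) * 7
	fmt.Println(run([]int{opPush, 2, opPush, 3, opAdd, opPush, 7, opMul, opHalt})) // 35
}
```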

The Go-specific problem

In Go specifically, switch jump-table optimization:

  • Was implemented surprisingly late — the compiler only gained jump-table lowering for large switches in Go 1.19 (2022).
  • Is "very fiddly, without any way to enforce it."
  • Often falls back to binary search even on switches that look textbook-suitable for jump tables.
  • Cannot be verified from source code — only by reading generated assembly.

Vicent Martí (author of Vitess's evalengine VM rewrite):

"One key issue for Go is that often the different branches of the switch statement are jumped to via binary search instead of a jump table. … You have to tweak the way the VM's instructions are encoded carefully to ensure that you're jumping in the VM's main loop, and you have no way to reliably check whether your virtual machine's dispatch code has been properly optimized besides reviewing the generated assembly yourself." (Source: sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp)

This is one of the Go compiler optimization gaps that drove Martí to reject the big-switch VM design in favour of the callback-slice interpreter — where dispatch is a slice indirection, not a switch.
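A minimal sketch of the callback-slice idea (types and names here are hypothetical; Vitess's actual evalengine differs): each compiled instruction is a function value, and the main loop just indexes a slice and makes an indirect call — there is no switch for the compiler to lower badly.

```go
package main

import "fmt"

// instr is one compiled instruction: a callback that mutates VM state.
type instr func(vm *vmState)

type vmState struct {
	stack []int
	pc    int
}

// push compiles a push-constant instruction as a closure over v.
func push(v int) instr {
	return func(vm *vmState) {
		vm.stack = append(vm.stack, v)
		vm.pc++
	}
}

// add pops two values and pushes their sum.
func add(vm *vmState) {
	n := len(vm.stack)
	vm.stack[n-2] += vm.stack[n-1]
	vm.stack = vm.stack[:n-1]
	vm.pc++
}

// run walks the slice of callbacks; dispatch is one slice index plus
// one indirect call, regardless of how many instruction kinds exist.
func run(code []instr) int {
	vm := &vmState{}
	for vm.pc < len(code) {
		code[vm.pc](vm)
	}
	return vm.stack[len(vm.stack)-1]
}

func main() {
	fmt.Println(run([]instr{push(2), push(3), add})) // 5
}
```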

Verifying dispatch strategy

The only reliable way in Go:

go tool objdump -S -s '<function>' <binary>

Look for a register-indirect JMP AX preceded by a table-address load (jump table), versus a sequence of CMP / JLE / JGE compare-and-branch pairs (binary search).
