
Instruction dispatch cost

Instruction dispatch cost is the per-instruction overhead a bytecode or VM interpreter pays to fetch the next opcode, decide which handler to run, and jump to it, before executing any useful work for that instruction.

Dispatch cost is the primary thing JIT compilation eliminates. It's also the primary thing that makes one interpreter design faster than another.

What makes up dispatch cost

For a classic big-switch VM:

while (ip < len) {
    switch (code[ip].opcode) {   // ← dispatch: fetch + branch
        case OP_ADD:
            ...                   // ← actual work
            break;
    }
    ip++;
}

The dispatch step includes:

  • Opcode fetch — load from instruction stream into register.
  • Bounds check + loop overhead on the while.
  • Switch branch — indirect jump (jump table) or cascaded compare-and-branch (binary search).
  • Return to dispatch loop — after the case executes, control flows back through the break + loop header for the next iteration.
  • Branch predictor pressure. The switch's indirect jump tends to cluster on one target per workload ("sticky branch"), which helps, but type-mixed workloads can thrash the BTB.

How dispatch cost varies by design

  • AST interpreter: very high. Recursive function call per node, register spills, and type dispatch inside each node.
  • Big-switch VM (C/C++): low. The compiler emits a jump table; usually near-optimal.
  • Big-switch VM (Go): medium-high. The switch often compiles to a binary search; see concepts/jump-table-vs-binary-search-dispatch.
  • Tail-call interpreter: very low. musttail makes dispatch a single indirect jump; Python 3.14 reports ~30% improvement.
  • Callback-slice interpreter (Go): low-medium. One indirect call per opcode, no switch; a closure captures interpreter state.
  • JIT native code: ~zero. Straight-line machine code with no dispatch.
The dispatch-overhead-share threshold

The PlanetScale post canonicalises a rule of thumb:

"JIT compilers are important for programming languages where their bytecode operations can be optimized into a very low level of abstraction (e.g. where an 'add' operator only has to perform a native x64 ADD). In these cases, the overhead of dispatching instructions becomes so dominant that replacing the VM's loop with a block of JITted code makes a significant performance difference. However, for SQL expressions, and even after our specialization pass, most of the operations remain extremely high level … The overhead of instruction dispatch, as measured in our benchmarks, is less than 20%."

Decision rule:

  • Dispatch share >30% of runtime → JIT is justified. The VM can never catch up to native code while dispatch dominates.
  • Dispatch share <20% → stay in the VM. JIT adds substantial engineering cost (code generation, relocation, invalidation, security surface, multi-arch) that won't be repaid.

How to measure dispatch cost

  • Build an alternate build of the VM in which a hot opcode's body does no work (returns immediately), then run the same program on both builds — the time the no-op build still takes is (approximately) pure dispatch cost, and the difference is the opcode's useful work.
  • Use perf stat counters (branches, branch-misses, iTLB-load-misses) to characterise how well the dispatch loop plays with the CPU frontend.
  • Compare the median number of native instructions the interpreter executes to dispatch an opcode against the number executed inside the opcode body — the ratio is the dispatch tax per opcode.

Consequences on language design

  • Coarser opcodes amortise dispatch cost. If an opcode does substantial work (e.g. "match a JSON path", "format a decimal"), dispatch is a small tax. If an opcode does trivial work (e.g. "add two 32-bit ints"), dispatch dominates and JIT becomes the only way forward.
  • Stack VMs vs register VMs. Register VMs typically have fewer opcodes per program (2x–5x reduction) because each opcode can reach into operand memory directly, amortising dispatch. Most modern JIT-targeting VMs (Dalvik, LuaJIT) are register-based for this reason.
