CONCEPT
Tail-call continuation interpreter¶
A tail-call continuation interpreter is a bytecode-VM design where each opcode is a free-standing function; each function returns by tail-calling the function for the next opcode directly. The runtime never returns to a central dispatch loop — control flows directly between opcode functions via jumps.
Mechanism¶
```c
typedef struct VM {
    const unsigned char *code;   /* bytecode stream */
    int ip;                      /* instruction pointer */
} VM;

typedef int (*OpFn)(VM *vm);
extern OpFn opcodes[];           /* one handler per opcode value */

/* musttail guarantees the call compiles to a jump (Clang; GCC 15+). */
#define DISPATCH(vm) \
    __attribute__((musttail)) return opcodes[(vm)->code[(vm)->ip++]](vm)

int op_add(VM *vm) {
    /* ... do work ... */
    DISPATCH(vm);                /* ← tail-call the next opcode */
}

int op_push(VM *vm) {
    /* ... do work ... */
    DISPATCH(vm);
}
```
The compiler, seeing `__attribute__((musttail))` (Clang/LLVM) or an equivalent directive, guarantees the call site is converted into a jump, so the stack doesn't grow. The interpreter never returns to a central while loop — each opcode is responsible for picking and jumping to its successor.
Why this wins performance¶
Compared to a big-switch VM loop:
- Each opcode is compiled in isolation. The compiler sees a small function and optimises register allocation locally — no register spills caused by hundreds of sibling cases.
- Branch prediction scales per-opcode. Each opcode's "what comes next" jump is separately predicted by the CPU's branch target buffer (BTB), so hot sequences get their own predictor slots. In a big switch, the one indirect jump shares a single BTB entry for all sequences.
- Dispatch is a single indirect jump — minimal overhead.
Where it's shipping¶
- Python 3.14 interpreter — the CPython 3.14 release ships an opt-in tail-call interpreter build (using Clang's `musttail`); initial reports claimed up to 30% improvement on typical Python code, though follow-up analysis attributed much of that to an LLVM 19 regression in the baseline, with revised gains in the low single digits.
- Protocol Buffers parser (Reverberate blog, 2021) — the canonical exposition of tail-call interpreters for parsing work.
- LuaJIT's interpreter uses a hand-coded assembly variant of the same idea (Mike Pall's dispatch-optimised loop).
Why this does not work in Go¶
The Go compiler does not reliably emit tail calls. It can sometimes convert tail positions to jumps, but "it needs to be tickled in just the right way, and this implementation simply does not work in practice unless the tail-calls are guaranteed at compilation time." (Source: sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp)
Without a guarantee, the stack grows on every opcode dispatch — the interpreter recurses to unbounded depth on long programs and crashes with stack exhaustion.
Vitess's alternative design for Go is the callback-slice interpreter — the VM keeps a loop (so no tail calls needed) but the opcodes are free-standing closures.
Requirements¶
- A compiler that supports guaranteed tail-call conversion. Clang/LLVM's `musttail` attribute is the canonical path; GCC has attribute-level support; MSVC does not.
- A language where interprocedural tail calls to function pointers are legal with matching signatures.
Seen in¶
- sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp — canonical wiki survey of the technique. Martí discusses it as the C/C++/Python 3.14 state-of-the-art that does not transfer to Go because of Go's compiler optimization gap, motivating the callback-slice alternative.