Skip to content

CONCEPT Cited by 1 source

Tail-call continuation interpreter

A tail-call continuation interpreter is a bytecode-VM design where each opcode is a free-standing function; each function returns by tail-calling the function for the next opcode directly. The runtime never returns to a central dispatch loop — control flows directly between opcode functions via jumps.

Mechanism

int op_add(VM *vm) {
    // ... do work ...
    return DISPATCH(vm);    // ← tail-call the next opcode
}

int op_push(VM *vm) {
    // ... do work ...
    return DISPATCH(vm);
}

#define DISPATCH(vm) \
    __attribute__((musttail)) return opcodes[vm->code[vm->ip]](vm)

The compiler, seeing __attribute__((musttail)) (LLVM) or an equivalent directive, guarantees the call site is converted into a jump. The stack doesn't grow. Control flow looks like:

op_add ──jump──▶ op_push ──jump──▶ op_sub ──jump──▶ ...

The interpreter never returns to a central while loop — each opcode is responsible for picking and jumping to its successor.

Why this wins performance

Compared to a big-switch VM loop:

  • Each opcode is compiled in isolation. The compiler sees a small function and optimises registers locally — no register spillage caused by hundreds of sibling cases.
  • Branch prediction scales per-opcode. Each opcode's "what comes next" jump is separately predicted by the CPU's branch target buffer (BTB), so hot sequences get their own predictor slots. In a big switch, the one indirect jump shares a single BTB entry for all sequences.
  • Dispatch is a single indirect jump — minimal overhead.

Where it's shipping

  • Python 3.14 interpreter — the CPython 3.14 release ships a tail-call interpreter, reporting up to 30% improvement on typical Python code.
  • Protocol Buffers parser (Reverberate blog, 2021) — the canonical exposition of tail-call interpreters for parsing work.
  • LuaJIT's interpreter uses a hand-coded assembly variant of the same idea (Mike Pall's dispatch-optimised loop).

Why this does not work in Go

The Go compiler does not reliably emit tail calls. It can sometimes convert tail positions to jumps, but "it needs to be tickled in just the right way, and this implementation simply does not work in practice unless the tail-calls are guaranteed at compilation time." (Source: sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp)

Without a guarantee, the stack grows on every opcode dispatch — the interpreter recurses to unbounded depth on long programs and crashes with stack exhaustion.

Vitess's alternative design for Go is the callback-slice interpreter — the VM keeps a loop (so no tail calls needed) but the opcodes are free-standing closures.

Requirements

  • A compiler that supports guaranteed tail-call conversion. LLVM's musttail attribute is the canonical path; GCC has attribute-level support; MSVC does not.
  • A language where interprocedural tail calls to function pointers are legal with matching signatures.

Seen in

Last updated · 319 distilled / 1,201 read