

Go compiler optimization gap

The Go compiler optimization gap is the practical engineering constraint that Go's compiler optimizes substantially less aggressively than modern C/C++/Rust compilers (LLVM, GCC). Go's authors have historically prioritized fast compile times over optimized output, and this choice shapes which performance techniques work in Go and which don't.

From Vicent Martí's Vitess evalengine post:

"There's always a trade-off between optimization and fast compile times, and the Go authors have historically opted for the latter." (Source: sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp)

Concrete symptoms relevant to interpreter design

1. Switch jump tables are unreliable

Classic VM interpreters use a giant switch statement over opcodes, expecting the compiler to emit a jump table. In Go:

  • Jump-table optimization for switch was implemented surprisingly late in the compiler (jump tables for large switches only landed in Go 1.19).
  • Even now, the optimization is "very fiddly, without any way to enforce it."
  • For many switches the compiler falls back to binary search across cases — every opcode dispatch costs O(log N) compare-and-branches instead of one indirect jump.
  • There is no reliable way to verify from source code that your switch dispatches via jump table; you have to inspect the generated assembly.

See concepts/jump-table-vs-binary-search-dispatch.
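As an illustration, here is a toy big-switch dispatch loop (the opcodes and stack layout are invented for this sketch, not taken from Vitess). Nothing in the source tells you how the switch dispatches; the generated assembly is the only evidence:

```go
package main

import "fmt"

// Opcodes for a toy stack machine. Whether the switch below compiles
// to a jump table depends on the compiler version and the case values;
// the only way to confirm is to inspect the assembly:
//   go build -gcflags=-S . 2>&1 | less
const (
	opPush byte = iota // push the following byte as a value
	opAdd              // pop two values, push their sum
	opMul              // pop two values, push their product
	opHalt             // stop and return the top of the stack
)

func run(code []byte) int {
	var stack []int
	for pc := 0; pc < len(code); {
		switch code[pc] { // may become a jump table -- or a binary search
		case opPush:
			stack = append(stack, int(code[pc+1]))
			pc += 2
		case opAdd:
			n := len(stack)
			stack[n-2] += stack[n-1]
			stack = stack[:n-1]
			pc++
		case opMul:
			n := len(stack)
			stack[n-2] *= stack[n-1]
			stack = stack[:n-1]
			pc++
		case opHalt:
			return stack[len(stack)-1]
		}
	}
	return 0
}

func main() {
	// (2 + 3) * 4
	code := []byte{opPush, 2, opPush, 3, opAdd, opPush, 4, opMul, opHalt}
	fmt.Println(run(code)) // 20
}
```

With only four cases this compiles fine either way; the pain appears at the hundreds-of-opcodes scale, where a binary-search fallback turns every dispatch into a chain of compares.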

2. Tail calls are not guaranteed

LLVM supports __attribute__((musttail)) to force tail-call conversion; Python 3.14's new tail-call interpreter depends on it for its reported 30% speedup. In Go:

  • The compiler can emit tail calls in some cases but won't commit to it.
  • There is no musttail equivalent.
  • A tail-call-style interpreter in Go therefore grows the stack on every opcode dispatch, which makes it unusable for programs with long or deep execution.
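A minimal sketch of why the style fails in Go (all names here are illustrative): each handler below "tail calls" the next handler via dispatch, but because Go makes no tail-call guarantee, every opcode executed adds a real stack frame:

```go
package main

import "fmt"

// A handler executes one opcode and then hands control to the next.
// Go has no musttail equivalent, so the recursive dispatch call below
// is an ordinary call: a program that executes N opcodes nests N
// frames deep instead of reusing the current frame.
type handler func(code []byte, pc int, acc int) int

var handlers []handler

func init() {
	handlers = []handler{
		// op 0: add the following byte to the accumulator, continue
		func(code []byte, pc int, acc int) int {
			return dispatch(code, pc+2, acc+int(code[pc+1]))
		},
		// op 1: halt, returning the accumulator
		func(code []byte, pc int, acc int) int {
			return acc
		},
	}
}

func dispatch(code []byte, pc int, acc int) int {
	// Looks like a tail call; compiles as a plain call that grows
	// the goroutine stack on every opcode dispatched.
	return handlers[code[pc]](code, pc, acc)
}

func main() {
	code := []byte{0, 5, 0, 7, 1} // acc += 5; acc += 7; halt
	fmt.Println(dispatch(code, 0, 0)) // 12
}
```

Goroutine stacks do grow on demand, so short programs run fine; the cost is that depth scales with instructions executed, not with nesting in the interpreted program.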

3. Large functions get poor register allocation

Classic VM dispatch loops are massive functions with hundreds of cases. The Go compiler:

  • Spills registers aggressively on every branch in large functions.
  • Can't reliably identify hot vs cold branches in a single giant function — hot cases get penalised alongside cold ones.
  • LuaJIT's Mike Pall noted this problem in C compilers too, but Go makes it "much worse."

4. Inlining is conservative

Go's inliner is less aggressive than LLVM's: its cost budget is small, and functions containing loops are generally not inlined at all. Small helpers that would be inlined in C++ often become real function calls in Go, paying call overhead on every invocation.
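A small illustration (function names invented for this sketch): building with `go build -gcflags=-m` prints the compiler's inlining decisions. A tiny leaf function typically fits the inline budget, while a function containing a loop generally does not:

```go
package main

import "fmt"

// Inspect inlining decisions with:
//   go build -gcflags=-m .
// addSmall is a leaf well inside the inliner's cost budget, so its
// call inside the loop is usually inlined; sumLoop itself contains a
// for loop and so is generally not a candidate for inlining at its
// own call sites.
func addSmall(a, b int) int { return a + b }

func sumLoop(xs []int) int {
	total := 0
	for _, x := range xs {
		total = addSmall(total, x)
	}
	return total
}

func main() {
	fmt.Println(sumLoop([]int{1, 2, 3, 4})) // 10
}
```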

What this constraint implies

Performance-critical Go code often has to use Go's strengths instead of fighting its weaknesses:

  • Closures work well. The compiler has a clean calling convention for closures; capture is efficient. Martí leverages this heavily in the callback-slice interpreter design.
  • Small functions get good codegen. Split hot code into small functions so each one optimises cleanly; let the VM dispatch between them rather than concentrating everything in one switch.
  • Avoid giant switches in hot paths. Prefer method dispatch or function-pointer arrays.
  • Sometimes reach for assembly. Go's assembly files support architecture-specific hand-tuned code when the compiler can't deliver; Go's standard library does this for crypto, and numeric libraries do it for BLAS primitives and SIMD.
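The closure-based alternative can be sketched roughly like this (a minimal toy under assumed names, not Vitess's actual API): an expression compiles once into a slice of small closures over shared VM state, each small enough to optimize cleanly, and dispatch is a plain loop of indirect calls with no switch at all:

```go
package main

import "fmt"

// Callback-slice interpreter sketch. Each compiled instruction is a
// small closure capturing its operands; the compiler gives each one
// its own clean codegen, and the run loop just calls them in order.
type vm struct {
	stack []int
}

type instr func(*vm)

func push(v int) instr {
	// v is captured at compile time, not re-decoded at run time.
	return func(m *vm) { m.stack = append(m.stack, v) }
}

func add() instr {
	return func(m *vm) {
		n := len(m.stack)
		m.stack[n-2] += m.stack[n-1]
		m.stack = m.stack[:n-1]
	}
}

func mul() instr {
	return func(m *vm) {
		n := len(m.stack)
		m.stack[n-2] *= m.stack[n-1]
		m.stack = m.stack[:n-1]
	}
}

func run(prog []instr) int {
	m := &vm{}
	for _, ins := range prog {
		ins(m) // one indirect call per instruction, no giant switch
	}
	return m.stack[len(m.stack)-1]
}

func main() {
	// Compile (2 + 3) * 4 once; run the slice as many times as needed.
	prog := []instr{push(2), push(3), add(), push(4), mul()}
	fmt.Println(run(prog)) // 20
}
```

The trade is one indirect call per instruction in exchange for predictable codegen per opcode, which is the bet the callback-slice design makes against Go's weak large-function optimization.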

Why the trade-off exists

Go was designed for a specific engineering reality: huge monorepos at Google where build times dominated developer productivity. A compiler that takes 30 seconds to build the world is better than one that takes 30 minutes, even at the cost of 10–20% runtime performance.

The trade-off has endured because:

  • Most Go services are I/O-bound; the CPU overhead doesn't matter.
  • Deployment at scale makes horizontal scaling cheap relative to compiler investment.
  • Go's sweet spot (infrastructure, networking, dev tooling) doesn't usually push compute-bound workloads.

Where Go does hit compute-bound hot paths — interpreters, encoding/decoding, cryptography, simulation — the gap becomes visible, and engineering effort goes into design patterns that sidestep it rather than fix it.

Seen in

  • sources/2025-04-05-planetscale-faster-interpreters-in-go-catching-up-with-cpp — canonical wiki analysis of the gap from Martí's interpreter-engineering perspective. Every major design decision in the Vitess evalengine VM rewrite responds to a specific Go compiler limitation: unreliable switch jump tables drove the rejection of the big-switch VM design; unreliable tail calls ruled out continuation-style loops; poor register allocation in large functions drove the split into many small closures.