PATTERN Cited by 1 source

Preemption-safe compiler emit¶

Intent¶

When a language runtime supports async preemption at any instruction boundary and the runtime's observers (GC, scheduler, signal handler) read runtime-observable state (stack pointer, frame pointer, pointer barriers, ...), the compiler must emit code that never leaves that state partially updated at any instruction boundary.

Concretely: if updating the state requires multiple opcodes because of ISA encoding limits, build the full new value in a scratch register first, then apply it to the runtime-observable target with a single indivisible register-form opcode.

The anti-pattern¶

Emitting a split immediate operation directly to the target:

; arm64 stack pointer adjustment, pre-go1.23.12
ADD $8,          RSP, R29
ADD $(16<<12),   R29, R29
ADD $16,         RSP, RSP
ADD $(16<<12),   RSP, RSP     ; race window starts after previous ADD
RET

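Between the two RSP adjustments, RSP holds a value that is neither the old nor the new stack pointer: with the immediates above, it has advanced by only 16 of the 65552 bytes (16 + (16<<12)) that the frame occupies. An async preemption that lands in this window hands the runtime a stack pointer that matches no valid frame layout, so it cannot unwind the stack.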
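The race window can be modelled in a few lines of Go. This is a minimal sketch, not compiler or runtime code: the function name observedSP and the starting value are hypothetical, and the model only records which stack-pointer values exist at each instruction boundary when the immediates are applied directly to the target.

```go
package main

import "fmt"

// observedSP returns the stack-pointer value visible at every instruction
// boundary when each immediate is added directly to SP (the anti-pattern).
// Hypothetical helper for illustration only.
func observedSP(sp uint64, imms []uint64) []uint64 {
	states := []uint64{sp} // boundary before the first ADD
	for _, imm := range imms {
		sp += imm // one ADD $imm, RSP, RSP
		states = append(states, sp)
	}
	return states
}

func main() {
	const oldSP = 0x4000_0000 // assumed starting value, not from the source
	states := observedSP(oldSP, []uint64{16, 16 << 12})
	for i, s := range states {
		fmt.Printf("boundary %d: SP = old + %d\n", i, s-oldSP)
	}
}
```

The middle boundary reports an offset of 16, which is the half-adjusted value an async preemption could observe.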

The pattern¶

Build the full immediate in a scratch register, then apply with one indivisible opcode:

; arm64 stack pointer adjustment, go1.23.12+
LDP -8(RSP), (R29, R30)
MOVD $32,        R27
MOVK $(1<<16),   R27
ADD  R27, RSP, RSP            ; indivisible
RET

The MOVD + MOVK pair builds the full offset (32 + (1<<16) = 65568) in R27, a scratch register that the runtime's observers do not read. The single ADD R27, RSP, RSP then applies the update to the observable target in one opcode. Preemption may land before or after that ADD, but never partway through the update.
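The same kind of model shows why the scratch-register sequence is safe. The sketch below is an illustration under stated assumptions: movk and applyViaScratch are hypothetical names, movk models only the 16-bit lane insertion that MOVK performs, and the starting stack pointer is arbitrary.

```go
package main

import "fmt"

// movk models arm64 MOVK: overwrite one 16-bit lane of reg and keep the
// other bits. shift is expected to be 0, 16, 32, or 48.
func movk(reg uint64, imm16 uint16, shift uint) uint64 {
	mask := uint64(0xffff) << shift
	return (reg &^ mask) | uint64(imm16)<<shift
}

// applyViaScratch returns the SP value visible at each instruction boundary
// for the MOVD / MOVK / ADD sequence. Hypothetical helper for illustration.
func applyViaScratch(sp uint64) []uint64 {
	states := []uint64{sp}

	scratch := uint64(32)       // MOVD $32, R27
	states = append(states, sp) // SP untouched

	scratch = movk(scratch, 1, 16) // MOVK $(1<<16), R27
	states = append(states, sp)    // SP still untouched

	sp += scratch // ADD R27, RSP, RSP
	states = append(states, sp)
	return states
}

func main() {
	const oldSP = 0x4000_0000 // assumed starting value, not from the source
	for i, s := range applyViaScratch(oldSP) {
		fmt.Printf("boundary %d: SP = old + %d\n", i, s-oldSP)
	}
}
```

Every boundary reports either an offset of 0 or the full 65568; no intermediate stack pointer is ever visible.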

What counts as "runtime-observable"¶

The runtime's safe-point analysis defines the surface. In Go:

Stack pointer (sp) — read during stack unwinding for GC scan, panic, traceback.
Frame pointer / saved link register — read during traceback to identify the calling function.
Heap pointers with write-barrier semantics — read by the GC's concurrent mark phase. The compiler's write-barrier sequence must appear atomic to those observers.
The goroutine's g pointer register — used by the preemption handler to locate scheduler state.

Any compiler update to these must pass through a scratch register if the update cannot fit in a single opcode.

Why the assembler level is not sufficient¶

Before go1.23.12, the Go toolchain expressed the intent at the obj.Prog IR level as a single logical ADD $n, RSP, RSP and relied on the assembler (asm7.go's conclass) to split the immediate when necessary. The IR-level abstraction was leaky: downstream passes and runtime observers could not tell that what looked like one operation was actually two.

The fix promotes preemption-safety awareness to the compiler level. The compiler now emits code that is already decomposed through a scratch register, so the assembler has nothing to split. See systems/go-compiler (patch in cmd/internal/obj/arm64/obj7.go).
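The compiler-level decision can be sketched as a small emitter. This is a hedged illustration, not the code in obj7.go: fitsAddImm and emitSPAdjust are hypothetical names, the encodability rule assumed here is the arm64 ADD immediate form (a 12-bit value, optionally shifted left by 12), and the output is plain text rather than obj.Prog values.

```go
package main

import "fmt"

// fitsAddImm reports whether n can be encoded in a single arm64
// ADD (immediate): a 12-bit value, optionally shifted left by 12.
func fitsAddImm(n uint32) bool {
	return n < 1<<12 || (n&0xfff == 0 && n>>12 < 1<<12)
}

// emitSPAdjust returns the instructions for SP += n. When n does not fit
// one ADD immediate, the value is built in scratch register R27 first so
// that SP itself changes in exactly one instruction.
// Hypothetical helper for illustration only.
func emitSPAdjust(n uint32) []string {
	if fitsAddImm(n) {
		return []string{fmt.Sprintf("ADD $%d, RSP, RSP", n)}
	}
	out := []string{fmt.Sprintf("MOVD $%d, R27", n&0xffff)}
	if hi := n >> 16; hi != 0 {
		out = append(out, fmt.Sprintf("MOVK $(%d<<16), R27", hi))
	}
	return append(out, "ADD R27, RSP, RSP")
}

func main() {
	for _, n := range []uint32{64, 65568} {
		fmt.Printf("frame adjustment %d:\n", n)
		for _, ins := range emitSPAdjust(n) {
			fmt.Println("   ", ins)
		}
	}
}
```

In this sketch, every returned sequence contains exactly one instruction that writes RSP, which is the property the fix is meant to guarantee.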

Generalisation¶

Any compiler targeting a runtime with async preemption must audit codegen for all cases where:

The target register is runtime-observable at preemption time.
The operation requires multiple opcodes due to ISA encoding limits, register pressure, or other reasons.

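If both hold, use a scratch register + indivisible apply. If only one holds, emitting the update directly is safe: a multi-opcode sequence that targets a non-observable register exposes nothing to the runtime, and an update that fits in one opcode has no intermediate state.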
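The audit rule reduces to a two-condition predicate. The sketch below is only an illustration of that rule: the update type, its fields, and needsScratch are hypothetical and do not correspond to any real compiler data structure.

```go
package main

import "fmt"

// update describes one register write that a code generator is about to emit.
// Hypothetical type for illustration only.
type update struct {
	target     string // register being written
	observable bool   // read by GC, scheduler, or signal handler at preemption
	opcodes    int    // instructions needed if written directly to target
}

// needsScratch applies the audit rule: a scratch register is required only
// when the target is runtime-observable AND the direct update is multi-opcode.
func needsScratch(u update) bool {
	return u.observable && u.opcodes > 1
}

func main() {
	cases := []update{
		{"RSP", true, 2},  // large frame epilogue: unsafe if emitted directly
		{"RSP", true, 1},  // small frame: one ADD, no window
		{"R27", false, 2}, // scratch register: split is invisible
	}
	for _, c := range cases {
		fmt.Printf("%s observable=%v opcodes=%d -> scratch needed: %v\n",
			c.target, c.observable, c.opcodes, needsScratch(c))
	}
}
```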

Seen in¶

sources/2025-10-08-cloudflare-we-found-a-bug-in-gos-arm64-compiler — canonical wiki instance. Go's arm64 backend pre-go1.23.12 emitted the anti-pattern for function-epilogue SP adjustments on frames > 4 KiB. Fix in go1.23.12 / go1.24.6 / go1.25.0 applies the pattern.

systems/go-compiler — where the fix landed (cmd/internal/obj/arm64/obj7.go).
systems/go-assembler — previously did the immediate splitting; no longer required for this case.
systems/arm64-isa — the architectural constraint.
concepts/split-instruction-race-window — the failure class this pattern prevents.
concepts/async-preemption-go — the runtime mechanism the pattern must cooperate with.
patterns/upstream-the-fix — the meta-pattern; the preemption-safe emit fix is ideally upstreamed into the toolchain rather than worked around in user code.