Cloudflare — Making Rust Workers reliable: panic and abort recovery in wasm-bindgen

Summary

Cloudflare describes how Rust Workers (Rust code compiled to WebAssembly and run inside Cloudflare Workers via wasm-bindgen) historically poisoned their own sandbox on failure: a Rust panic or out-of-memory abort left the Wasm instance in an undefined state, so one failing request could cascade into failures of sibling requests and even of new incoming requests, which then returned 5xx errors (sandbox poisoning). The fix arrived as two upstream contributions to wasm-bindgen, a project Cloudflare now co-maintains: (1) panic=unwind support for wasm32-unknown-unknown, built on the WebAssembly Exception Handling proposal, so that a Rust panic unwinds cleanly across the Rust↔JS boundary, destructors run, and the Wasm instance remains valid and reusable; and (2) a foreign-exception tag plus an abort hook (Exception.Tag, set_on_abort), which let embeddings distinguish recoverable unwinds from genuine aborts and attach an abort-recovery handler as a last line of defence. As a side benefit, Cloudflare shipped an experimental --reset-state-function so that JS apps embedding a wasm-bindgen-compiled library can get a fresh instance without re-importing it. Cloudflare also backported modern WebAssembly Exception Handling to Node.js 22 and 24. This lets the ecosystem adopt the modern variant as its default, rather than staying on legacy EH until April 2028 as the Node.js 24 LTS schedule would otherwise have required. Rust Workers 0.8.0 ships the feature behind --panic-unwind; Cloudflare plans to make it the default in a subsequent release.

Key takeaways

  1. A single Rust panic could poison an entire Worker. Rust compiled to wasm32-unknown-unknown defaults to panic=abort: a panic traps with the unreachable instruction and exits Wasm back to JS with a WebAssembly.RuntimeError. There is no mechanism in stock Wasm + stock wasm-bindgen to return the instance to a known-good state — "an unhandled Rust abort in a Worker affecting one request might escalate into a broader failure affecting sibling requests or even continue to affect new incoming requests." Canonical wiki instance of concepts/sandbox-poisoning at the Wasm-instance tier. (Source: this post)

  2. Initial mitigation was a platform-side workaround, not an ecosystem fix. Cloudflare shipped a Proxy-based entrypoint encapsulation in workers-rs 0.6, made of three parts:
     • JavaScript-side Proxy indirection wrapping every Rust→JS and JS→Rust call.
     • A custom Rust panic handler tracking per-Worker failure state.
     • Targeted modifications to wasm-bindgen's generated bindings that reinitialize the WebAssembly module before handling subsequent requests.
     "This approach relied on custom JavaScript logic": the fix lived inside workers-rs rather than in wasm-bindgen itself, so every other wasm-bindgen consumer remained exposed to sandbox poisoning. The rest of the post is the story of moving the fix upstream.

  3. panic=unwind for wasm32-unknown-unknown required the WebAssembly Exception Handling proposal. Native Rust unwinds panics across frames so destructors run and state is preserved; on Wasm, there was no unwinding primitive until the WebAssembly Exception Handling proposal gained wide engine support in 2023 (concepts/webassembly-exception-handling). With RUSTFLAGS='-Cpanic=unwind' cargo build -Zbuild-std (rebuilds the standard library), Rust now compiles panics into try / catch_all Wasm blocks that invoke destructors and rethrow — preserving native concepts/stack-unwinding semantics on Wasm.

  4. The wasm-bindgen toolchain had to be taught about exception handling. The WebAssembly parser Walrus did not understand try / catch instructions; Cloudflare added them. The descriptor interpreter had to evaluate code containing exception-handling blocks. Exports had to be marked extern "C-unwind", because a panic unwinding out of an extern "C" function aborts instead of propagating. Panics in exported Rust functions now surface to JavaScript as PanicError exceptions; async Rust exports reject the returned JS Promise with a PanicError. After this work, "panics in exported Rust functions are caught by wasm-bindgen, Rust destructors run correctly, and the WebAssembly instance remains valid and reusable." Canonical wiki instance of concepts/panic-unwind on Wasm.

  5. Unwind safety for closures was the thorniest corner. Cloudflare added a MaybeUnwindSafe trait that checks std::panic::UnwindSafe only when built with panic=unwind, plus Closure::new_aborting variants that abort on panic rather than unwind in cases where safety can't be guaranteed — "to avoid a situation where users are encouraged to incorrectly wrap closures in AssertUnwindSafe." The fallback preserves the discipline that unsafe-to-unwind code terminates rather than silently violating invariants.

  6. Aborts still exist — so distinguish them with exception tags. panic=unwind covers panics, but genuine aborts (out-of-memory, deeply corrupted state) cannot unwind. "Aborts can take many shapes in WebAssembly." To tell a recoverable unwind from a real abort at the JS boundary, Cloudflare introduced an Exception.Tag for foreign exceptions — "mark all errors which are definitely unwinds" — and a set_on_abort hook letting embeddings attach recovery logic when a genuine abort occurs. Canonical wiki instance of concepts/abort-recovery. Reinitialising the instance is "an absolute worst-case scenario," but it is the necessary last line of defence to stop a single failure cascading: "the invalid state does not persist, ensuring a single failure does not cascade into multiple failures."

  7. Automatic abort recovery for wasm-bindgen libraries used from JS, not just Rust Workers. Cloudflare added an experimental --reset-state-function flag to wasm-bindgen. It exposes a function through which the Rust application can ask wasm-bindgen to reset its internal Wasm instance to its initial state for the next call. Consumers do not have to re-import or recreate the bindings. Class instances created by the old Wasm instance throw when used, because their handles are orphaned; newly created class instances work. "The JS application using a Wasm library is errored but not bricked." The primary use case lies outside Rust Workers: the feature benefits Cloudflare's JS-based Workers users, and other JS applications, that depend on Rust-backed Wasm libraries.

  8. WebAssembly Exception Handling has two variants; the modern one needed a Node.js backport to ship in the default runtime. Rust's Wasm targets still default to the legacy exception-handling variant. The modern "with exnref" variant is the intended successor, but engine support only arrived between late 2024 and late 2025 (see the versions below). Node.js 24 LTS was the constraint: "the Node.js 24 LTS release schedule would have left the entire ecosystem stuck on legacy WebAssembly Exception Handling until April 2028." Cloudflare backported modern EH to both Node.js 22 and Node.js 24 so that the modern variant can become the default target without waiting for that date. Compatibility disclosed: V8 13.8.1 (Apr 2025), workerd v1.20250620.0 (Jun 2025), Chrome 138 (Jun 2025), Firefox 131 (Oct 2024), Safari 18.4 (Mar 2025), Node.js 25.0.0 (Oct 2025).

  9. Rust Workers 0.8.0 ships the feature as --panic-unwind; it will become the default. Today, adding --panic-unwind to the Rust Workers build command enables full panic recovery, the new abort classification, and the abort-recovery hook. Users remaining on panic=abort still get the earlier custom recovery wrapper (workers-rs 0.6+). Plan: "make panic=unwind the default in a subsequent release."
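The cascade in takeaway 1 can be reproduced with a toy model. The sketch below is hypothetical: Instance, Outcome, and handle are illustrative names, not workers-rs or wasm-bindgen API. It simulates an abort as an early return that skips cleanup, which is roughly what a trap does to in-flight Rust state. The invariant the request was maintaining stays broken, so every later request on the same instance fails.

```rust
/// Result of one simulated request against a long-lived instance.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The request completed; carries the running count of served requests.
    Served(u32),
    /// The request hit an abort (a WebAssembly.RuntimeError in a real Worker).
    Trap,
    /// A later request found state left behind by an earlier trap.
    Poisoned,
}

/// Hypothetical stand-in for a Wasm instance that outlives individual requests.
#[derive(Default)]
pub struct Instance {
    /// Invariant: false between requests. An abort skips the code that restores it.
    in_request: bool,
    served: u32,
}

impl Instance {
    pub fn handle(&mut self, fail: bool) -> Outcome {
        if self.in_request {
            // Leftover state from an aborted request: this call fails too.
            return Outcome::Poisoned;
        }
        self.in_request = true;
        if fail {
            // Simulated abort: control leaves immediately, no cleanup runs,
            // and `in_request` stays true for every subsequent caller.
            return Outcome::Trap;
        }
        self.served += 1;
        self.in_request = false;
        Outcome::Served(self.served)
    }
}
```

One failing request (Trap) turns every later, perfectly valid request into Poisoned until something replaces the instance.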
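The workers-rs 0.6 recipe in takeaway 2 has three parts: intercept every entry point, record failure in a panic handler, and reinitialise before the next request. It can be sketched in native Rust, but only as a loose analogue under stated assumptions. The real interception was a JavaScript Proxy and the real build used panic=abort. Here, std::panic::catch_unwind stands in for the JS side catching the WebAssembly.RuntimeError, and Entrypoints, install_failure_hook, and FAILED are hypothetical names.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical per-Worker failure flag, set by the panic hook below.
static FAILED: AtomicBool = AtomicBool::new(false);

/// Install a panic hook that records the failure (and suppresses the default message).
/// Illustrative only; the post does not show workers-rs's real panic handler.
pub fn install_failure_hook() {
    panic::set_hook(Box::new(|_| {
        FAILED.store(true, Ordering::SeqCst);
    }));
}

/// Every entry point is routed through `call`, the way the JS Proxy
/// intercepted every Rust<->JS call in workers-rs 0.6.
pub struct Entrypoints<T> {
    init: fn() -> T,
    instance: T,
    pub reinits: u32,
}

impl<T> Entrypoints<T> {
    pub fn new(init: fn() -> T) -> Self {
        Entrypoints { init, instance: init(), reinits: 0 }
    }

    /// Run one request against the instance; returns None if it failed.
    pub fn call<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
        // A previous request failed: rebuild from initial state before serving this one.
        if FAILED.swap(false, Ordering::SeqCst) {
            self.instance = (self.init)();
            self.reinits += 1;
        }
        let instance = &mut self.instance;
        // catch_unwind stands in for JS catching the RuntimeError. AssertUnwindSafe is
        // tolerable here because an instance that failed is discarded on the next call.
        panic::catch_unwind(AssertUnwindSafe(|| f(instance))).ok()
    }
}
```

The failure flag is checked at the start of the next call, not at the point of failure. So the failing request itself still errors, and only subsequent requests see a fresh instance.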
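Takeaway 3 turns on one property of unwinding: destructors run on the way out. The sketch below keeps the shape of a request that sets an "in request" flag, but moves the cleanup into a Drop guard. Under panic=unwind (the native default, and what -Cpanic=unwind enables on wasm32-unknown-unknown), the guard restores the invariant even when the body panics, so the instance stays usable. The types are hypothetical, and catch_unwind plays the role of the boundary that wasm-bindgen's glue provides.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Toy instance whose per-request cleanup lives in a destructor.
#[derive(Default)]
pub struct Instance {
    in_request: bool,
    served: u32,
}

/// Restores the "not in a request" invariant when dropped, including
/// when it is dropped during unwinding.
struct RequestGuard<'a> {
    in_request: &'a mut bool,
}

impl Drop for RequestGuard<'_> {
    fn drop(&mut self) {
        *self.in_request = false;
    }
}

impl Instance {
    pub fn handle(&mut self, fail: bool) -> Result<u32, &'static str> {
        if self.in_request {
            return Err("poisoned");
        }
        self.in_request = true;
        // AssertUnwindSafe is needed because the closure holds &mut borrows; the guard
        // is what makes it sound to keep using the instance after a panic.
        panic::catch_unwind(AssertUnwindSafe(|| {
            let _guard = RequestGuard { in_request: &mut self.in_request };
            if fail {
                panic!("request failed");
            }
            self.served += 1;
            self.served
        }))
        .map_err(|_| "panicked")
    }
}
```

Unlike the abort model, a failed request here leaves the instance in a known-good state: the next request succeeds and the served count is intact.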
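The export-side contract in takeaway 4 can be illustrated in plain Rust. Below, an extern "C-unwind" function is allowed to let a panic escape, and a boundary shim converts the caught panic payload into an error value. PanicError here is a hypothetical Rust struct that borrows the name of the JS exception; it is not wasm-bindgen's type. Likewise, call_export is an illustrative stand-in for the generated glue.

```rust
use std::any::Any;
use std::panic;

/// Hypothetical Rust-side mirror of the PanicError that JS callers receive.
#[derive(Debug, PartialEq)]
pub struct PanicError {
    pub message: String,
}

/// An "export" using the C-unwind ABI, so a panic may legally unwind out of it.
/// Declared as plain extern "C", the same panic would abort the process instead.
pub extern "C-unwind" fn parse_port(raw: u32) -> u16 {
    if raw > u32::from(u16::MAX) {
        panic!("port {} out of range", raw);
    }
    raw as u16
}

/// Extract a readable message from a panic payload (&str or String).
fn payload_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        String::from("unknown panic payload")
    }
}

/// Boundary shim: run the export and turn an unwinding panic into an error
/// value instead of letting it escape to the caller.
pub fn call_export(raw: u32) -> Result<u16, PanicError> {
    panic::catch_unwind(|| parse_port(raw)).map_err(|payload| PanicError {
        message: payload_message(payload),
    })
}
```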
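Takeaway 5 describes a bound that checks UnwindSafe only under panic=unwind. One way to express such a conditional bound like MaybeUnwindSafe is with cfg(panic = "unwind"). The definition below is a sketch in the spirit of the post, not wasm-bindgen's source. The helpers invoke and invoke_aborting are hypothetical and show the two policies. With unwinding available, invoke requires UnwindSafe and catches the panic. invoke_aborting skips the bound and aborts on panic, as Closure::new_aborting is described to do.

```rust
use std::panic::{self, AssertUnwindSafe};

// A bound that means "UnwindSafe" only when panics can unwind.
// Illustrative; not wasm-bindgen's actual definition.
#[cfg(panic = "unwind")]
pub trait MaybeUnwindSafe: panic::UnwindSafe {}
#[cfg(panic = "unwind")]
impl<T: panic::UnwindSafe> MaybeUnwindSafe for T {}

#[cfg(not(panic = "unwind"))]
pub trait MaybeUnwindSafe {}
#[cfg(not(panic = "unwind"))]
impl<T> MaybeUnwindSafe for T {}

/// Hypothetical helper: with unwinding, catch the panic and report failure as None.
#[cfg(panic = "unwind")]
pub fn invoke<R>(f: impl FnOnce() -> R + MaybeUnwindSafe) -> Option<R> {
    panic::catch_unwind(f).ok()
}

/// Without unwinding there is nothing to catch; a panic aborts regardless.
#[cfg(not(panic = "unwind"))]
pub fn invoke<R>(f: impl FnOnce() -> R + MaybeUnwindSafe) -> Option<R> {
    Some(f())
}

/// In the spirit of Closure::new_aborting: no unwind-safety bound, but a panic
/// terminates the process instead of unwinding past possibly broken invariants.
pub fn invoke_aborting<R>(f: impl FnOnce() -> R) -> R {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => value,
        Err(_) => std::process::abort(),
    }
}
```

Consider a closure that captures a mutable reference, such as let mut n = 0; invoke(|| n += 1);. It is rejected under panic=unwind because &mut i32 is not UnwindSafe. invoke_aborting accepts the same closure, at the cost of terminating the process if it panics, which avoids pushing users toward a blanket AssertUnwindSafe.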
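The split in takeaway 6 amounts to a classification step at the boundary. The sketch below models it under loudly stated assumptions:
- UnwindTag stands in for the Exception.Tag that marks definite unwinds.
- on_abort is a hypothetical analogue of set_on_abort; the real signature is not given here.
- In this simplified model, every untagged failure is treated as a genuine abort.

```rust
use std::cell::RefCell;

/// Stand-in for the exception tag that marks errors which are definitely unwinds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UnwindTag;

/// A failure as seen at the JS boundary: a message, plus the tag if present.
pub struct Thrown {
    pub tag: Option<UnwindTag>,
    pub message: String,
}

#[derive(Debug, PartialEq)]
pub enum Failure {
    /// Tagged: a panic that unwound cleanly; the instance is still valid.
    Recoverable(String),
    /// Untagged (in this simplified model): a genuine abort; the instance is not trustworthy.
    Abort,
}

thread_local! {
    // The registered abort hook, if any.
    static ON_ABORT: RefCell<Option<Box<dyn Fn()>>> = RefCell::new(None);
}

/// Hypothetical analogue of set_on_abort: register recovery logic for genuine aborts.
pub fn on_abort(hook: impl Fn() + 'static) {
    ON_ABORT.with(|slot| *slot.borrow_mut() = Some(Box::new(hook)));
}

/// Classify a failure; untagged failures run the abort hook as a last line of defence.
pub fn classify(err: Thrown) -> Failure {
    match err.tag {
        Some(UnwindTag) => Failure::Recoverable(err.message),
        None => {
            ON_ABORT.with(|slot| {
                if let Some(hook) = slot.borrow().as_ref() {
                    hook();
                }
            });
            Failure::Abort
        }
    }
}
```

The hook runs only on the abort path, so an ordinary panic never triggers the worst-case reinitialisation.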
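The orphaned-handle behaviour in takeaway 7 follows naturally if each wrapper records which instance generation created it. The model below is hypothetical: Bindings, Handle, and reset_state are illustrative names, and the post does not describe the experimental flag's internals. Resetting bumps the generation and drops all instance state. Old handles then fail with an error, while the bindings object itself stays usable.

```rust
use std::collections::HashMap;

/// Plays the role of a JS class wrapper holding a pointer into the instance.
#[derive(Debug, Clone, Copy)]
pub struct Handle {
    generation: u64,
    id: u32,
}

#[derive(Debug, PartialEq)]
pub enum BindingError {
    /// The handle was created by an instance that has since been reset.
    Orphaned,
}

/// Hypothetical bindings that can reset their underlying instance in place.
#[derive(Default)]
pub struct Bindings {
    generation: u64,
    next_id: u32,
    objects: HashMap<u32, String>,
}

impl Bindings {
    pub fn create(&mut self, value: &str) -> Handle {
        let id = self.next_id;
        self.next_id += 1;
        self.objects.insert(id, value.to_string());
        Handle { generation: self.generation, id }
    }

    pub fn get(&self, h: Handle) -> Result<&str, BindingError> {
        if h.generation != self.generation {
            return Err(BindingError::Orphaned);
        }
        self.objects.get(&h.id).map(String::as_str).ok_or(BindingError::Orphaned)
    }

    /// Analogue of the reset function: discard all instance state and start a new
    /// generation, while callers keep using the same Bindings value.
    pub fn reset_state(&mut self) {
        self.generation += 1;
        self.next_id = 0;
        self.objects.clear();
    }
}
```

The generation check matters. Without it, the old handle's id 0 would silently alias the first object created after the reset instead of failing loudly.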
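The compatibility list in takeaway 8 can be turned into a simple version gate. The table below copies only the minimum versions disclosed in the post. The Node.js 22 and 24 backport versions are not given, so those entries are absent and would have to come from Node.js release notes. The runtime keys and supports_modern_eh are hypothetical. Real deployments would more robustly detect support by probing the engine (for example, validating a module that uses the new instructions) than by comparing version strings.

```rust
/// Minimum versions with modern (exnref) Wasm exception handling, as listed in the post.
/// Node.js 22/24 backport versions are not given there, so they are omitted.
const MODERN_EH_MIN: &[(&str, &[u32])] = &[
    ("v8", &[13, 8, 1]),
    ("workerd", &[1, 20250620, 0]),
    ("chrome", &[138]),
    ("firefox", &[131]),
    ("safari", &[18, 4]),
    ("node", &[25, 0, 0]),
];

/// Parse "18.4" or "v1.20250620.0" into numeric components; None if malformed.
fn parse_version(version: &str) -> Option<Vec<u32>> {
    version
        .trim_start_matches('v')
        .split('.')
        .map(|part| part.parse::<u32>().ok())
        .collect()
}

/// True if `runtime` at `version` meets the listed minimum. Unknown runtimes and
/// unparsable versions return false, i.e. fall back to legacy exception handling.
pub fn supports_modern_eh(runtime: &str, version: &str) -> bool {
    let min: &[u32] = match MODERN_EH_MIN.iter().find(|(name, _)| *name == runtime) {
        Some((_, min)) => min,
        None => return false,
    };
    match parse_version(version) {
        // Slices compare lexicographically, so [18] < [18, 4] <= [18, 4, 0].
        Some(parts) => parts.as_slice() >= min,
        None => false,
    }
}
```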

Systems / concepts / patterns introduced

Patterns:

  • patterns/proxy-based-entrypoint-encapsulation — wrapping an FFI boundary in a JS Proxy so every entry point is consistently intercepted for error handling / reinit (workers-rs 0.6 pre-upstream recipe).
  • patterns/reinitialize-on-unrecoverable-error — on detecting an abort, drop the invalidated sandbox state and re-create it from initial state so future operations proceed; canonical Wasm-instance-tier instance.
  • patterns/upstream-the-fix — seventh wiki instance (after V8/Node.js/OpenNext, Web Streams, Go arm64 codegen, Datadog Go binary size, fly.io parking_lot): Cloudflare moving the fix from workers-rs 0.6's custom wrapper into wasm-bindgen proper, plus the Node.js 22/24 EH backport.

Extends

  • systems/cloudflare-workers — Rust Workers are a first-class Workers sub-tier; panic-recovery semantics now come built-in.
  • systems/workerd — the open-source runtime whose v1.20250620.0 (Jun 2025) shipped modern Wasm EH.
  • systems/v8-javascript-engine — V8 13.8.1 shipped modern Wasm EH; workerd + Node + Chrome + Deno all benefit.
  • systems/nodejs — Cloudflare's modern-EH backport to Node 22 and Node 24 is the enabling step for the ecosystem's default to move off legacy EH.
  • concepts/stack-unwinding — previously a Go-runtime-on-arm64 wiki instance; now extended with a Rust-on-Wasm instance that leverages the same unwinder discipline but on a different substrate.
  • patterns/upstream-the-fix — new seen-in.
  • companies/cloudflare — new 2026-04-22 axis + recent article.

Caveats

  • Still experimental nightly Rust. Building for Wasm with panic=unwind requires a nightly toolchain plus a rebuilt standard library (-Zbuild-std). Stable Rust hasn't adopted it; Cloudflare is "working to advance Rust's Wasm support for WebAssembly Exception Handling to help bring this to stable Rust", but has not committed to a timeline.
  • Legacy vs modern EH transition still unfinished. Rust targets default to legacy EH. Modern EH is now supported by every engine and runtime in the compatibility list above, but it won't be the default until "next year" per the post. Interim adopters need to track both variants.
  • Abort still reinitialises the Worker. The post is careful: panic=unwind recovers without losing state; abort recovery reinitialises the entire application. Stateful workloads (Durable Objects is named specifically) keep their state across panics but lose it on aborts. Out-of-memory aborts remain a real risk class.
  • Closure::new_aborting is a discipline, not a guarantee: it terminates rather than unwinds when the captured state can't be proven unwind-safe. This preserves correctness but means some failure modes end in instance reinit rather than panic recovery.
  • No production incidence numbers disclosed. Post describes the failure class qualitatively ("a small chance that a Rust Worker would unexpectedly fail and cause other requests to fail along with it") but publishes no fleet-wide rate, no before/after numbers, no success rate of the new recovery path.
  • --reset-state-function is marked experimental. The wasm-bindgen library abort-recovery feature is available but not the default; its exact semantics (what happens to orphaned class instances, GC interaction, memory reclaim timing) are not covered in detail.
