PATTERN
Runtime backend swap on failure¶
A runtime backend swap on failure pattern treats mid-session backend failure as a first-class recovery event: when the current backend fails fatally, swap to a peer backend and continue the session, rather than tearing it down.
Distinct from process-level failover (which restarts the whole process on a new instance) and from compile-time backend selection (which picks one backend and commits).
When to reach for it¶
- You have two (or more) interchangeable backends behind a shared abstraction layer — e.g. WebGPU + WebGL, Metal + OpenGL, one LLM provider + a peer, one storage tier + a peer.
- The preferred backend has known failure modes that survive initial compatibility checks — e.g. GPU device-lost, provider 503, driver-reset events that invalidate your live resources.
- Session teardown is expensive or disruptive — the user has unsaved state or a long-running task is in flight, so a user-facing hitch is preferable to a crash.
- You have fallback-rate telemetry so the swap-rate itself feeds into rollout gating.
The mechanism¶
session running on preferred backend
│
▼
fatal error from preferred
(device-lost, provider 503, driver-reset, ...)
│
▼
error classifier: retryable or fatal?
│
├── retryable → retry on preferred
│
└── fatal → swap to fallback backend
│
▼
re-hydrate session state on fallback
│
▼
continue session
│
▼
emit telemetry event so fallback-rate feeds
the blocklist for future sessions
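The flow above can be sketched as a small TypeScript loop. This is a hedged illustration, not any particular codebase's implementation: the `Backend` interface, `classify`, `rehydrate`, and the error-message matching are all hypothetical, and a real system would hook platform loss events (device-lost callbacks, connection-reset handlers) rather than catching synchronous exceptions.

```typescript
type Classification = "retryable" | "fatal";

interface SessionState { assets: string[] }

// Hypothetical abstraction layer that both backends implement.
interface Backend {
  name: string;
  render(frame: number): void;          // throws on backend failure
  rehydrate(state: SessionState): void; // re-upload session resources
}

function classify(err: Error): Classification {
  // Fatal: errors that invalidate live resources. Illustrative matching only.
  return /device-lost|503|driver-reset/.test(err.message) ? "fatal" : "retryable";
}

function renderFrame(
  current: Backend,
  fallback: Backend | null,
  state: SessionState,
  frame: number,
  onSwap: (from: string, to: string) => void, // telemetry hook
): Backend {
  try {
    current.render(frame);
    return current;
  } catch (err) {
    if (classify(err as Error) === "retryable") {
      current.render(frame); // retry on the same backend
      return current;
    }
    // Fatal: swap. If the fallback is also gone, give up (session teardown).
    if (!fallback) throw err;
    fallback.rehydrate(state);           // same path as initial session start
    onSwap(current.name, fallback.name); // feeds the fallback-rate blocklist
    fallback.render(frame);
    return fallback;
  }
}
```

Note that the caller keeps rendering against whatever `renderFrame` returns, so after a fatal error the session simply continues on the fallback backend.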
Key implementation choices¶
- Reuse existing loss / reset handlers. Most platforms already have context-loss / device-lost / connection-reset handlers that re-create the same backend. Extend these to re-create a different backend — the plumbing for state rehydration already exists.
- State inventory. Whatever the session uploaded to the preferred backend (textures, buffers, sessions, caches) must be re-established on the fallback. Usually the same path as initial session-start.
- Error classification. Only fatal errors trigger swap. Retryable errors (network blips, transient GPU busy) should retry on the same backend.
- Idempotent swap. Rare but possible: the fallback can also fail. Decide in advance how your swap logic handles this — a further fallback, or session teardown.
- One-directional. Swapping is usually one-way. Upgrading back to the preferred backend mid-session is rarely worth the extra hitch — the next session starts fresh with a new preferred-attempt.
Difference from compile-time backend selection¶
Compile-time selection: pick one backend, branch deeply across the codebase. Can't switch mid-session. Can't handle backends that work initially but fail later.
Runtime swap: both backends are live in the same binary, both implement the abstraction layer, selector is a runtime variable that can change during a session.
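The difference is easy to see in code. A minimal sketch, assuming a hypothetical `GraphicsApi` interface layer: with compile-time selection the backend choice would be a build-time branch, while here both backends are linked into the same binary and the selector is ordinary mutable state that can change mid-session.

```typescript
// Shared abstraction layer; both backends implement it.
interface GraphicsApi {
  drawTriangles(count: number): string;
}

class WebGpuApi implements GraphicsApi {
  drawTriangles(count: number) { return `webgpu:${count}`; }
}

class WebGlApi implements GraphicsApi {
  drawTriangles(count: number) { return `webgl:${count}`; }
}

class Renderer {
  // The selector is a runtime variable, not a build-time branch,
  // so the backend can be replaced while the session is live.
  constructor(private api: GraphicsApi) {}
  swapTo(api: GraphicsApi) { this.api = api; }
  draw(count: number) { return this.api.drawTriangles(count); }
}
```

Because all rendering code goes through `GraphicsApi`, nothing above the `Renderer` changes when the swap happens.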
Difference from process-level failover¶
Process failover: the whole process dies, a new process starts on a peer backend. Loses in-memory session state; user sees a full restart.
Runtime swap: same process, same session state, just a different backend underneath. User sees a hitch but the session continues.
Canonical instance¶
Figma's WebGPU → WebGL mid-session fallback.
Figma initially shipped static WebGPU compatibility tests at load time. After rollout started, mid-session WebGPU failures were observed on Windows — requestDevice / requestAdapter throwing after device-lost events.
The dynamic-fallback system extends Figma's existing WebGL context-loss handler and WebGPU device-loss handler: "instead of re-creating the context/device using the same backend, we swap backends." Both backends implement Figma's graphics API interface layer, so the application's rendering code is unchanged by the swap.
Once dynamic fallback was in place, the rollout resumed — gated this time by a per-device fallback-rate blocklist (patterns/device-blocklist-from-telemetry) so devices with high swap rates are pre-empted from even attempting WebGPU on the next session.
(Source: sources/2026-04-21-figma-rendering-powered-by-webgpu)
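The telemetry-to-blocklist loop in the last step can be sketched like this. All names and thresholds here are assumptions for illustration — not Figma's actual gating logic: devices whose observed fallback rate crosses a threshold stop attempting the preferred backend on subsequent sessions.

```typescript
interface DeviceStats { attempts: number; fallbacks: number }

const stats = new Map<string, DeviceStats>();

// Called once per session with whether the session fell back mid-flight.
function recordSession(deviceId: string, fellBack: boolean): void {
  const s = stats.get(deviceId) ?? { attempts: 0, fallbacks: 0 };
  s.attempts += 1;
  if (fellBack) s.fallbacks += 1;
  stats.set(deviceId, s);
}

// Gate the next session: skip the preferred backend entirely for devices
// with a high fallback rate. Threshold and minimum sample size are made up.
function shouldAttemptPreferred(deviceId: string, threshold = 0.5, minSamples = 10): boolean {
  const s = stats.get(deviceId);
  if (!s || s.attempts < minSamples) return true; // not enough signal yet
  return s.fallbacks / s.attempts < threshold;
}
```

The minimum-sample guard matters: a single unlucky session should not permanently demote a device, so the blocklist only acts once the swap rate is statistically meaningful.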
Adjacent instances¶
- patterns/automatic-provider-failover — LLM provider substrate, opus-4-7 → opus-4-6-style failback chains. Same shape, different layer.
Seen in¶
- sources/2026-04-21-figma-rendering-powered-by-webgpu — Figma WebGPU → WebGL mid-session fallback using the context-loss handler shape.
Related¶
- concepts/dynamic-backend-fallback — the underlying concept.
- concepts/graphics-api-abstraction-layer — the substrate abstraction that makes swap tractable.
- patterns/automatic-provider-failover — sibling pattern on LLM substrate.
- patterns/device-blocklist-from-telemetry — telemetry loop that feeds back into rollout gating.
- patterns/graphics-api-interface-layer