CONCEPT Cited by 2 sources
Synchronous vs asynchronous GPU readback¶
Readback is copying pixel or buffer data back from the GPU to CPU-accessible memory. Whether readback is synchronous or asynchronous is a fundamental API-design choice that ripples through application architecture.
The two models¶
Synchronous readback — WebGL¶
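const pixels = new Uint8Array(w * h * 4); // RGBA, one byte per channel
gl.readPixels(0, 0, w, h, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
// Blocks until the GPU has finished pending work and the bytes are in `pixels`.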
- Blocks the calling thread until the GPU has produced and the CPU has received the bytes.
- Convenient for startup probes: render a tiny known scene, read the pixels back, and verify that the driver produced correct results. The whole probe takes tens of milliseconds and runs while the app is already blocked on startup.
- Forces a CPU ↔ GPU synchronization point: the CPU stalls until the GPU has drained its queued work, which is why issuing many sync readbacks per frame tanks performance.
Asynchronous readback — WebGPU¶
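- Readback goes through a mappable buffer: mapAsync returns a Promise and yields control to the event loop; the Promise resolves once the GPU-side copy has completed and the buffer is mapped for CPU reads.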
- Doesn't block any other work; matches modern JS event-loop patterns.
- But latency until data is available can be hundreds of milliseconds in the worst case (GPU flushes, buffer mapping, event-loop scheduling) — a non-starter for startup-critical probes.
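A minimal TypeScript sketch of the async path, assuming the data to read already lives in a GPU buffer created with COPY_SRC usage. The interfaces are a hand-written, narrow subset of the WebGPU types so the sketch type-checks without extra typings, and the function name `readbackBuffer` is illustrative. The numeric constants mirror the WebGPU values for GPUBufferUsage.MAP_READ, GPUBufferUsage.COPY_DST, and GPUMapMode.READ.

```typescript
// Narrow, hand-written subset of the WebGPU interfaces used below.
// Assumption: real code would use the browser's own types or @webgpu/types.
interface ReadbackBuffer {
  mapAsync(mode: number): Promise<void>;
  getMappedRange(): ArrayBuffer;
  unmap(): void;
  destroy(): void;
}
interface CommandEncoderLike {
  copyBufferToBuffer(src: unknown, srcOffset: number,
                     dst: unknown, dstOffset: number, size: number): void;
  finish(): unknown;
}
interface DeviceLike {
  createBuffer(desc: { size: number; usage: number }): ReadbackBuffer;
  createCommandEncoder(): CommandEncoderLike;
  queue: { submit(commandBuffers: unknown[]): void };
}

const MAP_READ = 0x0001;      // GPUBufferUsage.MAP_READ
const COPY_DST = 0x0008;      // GPUBufferUsage.COPY_DST
const MAP_MODE_READ = 0x0001; // GPUMapMode.READ

// Copies `size` bytes from a GPU buffer into CPU memory without blocking
// the calling thread; the caller awaits the returned Promise.
async function readbackBuffer(
  device: DeviceLike,
  src: unknown,
  size: number,
): Promise<Uint8Array> {
  // Staging buffer the CPU is allowed to map for reading.
  const staging = device.createBuffer({ size, usage: MAP_READ | COPY_DST });

  // Queue the GPU-side copy; submit() returns immediately.
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(src, 0, staging, 0, size);
  device.queue.submit([encoder.finish()]);

  // Control returns to the event loop here until the copy is done
  // and the buffer is mapped.
  await staging.mapAsync(MAP_MODE_READ);

  // Copy out of the mapped range: it is detached once unmap() runs.
  const bytes = new Uint8Array(staging.getMappedRange().slice(0));
  staging.unmap();
  staging.destroy();
  return bytes;
}
```

Everything after the `await` runs in a later event-loop turn, which is where the extra latency described above comes from. Reading pixels from a texture follows the same shape, with a texture-to-buffer copy in place of the buffer-to-buffer copy.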
Architectural consequences¶
The sync→async shift breaks a specific workflow: load-time compatibility probes. A common pattern in WebGL applications was:
- On startup, render a small known scene.
- Read back pixels synchronously.
- Compare to expected output; if wrong, flag the device/driver.
- Apply WebGL workarounds before the user sees the app.
All of that happens before the session starts, gating on driver sanity. Rewriting it for async readback can make startup slower by hundreds of milliseconds, which is usually unacceptable.
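The comparison step of such a probe can be sketched as a pure function. This is an illustration only: the helper name `probeMatches`, the per-channel tolerance, and the sample pixel values are assumptions, and the source does not say how the comparison is actually performed.

```typescript
// Hypothetical helper: returns true when every channel of the read-back
// pixels is within `tolerance` of the expected value. The tolerance is an
// assumption; an exact comparison (tolerance = 0) may suit some probes.
function probeMatches(
  actual: Uint8Array,
  expected: Uint8Array,
  tolerance = 2,
): boolean {
  if (actual.length !== expected.length) return false;
  for (let i = 0; i < actual.length; i++) {
    if (Math.abs(actual[i] - expected[i]) > tolerance) return false;
  }
  return true;
}

// Illustrative 2x1 RGBA probe: one red pixel followed by one blue pixel.
const expectedPixels = new Uint8Array([255, 0, 0, 255, 0, 0, 255, 255]);
// Off by one in two channels: within tolerance.
const plausibleDriver = new Uint8Array([254, 1, 0, 255, 0, 0, 255, 255]);
// Second pixel came back black instead of blue: flag this device/driver.
const brokenDriver = new Uint8Array([255, 0, 0, 255, 0, 0, 0, 255]);

const plausibleOk = probeMatches(plausibleDriver, expectedPixels);
const brokenOk = probeMatches(brokenDriver, expectedPixels);
```

The same function works unchanged whether the bytes came from a synchronous or an asynchronous readback; only the point in time at which they arrive differs.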
The workaround: move the probe out of the critical path. Run it after the session has started, in a non-blocking task; when the probe completes, feed the result into a telemetry-driven device blocklist (patterns/device-blocklist-from-telemetry) that is consulted at the next startup to decide whether to attempt the new API.
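A rough sketch of that split, under stated assumptions: `chooseBackend`, `runDeferredProbe`, `DeviceKey`, and the `report` callback are hypothetical names, and the in-memory Set in the usage note stands in for a blocklist that would really be derived from aggregated telemetry. Treating a probe that throws as a failure is also an assumption.

```typescript
type Backend = "webgpu" | "webgl";

// Hypothetical identifier for a device/driver combination.
type DeviceKey = string;

interface Blocklist {
  has(key: DeviceKey): boolean;
}

// Startup path: a synchronous lookup, with no GPU readback involved.
function chooseBackend(key: DeviceKey, blocklist: Blocklist): Backend {
  return blocklist.has(key) ? "webgl" : "webgpu";
}

// Post-startup path: run the async probe off the critical path and
// report the outcome. Nothing on the startup path awaits this.
async function runDeferredProbe(
  key: DeviceKey,
  probe: () => Promise<boolean>,
  report: (key: DeviceKey, ok: boolean) => void,
): Promise<void> {
  let ok = false;
  try {
    ok = await probe();
  } catch {
    ok = false; // Assumption: a probe that throws counts as a failure.
  }
  report(key, ok);
}
```

In use, startup calls `chooseBackend(key, blocklist)` straight away, and the session later calls `runDeferredProbe(key, probe, report)` without awaiting it. A failing report adds the key to the blocklist (for example `blocked.add(key)` on a `Set<DeviceKey>`), so the next startup falls back to WebGL for that device.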
Why WebGPU went async¶
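- Native graphics APIs (Vulkan, Metal, D3D12) already model readback as an explicit copy whose completion is signalled separately (a fence or completion handler), with no implicit blocking read call.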
- Modern JS runtimes are async-first; sync blocking is an anti-pattern.
- Driver-level sync readback hides substantial cost behind a blocking call — making it async makes the cost visible to the developer.
Seen in¶
- sources/2026-04-21-figma-rendering-powered-by-webgpu — the canonical wiki instance. Figma's startup-time compatibility probe was WebGL-synchronous; the WebGPU equivalent "could increase load times by hundreds of milliseconds, which wasn't acceptable". Figma moved compatibility probing to a non-load-blocking post-session task and fed the results into a device blocklist.
- sources/2026-05-19-aws-how-synthesia-optimizes-generative-ai-video-inference-on-amazon-ec2-g7e-instances: same sync/async axis at GPU-compute-pipeline altitude rather than graphics-API altitude. The CUDA equivalent of WebGPU's mapAsync is the combination of a dedicated Copy Stream + pinned host buffers, both required for fully-async D2H. Synchronous CUDA D2H on the default stream stalls compute the same way synchronous gl.readPixels blocks the JS event loop. The Synthesia / Wan 2.2 14B VAE-decoder benchmark on g7e.2xlarge shows the baseline cost: 18% of GPU wall-clock idle in the synchronous-D2H configuration, recovered to <0.1% by the patterns/asynchronous-frame-generation-pipeline. The graphics-API and CUDA-compute altitudes share both the problem (sync readback serialises work that could run in parallel) and the basic fix (split into separate async channels with explicit handoff barriers); different APIs, same underlying hardware property.
Related¶
- systems/webgpu
- systems/webgl
- patterns/device-blocklist-from-telemetry
- concepts/device-to-host-transfer — same operation at CUDA-compute altitude.
- concepts/cuda-stream — async-D2H primitive on CUDA.
- concepts/pinned-memory — required co-primitive for fully-async CUDA D2H.
- patterns/asynchronous-frame-generation-pipeline — async readback at chunked-video-inference altitude.