PATTERN Cited by 1 source

Device blocklist from telemetry

A device blocklist from telemetry rolls out a new backend, feature, or API to client devices by gathering compatibility signals from live sessions, aggregating them per device class, and keeping classes with high failure rates off the new path in future sessions. The blocklist, rather than a binary "ship vs. don't ship" decision, becomes the gating mechanism for continued rollout.

When to reach for it

  • Rolling out a new client API or backend that has known but hard-to-enumerate compatibility issues (e.g. GPU driver bugs for a new graphics API, browser-engine quirks for a new JS feature, OS-specific behaviors for a new system call).
  • The device space is too large to pre-test exhaustively: GPUs × OS versions × driver versions × browser versions multiply combinatorially.
  • You have dynamic fallback or another recovery mechanism so failures don't crash the session — they become data.

Shape

session starts on new backend
    run compatibility probe (or just use normally)
         ├── success → continue
         └── failure / fallback triggered → emit telemetry event
aggregator: per-device-class failure rates
    threshold breached → add class to blocklist
next session on that device-class → start on old backend directly
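
The aggregation step in the flow above can be sketched as follows. This is a minimal illustration, not the implementation from the source: the function name `build_blocklist`, the 5% `threshold`, the `min_sessions` floor, and the `(device_class, fell_back)` event shape are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassStats:
    sessions: int = 0
    fallbacks: int = 0


def build_blocklist(events, threshold=0.05, min_sessions=50):
    """Return the set of device classes whose fallback rate exceeds `threshold`.

    `events` is an iterable of (device_class, fell_back) pairs, one per session.
    Classes with fewer than `min_sessions` sessions are left alone, because a
    rate computed from a handful of sessions is too noisy to act on.
    Both default values are hypothetical.
    """
    stats = defaultdict(ClassStats)
    for device_class, fell_back in events:
        entry = stats[device_class]
        entry.sessions += 1
        if fell_back:
            entry.fallbacks += 1

    return {
        device_class
        for device_class, entry in stats.items()
        if entry.sessions >= min_sessions
        and entry.fallbacks / entry.sessions > threshold
    }


# Hypothetical telemetry: one class at 10% fallback, one at 1%, one with too little data.
events = (
    [("gpu-a|drv-1", True)] * 10 + [("gpu-a|drv-1", False)] * 90
    + [("gpu-b|drv-7", True)] * 1 + [("gpu-b|drv-7", False)] * 99
    + [("gpu-c|drv-2", True)] * 3
)
print(build_blocklist(events))  # {'gpu-a|drv-1'}
```

The `min_sessions` floor is one way to handle the "too fine → not enough data" side of the granularity trade-off: a class with three sessions is not blocklisted even if all three fell back.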

Key design choices

  1. Device-class granularity: aggregate by GPU model × driver version × OS × browser. Too coarse → over-blocking; too fine → not enough data per class to trigger.
  2. Threshold calibration: the fallback-rate threshold above which a class is blocklisted. It depends on user tolerance for the mid-session hitch. Figma's case: "falling back from WebGPU to WebGL can cause a hitch, which we'd like to avoid."
  3. Non-load-blocking probes: if the compatibility check itself is expensive (like async-readback on WebGPU), run it post-session-start so it doesn't regress startup latency. See concepts/synchronous-vs-asynchronous-readback.
  4. Blocklist distribution: push the blocklist to clients on startup (small config); clients decide locally whether to attempt the new backend.
  5. Rollback path: if driver / browser / OS updates fix a class's issues, un-blocklist it. Requires ongoing monitoring and a mechanism to revisit the blocklist.
  6. Initial blocklist: pre-populate with classes known to fail from pre-production testing; the telemetry loop then expands the list as rollout scales.
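
Choices 1 and 4 can be illustrated with a small sketch of a device-class key and the local decision a client makes against a pushed blocklist. The `DeviceClass` type, its `key()` format, and `choose_backend` are hypothetical names invented here, and the device strings are placeholders; the source does not describe how Figma encodes device classes.

```python
from typing import NamedTuple


class DeviceClass(NamedTuple):
    gpu_model: str
    driver_version: str
    os: str
    browser: str

    def key(self) -> str:
        # Normalize case and whitespace so equivalent reports aggregate together.
        return "|".join(part.strip().lower() for part in self)


def choose_backend(device: DeviceClass, blocklist: frozenset[str]) -> str:
    """Decide locally which backend to start on, given the pushed blocklist."""
    return "old" if device.key() in blocklist else "new"


# Hypothetical blocklist as it might arrive in a small startup config.
blocklist = frozenset({"examplegpu 9000|1.2.3|exampleos 14|examplebrowser 120"})

blocked = DeviceClass("ExampleGPU 9000", "1.2.3", "ExampleOS 14", "ExampleBrowser 120")
updated = DeviceClass("ExampleGPU 9000", "1.2.4", "ExampleOS 14", "ExampleBrowser 120")

print(choose_backend(blocked, blocklist))  # old
print(choose_backend(updated, blocklist))  # new
```

Because the driver version is part of the key in this sketch, a device that picks up a newer driver falls into a different class and is no longer matched by the old entry. A coarser key (for example, dropping the driver version) would keep blocking it.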

Why it beats "feature flag everything"

A global feature flag gives one boolean per user — on or off. It can't express "WebGPU works for this user except on their external display with a different GPU" or "works until the user switches Wi-Fi and the driver gets reinitialized."

The per-device-class blocklist expresses the compatibility matrix more faithfully. Combined with patterns/runtime-backend-swap-on-failure, even blocklist misses become data rather than crashes.
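
A rough sketch of how the two pieces might combine: the blocklist decides where a session starts, and a failure on a class the blocklist missed is recorded and recovered from instead of crashing. The function `run_with_fallback`, its callback parameters, and the event fields are assumptions for illustration. It also simplifies to a failure at session start, whereas the fallback described in the source can happen mid-session.

```python
from typing import Callable


def run_with_fallback(
    device_class: str,
    blocklist: set[str],
    start_new: Callable[[], str],
    start_old: Callable[[], str],
    emit: Callable[[dict], None],
) -> str:
    """Start a session, preferring the new backend unless the class is blocklisted."""
    if device_class in blocklist:
        # Known-bad class: skip the new backend entirely.
        return start_old()
    try:
        result = start_new()
        emit({"device_class": device_class, "fell_back": False})
        return result
    except Exception as exc:
        # A blocklist miss: record it for the aggregator, then recover.
        emit({"device_class": device_class, "fell_back": True, "error": repr(exc)})
        return start_old()
```

Successful sessions emit an event too, since the aggregator needs the denominator to compute a per-class rate.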

Canonical instance

Figma's WebGPU rollout. Figma uses two overlapping mechanisms:

  • Initial blocklist from pre-production compatibility tests (pixel-readback probes against known-good outputs).
  • Ongoing blocklist expansion from fallback-rate telemetry: once the dynamic-fallback system was in place, rollout resumed, gated on the average fallback rate per device class.
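
The source says only that the pre-production probes compare pixel readbacks against known-good outputs. One plausible shape for that comparison is sketched below. The function name `probe_passes`, the per-channel `tolerance`, and the `max_bad_fraction` parameter are assumptions, not details from Figma's write-up.

```python
def probe_passes(
    rendered: bytes,
    expected: bytes,
    tolerance: int = 2,
    max_bad_fraction: float = 0.0,
) -> bool:
    """Compare read-back pixel bytes against a known-good reference.

    A channel value counts as bad when it differs from the reference by more
    than `tolerance`; the probe passes if the share of bad values does not
    exceed `max_bad_fraction`. Both defaults are hypothetical.
    """
    if len(rendered) != len(expected) or not expected:
        return False
    bad = sum(1 for got, want in zip(rendered, expected) if abs(got - want) > tolerance)
    return bad / len(expected) <= max_bad_fraction


# Four red RGBA pixels as the reference output.
reference = bytes([255, 0, 0, 255] * 4)
print(probe_passes(bytes([254, 1, 0, 255] * 4), reference))  # True: within tolerance
print(probe_passes(bytes([0, 0, 0, 255] * 4), reference))    # False: red channel missing
```

A small tolerance allows for rounding differences between drivers while still catching output that is clearly wrong.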

"Using this approach, we were finally able to complete the rollout."

(Source: sources/2026-04-21-figma-rendering-powered-by-webgpu)
