
CLOUDFLARE 2026-05-13 Tier 1


Cloudflare — Browser Run: now running on Cloudflare Containers, it's faster and more scalable

One-paragraph summary

Cloudflare engineering post (2026-05-13) on the migration of Browser Run (rebranded from Browser Rendering on 2026-04-16) off shared Browser Isolation (BISO) infrastructure onto Cloudflare Containers. Three architectural layers disclosed: (1) Regional pre-warmed pools of DO+Container pairs to constrain the user→DO and DO→Container RTT. DO-enabled Containers create a Durable Object near the user, but the connected Container "may spin up on the other side of the world", which is fine for one-shot RPCs but adds cross-globe latency to every message when a screenshot needs "establishing a WebSocket between them and exchanging dozens of messages". (2) Migration of container-state tracking from Workers KV to D1 + Queues. KV's eventual consistency (minimum cache TTL recently reduced from 60s to 30s) caused race conditions and overallocation: "You might check KV, see a container as 'available,' but by the time you route to it (30 seconds later), it's already claimed." D1's transactional SQLite supports atomic browser claim via UPDATE ... WHERE sessionId IN (SELECT ... FROM candidate_pool ORDER BY RANDOM() LIMIT ?5) RETURNING data. To avoid the D1 per-row write throughput ceiling (at ~1 ms per write the bound is 1,000 writes/sec, which with one row per write and a 5-second update interval "would mean that we could only have 5,000 containers before overloading the database"), each container's per-5s state update is enqueued to a Queues consumer with max_batch_size: 100, max_batch_timeout: 1. Writing 100 rows per batch lifts the per-location ceiling to 500,000 containers, with a reported batch-write P95 of 0.1 ms and steady-state lag "well below 2 seconds". (3) Quick-action flow coalesced from a multi-message WebSocket protocol into a single HTTP request sent directly to the Container, executing the page-open / navigate / wait-for-load / screenshot flow internally without per-step worker↔browser round-trips.
Headline operational outcomes: 60 browsers/min via the Workers binding, 120 concurrent browsers (4× the previous limit), >50% reduction in Quick Action response times, and WebGL and WebMCP unblocked because Browser Run now controls its own image and Chromium-version cadence independent of BISO. Migration ran as a gradual ramp via a Worker in the request path (Quick Actions → Workers binding free → PAYG → all contract customers); the transition required no customer code change or Worker redeploy. Framed as a Customer Zero deployment of DO-enabled Containers: Browser Run's friction with the "novel, unstable early-stage Containers platform interface that was light on documentation, light on observability, and light on colleagues in an overlapping timezone" drove substantial upgrades to the Containers platform itself, which the post says benefit external customers too.
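The batching layer in (2) can be sketched in a few lines. This is a minimal illustration, not Cloudflare's implementation: it uses Python's built-in sqlite3 as a stand-in for a D1 shard, and the container_state table, the function names, and the plain list standing in for a queue are all hypothetical. Only the batch size of 100 comes from the post.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

MAX_BATCH_SIZE = 100  # from the post's consumer config (max_batch_size)

StateUpdate = Tuple[str, str]  # (sessionId, status); hypothetical row shape


def open_shard() -> sqlite3.Connection:
    """In-memory stand-in for one per-location shard (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE container_state ("
        " sessionId TEXT PRIMARY KEY,"
        " status TEXT NOT NULL)"
    )
    return conn


def batches(updates: Iterable[StateUpdate],
            size: int = MAX_BATCH_SIZE) -> Iterator[List[StateUpdate]]:
    """Group pending updates into chunks of at most `size` rows."""
    it = iter(updates)
    while chunk := list(islice(it, size)):
        yield chunk


def write_batch(conn: sqlite3.Connection, batch: List[StateUpdate]) -> None:
    """Write a whole batch inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO container_state (sessionId, status)"
            " VALUES (?, ?)",
            batch,
        )


def consume(conn: sqlite3.Connection, updates: Iterable[StateUpdate]) -> int:
    """Drain all updates; return how many database writes were issued."""
    writes = 0
    for batch in batches(updates):
        write_batch(conn, batch)
        writes += 1
    return writes
```

With 250 pending updates, consume issues three batched writes instead of 250 single-row writes, which is the amortisation the post relies on.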

Key takeaways

  1. The shared-tier-with-BISO arrangement compromised three different properties simultaneously. Verbatim from the post: "BISO's larger container images slowed startup and development. Crucially, BISO browsers lacked optimal global distribution, compromising resiliency and latency. Additionally, typical BISO users' long, steady sessions clashed with Browser Run's short, spiky usage, creating scaling bottlenecks and availability delays." Three failure axes: image-size mismatch (slowed cold start + dev iteration), distribution-mismatch (BISO's POP footprint ≠ Browser Run's optimal latency footprint), and workload-shape mismatch (long-steady BISO sessions ≠ short-spiky Browser Run sessions, with the resulting capacity planning being optimised for the wrong shape). Canonical wiki instance of the "shared infra is cheaper until your workload shape diverges from the shared shape" failure mode.
  2. DO-enabled Containers create the DO near the request, but the Container can spin up anywhere in the world. Verbatim: "DO-enabled Containers create a Durable Object as close to the incoming request as possible, but the connected Container may spin up on the other side of the world. This works fine for one-shot messages like 'start my app,' but when you're establishing a WebSocket between them and exchanging dozens of messages for a screenshot request, those extra milliseconds crossing the globe start adding up." Canonical wiki instance of concepts/do-to-container-cross-region-rtt — DO placement and Container placement are two independent decisions, and chatty workloads pay the worse of the two.
  3. Solution: regional pools of pre-warmed DO+Container pairs. Verbatim: "Create regional pools of pre-warmed DO-backed browser containers to constrain the max distance (and hence max latency) between DOs and containers. When a request comes in, we pick a DO-container pair closest to the user within that region. This keeps latency low on both hops: user to DO, and DO to container. It adds a few more moving parts to our overall architecture, but we figured that was worthwhile so long as we had observability into the global state of each browser so that we could allocate and re-allocate capacity according to changing demand." Canonical wiki instance of patterns/regional-pre-warmed-do-container-pair-pool — the pre-warmed-pool pattern at DO+Container-pair granularity, with region as the bounding scope.
  4. KV's eventual consistency was the bottleneck on the allocation hot path. Verbatim: "AI agent builders discovered Browser Run and quickly brought request volumes outpacing our existing capacity. We quickly hit the limits of how quickly we could adjust our pool capacity to serve this new demand with a scalable approach. KV's eventual consistency of around 30 seconds was becoming a bottleneck on our critical request path. You might check KV, see a container as 'available,' but by the time you route to it (30 seconds later), it's already claimed. That lag creates race conditions and overallocation of browsers, severely limiting how fast we could scale to meet demand spikes." The post links the recent KV minimum-cache-TTL reduction from 60s to 30s and explicitly notes "that value is still too high" — i.e. the recent improvement does not close the gap for this workload. Canonical wiki instance of concepts/eventual-consistency-too-slow-for-allocation.
  5. Migration to D1: SQLite transactions enable atomic browser claim. Verbatim: "D1's transactional nature is a good fit here. Once we assign a browser to a user, it's exclusively theirs. Browsers are not shared resources. SQLite transactions ensure atomic assignment and prevent race conditions where two requests might claim the same browser simultaneously." The simplified browser-acquisition query reproduced from the post:
    WITH candidate_pool AS (
        -- candidate pool logic to pick based on latency and other rules
    )
    UPDATE containers
    SET status = 'picked'
    WHERE sessionId IN (
        SELECT sessionId
        FROM candidate_pool
        ORDER BY RANDOM()
        LIMIT ?5
    )
    RETURNING data
    
    Three load-bearing properties: UPDATE...RETURNING is the atomic check-and-claim primitive (no SELECT-then-UPDATE race window); ORDER BY RANDOM() LIMIT ?5 load-balances claim targets across the candidate pool; status = 'picked' marks the browser as claimed in the same statement that returns it. Canonical wiki instance of concepts/sqlite-transaction-for-atomic-resource-claim.
  6. D1 shards are per-location, and the per-row-write ceiling forced a batching layer. Verbatim: "We keep D1 shards per location and given that we may have several thousand containers running, and that each container needs to update its state every 5 seconds, we kept running into a problem: we would overload the database. For instance, if each write takes 1ms we can only write at most 1,000 times, which at one row per write would mean that we could only have 5,000 containers before overloading the database. However, if we batch those writes, we can get much higher values, because batch writes are not significantly longer than individual ones, so we can increase the throughput in orders of magnitude. In our case, we use 100 row batches, which means we can now update a maximum of 500,000 containers per location." The arithmetic: 1,000 writes/sec × a 5-second update interval = 5,000 single-row writes per cycle, so 5,000 containers; at 100 rows per write the same write rate covers 500,000. Headroom multiplier: 100×. Canonical wiki instance of patterns/queue-batching-amortizes-db-write-throughput.
  7. Implementation: Cloudflare Queues as the batching layer. Verbatim from the post (with the location-suffixed queue name preserved):
    {
        "queues": {
            "consumers": [
                {
                    "queue": "production-core-containers-queue-weur",
                    "max_batch_size": 100,
                    "max_batch_timeout": 1,
                    "max_retries": 1
                }
            ]
        }
    }
    
    Container state update cadence: every 5 seconds. Consumer batch size: 100. Consumer batch timeout: 1 second. Steady-state lag: <2 seconds disclosed ("With this configuration, we achieve acceptable lag times well below 2 seconds"). Per-location queue naming convention (-weur = Western Europe) implies one queue per region — the same regional partitioning as the D1 shards.
  8. Backup-region fallback when the queue backs up. Verbatim: "That said, queue backlogs can still cause stale state. When this happens, each region falls back to a designated backup region until the primary queue catches up." Canonical wiki instance of patterns/region-fallback-on-queue-backlog: allocation failover is gated not on the primary's health (it is still up) but on its control-plane lag. The post does not say what happens to browsers already running in the primary; the reading here is that they keep serving while the allocation control plane temporarily borrows a backup region's view of capacity.
  9. D1 batch-write P95 disclosed at 0.1 ms. Verbatim: "Currently, our P95 for batch write is 0.1ms!" Against the 1 ms per-row baseline assumption, a 100-row batch at 0.1 ms works out to roughly 0.001 ms per row, a ~1,000× per-row improvement, at a tenth of the wall-clock cost of a single baseline write. This amortisation is the load-bearing reason the architecture works at the disclosed scale.
  10. Quick-action protocol coalesced from chatty WebSocket to single HTTP request. Verbatim: "Previously, our workers established a WebSocket to the remote browser and sent instructions one at a time: open a page, navigate to the URL, wait for it to load and take the screenshot. Each step had to be completed before the next could begin. However, now we send all parameters in a single HTTP request directly to the container, and the entire flow executes internally without any back-and-forth between the worker and browser." Canonical wiki instance of patterns/single-http-request-over-chatty-websocket — the proxy-side latency win from coalescing a deterministic multi-step protocol exchange into one request, when the receiver can execute the steps locally.
  11. Headline operational outcomes disclosed verbatim:
    • Workers binding limit: 60 browsers/min (start rate via the Workers binding).
    • Concurrent browsers: 120 (per binding) — "4x the previous limit".
    • Quick Action response time: more than 50% reduction (the post-migration drop is shown in a graph captioned "a sharp decrease in average quick-action response time").
    • All improvements live with no customer change required: "You don't need to change anything: these improvements are live today."
  12. Customer-Zero discipline drove platform upgrades. Verbatim: "Like most successful product platforms, we're committed to building on our own platform wherever feasible so that we can feel and fix any pain points ahead of any external customers." And: "On our end, though, we faced a fresh set of challenges getting familiar with a novel, unstable early-stage Containers platform interface that was light on documentation, light on observability, and light on colleagues in an overlapping timezone. However, our feedback to our own teams as Customer Zero meant that we could provide a tight feedback loop leading to substantial upgrades that benefit our external customers too." The post links to Cloudflare's Customer Zero framing as a self-named discipline. Canonical wiki instance of concepts/customer-zero — the deliberate first-party workload that proves a platform's edges before external customers do.
  13. Migration shape: gradual ramp via in-path Worker, no customer change required. Verbatim: "We started a gradual migration by inserting a Worker in our incoming request paths to provide some Container-powered browsers to a handful of users alongside those from BISO. This dual support during development was key: it allowed us to compare performance, isolate implementation bugs and ultimately gain confidence in the benefits of the Container-driven approach." Ramp order: Quick Actions endpoints → Workers-binding traffic on free accounts → PAYG accounts → all remaining contract customers. A cohort-graduated rollout in which the smallest-blast-radius cohorts (Quick Actions, then free accounts) prove the migration before contract-customer traffic moves.
  14. Owning the image unblocks browser-feature shipping. Verbatim: "When our browsers ran on shared product infrastructure, upgrading Chrome meant coordinating across multiple teams and products, each with their own roadmap and priorities. However, now that we run our own container image, we can upgrade at a faster tempo. For example, WebGL, a much-requested feature, is now available for browser-based rendering along with WebMCP (Model Context Protocol for the web) which enables new agentic interaction patterns. Both are made possible because we can control the browser version and flags without unwanted side effects in other Cloudflare products." Two newly-shipped browser-tier capabilities (WebGL, WebMCP) named as direct payoffs of independent Chromium-version cadence — the kind of feature unblock that follows from getting off shared infra, even when the motivation for the migration was performance and scale.
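The atomic claim from takeaway 5 can be exercised locally. The following is a minimal sketch using Python's built-in sqlite3 rather than D1, and it needs SQLite 3.35 or newer for RETURNING. The table schema, the status = 'available' filter inside candidate_pool, and the function names are assumptions; the post elides the real candidate-pool logic. The post binds the limit as ?5, which is SQLite's numbered-parameter syntax, whereas this sketch uses a plain ? for the same purpose.

```python
import sqlite3
from typing import List


def make_db(n_containers: int = 5) -> sqlite3.Connection:
    """In-memory stand-in for a per-location shard (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE containers ("
        " sessionId TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " data TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO containers VALUES (?, 'available', ?)",
        [(f"s{i}", f"container-{i}") for i in range(n_containers)],
    )
    conn.commit()
    return conn


CLAIM_SQL = """
WITH candidate_pool AS (
    -- Assumed filter; the real selection rules are not in the post.
    SELECT sessionId FROM containers WHERE status = 'available'
)
UPDATE containers
SET status = 'picked'
WHERE sessionId IN (
    SELECT sessionId
    FROM candidate_pool
    ORDER BY RANDOM()
    LIMIT ?
)
RETURNING data
"""


def claim(conn: sqlite3.Connection, count: int) -> List[str]:
    """Claim up to `count` available containers in a single statement."""
    rows = conn.execute(CLAIM_SQL, (count,)).fetchall()
    conn.commit()
    return [row[0] for row in rows]
```

Because selection and status change happen in one statement, a second caller cannot observe a row as available after the first caller has claimed it; every row the subquery returns is claimed, so the bound limit determines how many browsers one call takes.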

Operational numbers

  • Browser start rate (Workers binding): 60 browsers/minute.
  • Concurrent browsers (Workers binding): 120 (post-migration; "4x the previous limit").
  • Quick Action response time: >50% reduction (graph shown in post; absolute numbers not disclosed).
  • Container state update cadence: 5 seconds.
  • D1 per-row write latency baseline: ~1 ms.
  • D1 per-row write throughput ceiling: 1,000 writes/sec (at 1 ms/write).
  • Container ceiling per location with one row per write (pre-batching): 5,000 (1,000 writes/sec × 5-second update interval).
  • Queues batch consumer config: max_batch_size: 100, max_batch_timeout: 1 (second), max_retries: 1.
  • D1 batch-write P95: 0.1 ms (post-batching).
  • Container ceiling per location post-batching: 500,000 (100× headroom).
  • Steady-state queue lag: < 2 seconds.
  • Per-location queue naming: e.g. production-core-containers-queue-weur (Western Europe); one queue per region.
  • KV minimum cache TTL (recently reduced): 60s → 30s (referenced from a 2026-01-30 KV changelog) — "still too high" for this workload.
  • Migration ramp order: Quick Actions → free-tier Workers binding → PAYG → all contract customers.
  • Quick action protocol shape change: WebSocket multi-message exchange (open-page → navigate → wait-for-load → screenshot, each step requiring a round trip) → single HTTP request with all parameters, executed internally on the container.
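The two container ceilings above follow from one formula: writes per second, times rows per write, times the interval between state updates. A short worked version, assuming (as the post's own example does) that a batch write costs about the same 1 ms as a single-row write; the function name is illustrative:

```python
def max_containers(write_latency_ms: float,
                   rows_per_write: int,
                   update_interval_s: float) -> int:
    """Containers one shard can track if every container writes once per interval."""
    writes_per_second = 1000.0 / write_latency_ms
    return round(writes_per_second * rows_per_write * update_interval_s)


unbatched = max_containers(write_latency_ms=1.0, rows_per_write=1, update_interval_s=5.0)
batched = max_containers(write_latency_ms=1.0, rows_per_write=100, update_interval_s=5.0)

print(unbatched)             # 5000
print(batched)               # 500000
print(batched // unbatched)  # 100
```

The 100× headroom therefore comes entirely from the batch size; the write latency and the 5-second cadence are the same in both cases.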

Caveats

  • Vendor-authored launch post. The improvement numbers (4× concurrency, >50% Quick Action latency reduction, 100× D1 batching headroom, 0.1 ms P95 batch write) are Cloudflare-disclosed without external benchmarks. The graph showing "sharp decrease in average quick-action response time" has no axis numbers.
  • Pool sizing math undisclosed. The post says "pre-warmed DO-backed browser containers" in regional pools, but no absolute pool size, no per-region depth, no scale-up / scale-down thresholds, no idle-cost number. The "observability into the global state of each browser" claim is qualitative.
  • D1 shard topology only partially disclosed. Shards are "per location" and the queue naming convention (-weur, etc.) implies one D1 shard per region too, but the exact list of regions, the per-region capacity, and the shard-failover strategy are not disclosed. The 500,000 containers ceiling is "per location", not global.
  • Backup-region fallback semantics undisclosed. "Each region falls back to a designated backup region until the primary queue catches up" — but no detail on (a) how the primary queue's catch-up is detected, (b) whether fallbacks compose (region A → region B; if region B also backed up, where?), (c) whether the data plane (running browsers in the primary) is re-routed during the fallback or stays put, (d) whether two regions racing on the same pool can over-allocate.
  • Race-condition impact on customers undisclosed. The post says KV's eventual consistency caused "overallocation of browsers, severely limiting how fast we could scale to meet demand spikes" — but the customer-visible failure mode (timeouts? errors? fallback to a different browser?) and the rate of occurrence pre-fix are not disclosed.
  • Single-HTTP-request protocol shape undocumented externally. The new internal-execution flow is described qualitatively but the wire format (URL shape, params, response envelope, error semantics) is not in the post; not yet in the Quick Actions docs.
  • WebGL / WebMCP enablement mechanism unspecified. "now available for browser-based rendering" doesn't say whether the features are on by default, opt-in via a flag, gated by plan, or tied to a Workers binding version.
  • DO+Container pair lifecycle undisclosed. Is the pair created together, hibernated together, evicted together? How is a "DO-container pair closest to the user within that region" identified at request time? The post implies KV is no longer the source of truth for that decision but the new selection mechanism (D1 candidate-pool query? in-process state? consistent hash?) is not spelled out.
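  • The LIMIT ?5 value in the SQL is undisclosed. ?5 is SQLite's numbered bind-parameter syntax (the fifth bound parameter), not a literal 5, so the row limit is supplied at query time and its value is not shown. The UPDATE...WHERE...IN (SELECT...) shape claims every row the subquery returns; whether callers request one browser per query or several, and what the first four parameters carry (presumably inputs to the elided candidate_pool logic), is not disclosed.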
  • Customer Zero framing is self-disclosed. "Light on documentation, light on observability, and light on colleagues in an overlapping timezone" is the candid half of the framing; the "feedback loop leading to substantial upgrades" claim is qualitative — no specific Containers-platform improvement is named with attribution back to Browser Run's feedback.
