PATTERN Cited by 1 source

Transactional DB over eventually-consistent KV for claim

Problem

An exclusive-resource allocation hot path is backed by an eventually-consistent KV store. Under demand spikes, the store's convergence time exceeds the interval between claims, so stale reads return "available" for resources that a concurrent requester has already claimed. The result is race conditions and over-allocation (concepts/eventual-consistency-too-slow-for-allocation).

Tightening the KV's cache TTL helps incrementally but doesn't close the gap — the convergence floor is set by the store's design (edge-cache replication strategy), not by configuration that can be tuned arbitrarily down.
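
The stale-read race can be reproduced with a toy model. This is a sketch only: it assumes a single read cache in front of one authoritative value, and `CachedKV` and `try_claim` are hypothetical names for illustration, not any real KV API.

```python
class CachedKV:
    """Toy eventually-consistent store: writes reach the origin at once,
    but reads may be served from a cache for up to ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.origin = {}  # authoritative values
        self.cache = {}   # key -> (value, cached_at)

    def put(self, key, value):
        # The write does not invalidate the cached copy.
        self.origin[key] = value

    def get(self, key, now):
        hit = self.cache.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # possibly stale
        value = self.origin.get(key)
        self.cache[key] = (value, now)
        return value


def try_claim(kv, key, caller, now):
    """Check-then-act claim: read the status, then write. Not atomic."""
    if kv.get(key, now) == "available":
        kv.put(key, f"claimed:{caller}")
        return True
    return False


kv = CachedKV(ttl_seconds=30)
kv.put("browser-1", "available")
first = try_claim(kv, "browser-1", "alice", now=0.0)    # legitimate claim
second = try_claim(kv, "browser-1", "bob", now=5.0)     # inside the TTL: stale read
third = try_claim(kv, "browser-1", "carol", now=31.0)   # after the TTL: sees the claim
```

In this model both `first` and `second` succeed, which is the over-allocation described above. Shortening `ttl_seconds` narrows the window but never removes it.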

Pattern

Migrate the allocation hot path from the eventually-consistent KV to a transactional database (typically SQLite or PostgreSQL) and perform the claim in a single atomic statement that:

  1. Selects a candidate pool (CTE / subquery).
  2. Claims one resource from the pool atomically (UPDATE ... WHERE ... IN (SELECT ... LIMIT N) RETURNING).
  3. Returns the resource's identifier and metadata in the same round-trip.
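
The three steps can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the source's implementation: the `region` column, the `'idle'` status value and the `claim_one` helper are assumptions, and `RETURNING` needs SQLite 3.35 or newer.

```python
import sqlite3


def claim_one(conn, region):
    """Claim one idle container in `region`.

    Returns (sessionId, data) for the claimed row, or None if the pool is empty.
    """
    rows = conn.execute(
        """
        WITH candidate_pool AS (              -- step 1: select the candidate pool
            SELECT sessionId FROM containers
            WHERE status = 'idle' AND region = ?
        )
        UPDATE containers                     -- step 2: claim one row atomically
        SET status = 'picked'
        WHERE sessionId IN (SELECT sessionId FROM candidate_pool LIMIT 1)
        RETURNING sessionId, data             -- step 3: same round-trip
        """,
        (region,),
    ).fetchall()
    return rows[0] if rows else None


# isolation_level=None puts the connection in autocommit mode, so the single
# statement is its own transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE containers "
    "(sessionId TEXT PRIMARY KEY, region TEXT, status TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO containers VALUES (?, ?, 'idle', ?)",
    [("s1", "weur", "{}"), ("s2", "weur", "{}")],
)

a = claim_one(conn, "weur")
b = claim_one(conn, "weur")
c = claim_one(conn, "weur")  # pool exhausted
```

With two idle rows, the first two calls return different sessions and the third returns `None`; the caller never has to read the status and write it back in separate steps.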

Two stores, two responsibility tiers, after migration:

[allocation-claim path]            [global view path]
       (hot)                            (warm)
        |                                  |
        v                                  v
  [transactional DB]                [eventually-consistent KV]
  (D1 / Postgres / SQLite)          (Workers KV / Dynamo / etc.)
        ^                                  ^
        |                                  |
   atomic claim                       observability,
   exclusive resource                 capacity dashboards,
                                      read-after-update lag tolerable

The KV may still play a role — capacity dashboards, observability, cross-region overview — but the claim itself moves to the transactional store.

Cloudflare canonical instance

From the 2026-05-13 Browser Run migration post (Source: sources/2026-05-13-cloudflare-browser-run-now-running-on-cloudflare-containers-its-faster):

"We previously stored each container state in KV. This meant that we could keep getting a minute old state due to cache TTL (recently KV changed the minimum cache TTL to 30 seconds, but even so that value is still too high). We decided to migrate the container state into D1 instances instead. D1's transactional nature is a good fit here. Once we assign a browser to a user, it's exclusively theirs. Browsers are not shared resources. SQLite transactions ensure atomic assignment and prevent race conditions where two requests might claim the same browser simultaneously."

The canonical SQL shape after migration (concepts/sqlite-transaction-for-atomic-resource-claim):

WITH candidate_pool AS (
    -- candidate pool logic to pick based on latency and other rules
)
UPDATE containers
SET status = 'picked'
WHERE sessionId IN (
    SELECT sessionId
    FROM candidate_pool
    ORDER BY RANDOM()
    LIMIT ?5
)
RETURNING data
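
To check that a statement of this shape prevents double claims under contention, the sketch below runs it from several threads against one SQLite file. The candidate-pool rule here (`status = 'idle'`) is a placeholder assumption, since the source elides the real logic; the table layout and the `run_concurrent_claims` helper are likewise hypothetical. It assumes SQLite 3.35 or newer for `RETURNING`.

```python
import os
import sqlite3
import tempfile
import threading

CLAIM_SQL = """
WITH candidate_pool AS (
    SELECT sessionId FROM containers WHERE status = 'idle'  -- placeholder rule
)
UPDATE containers
SET status = 'picked'
WHERE sessionId IN (
    SELECT sessionId FROM candidate_pool ORDER BY RANDOM() LIMIT ?
)
RETURNING sessionId
"""


def run_concurrent_claims(pool_size=5, claimers=20):
    """Start `claimers` threads that each try to claim one container.

    Returns the list of sessionIds that were handed out.
    """
    path = os.path.join(tempfile.mkdtemp(), "pool.db")
    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute(
        "CREATE TABLE containers (sessionId TEXT PRIMARY KEY, status TEXT, data TEXT)"
    )
    setup.executemany(
        "INSERT INTO containers VALUES (?, 'idle', '{}')",
        [(f"s{i}",) for i in range(pool_size)],
    )
    setup.close()

    claimed = []
    lock = threading.Lock()

    def worker():
        # One connection per thread; timeout lets a writer wait for the lock.
        conn = sqlite3.connect(path, timeout=30, isolation_level=None)
        rows = conn.execute(CLAIM_SQL, (1,)).fetchall()
        conn.close()
        with lock:
            claimed.extend(row[0] for row in rows)

    threads = [threading.Thread(target=worker) for _ in range(claimers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


claimed = run_concurrent_claims()
```

With 20 claimers and 5 containers, each container is handed out once and the remaining claimers get an empty result, which is the "occasional missed claim under retry" trade-off rather than a duplicate claim.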

Preconditions

  1. Resource is exclusive — the allocation contract is "exactly one caller holds the resource at a time."
  2. Claims happen at hot-path frequency, arriving faster than the eventually-consistent store can converge.
  3. A transactional store with edge-deployment ergonomics is available — D1 (per-region SQLite as a Workers binding) in Cloudflare's case; per-region Postgres / serverless Aurora elsewhere.
  4. The allocation rate fits within the transactional store's per-shard write throughput. At roughly 1 ms per SQLite write, that is about 1,000 claims/sec/shard; for higher rates, shard the resource pool.
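
The throughput arithmetic in the last precondition can be written out as a small sizing helper. This is a back-of-the-envelope sketch: it assumes writes are strictly serial per shard, and the `headroom` factor (the fraction of the ceiling to plan for) is an illustrative assumption, not a figure from the source.

```python
import math


def per_shard_claims_per_sec(write_latency_ms):
    """Serial writer: one claim per write, so throughput is 1 / latency."""
    return 1000.0 / write_latency_ms


def shards_needed(target_claims_per_sec, write_latency_ms, headroom=0.5):
    """Shards required to carry the target rate at `headroom` utilisation."""
    usable = per_shard_claims_per_sec(write_latency_ms) * headroom
    return math.ceil(target_claims_per_sec / usable)


ceiling = per_shard_claims_per_sec(1.0)   # 1 ms per write
shards = shards_needed(5000, 1.0)         # 5,000 claims/sec at 50% utilisation
```

At 1 ms per write the ceiling is 1,000 claims/sec per shard, so a 5,000 claims/sec target planned at 50% utilisation needs 10 shards.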

When the pattern fits

  • Edge-scale resource allocation — headless-browser instances, GPU sessions, pre-provisioned VM slots, agent-runtime sandboxes.
  • Migration off an eventually-consistent store that's almost working — the workload is currently KV/Dynamo/Cassandra-backed and showing race-condition symptoms.
  • Workloads where the failure mode is over-allocation, not under-allocation — duplicate claims must be eliminated; occasional missed-claim under retry is acceptable.

When the pattern doesn't fit

  • Read-mostly workloads — eventually-consistent reads are usually fine. Don't migrate everything; just the claim.
  • Per-shard claim rate exceeds the transactional store's ceiling — sharding helps but introduces cross-shard consistency questions. At very high rates the pattern needs a queue-coordinated allocator instead.
  • Resource is not exclusive — if multiple callers can share the resource, eventually-consistent state is fine.

Failure modes

  • Per-shard write throughput cap. SQLite's serial-writer semantics mean per-database claim throughput is bounded by per-write latency. Shard the pool to scale.
  • Cross-shard claim is not atomic. A claim against shard A cannot reserve a resource in shard B. The pool topology must align with the claim topology.
  • Migration coexistence is fiddly. During the migration window both stores are partially trusted; deciding which is authoritative needs explicit cutover planning.
  • Background-update writes still need scaling. The non-claim writes (heartbeats, state updates) share the same serial writer as the claims and can swamp the transactional store's per-database write throughput. patterns/queue-batching-amortizes-db-write-throughput is the canonical companion pattern (Browser Run uses Cloudflare Queues with max_batch_size: 100 for this).
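
The batching idea in the last bullet can be sketched as follows. This only illustrates amortising many background updates into one transaction per batch; it is not the Cloudflare Queues consumer API. The `last_seen` column and the `apply_batches` helper are hypothetical, and only the batch size of 100 comes from the source.

```python
import sqlite3

MAX_BATCH_SIZE = 100  # mirrors the max_batch_size: 100 quoted above


def apply_batches(conn, messages, max_batch_size=MAX_BATCH_SIZE):
    """Apply (sessionId, last_seen) heartbeats, one transaction per batch.

    Returns the number of transactions committed.
    """
    commits = 0
    for start in range(0, len(messages), max_batch_size):
        batch = messages[start:start + max_batch_size]
        conn.execute("BEGIN")
        try:
            conn.executemany(
                "UPDATE containers SET last_seen = ? WHERE sessionId = ?",
                [(ts, sid) for sid, ts in batch],
            )
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        commits += 1
    return commits


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE containers (sessionId TEXT PRIMARY KEY, status TEXT, last_seen INTEGER)"
)
conn.executemany(
    "INSERT INTO containers VALUES (?, 'picked', NULL)",
    [(f"s{i}",) for i in range(250)],
)

heartbeats = [(f"s{i}", 1000 + i) for i in range(250)]
transactions = apply_batches(conn, heartbeats)
```

Here 250 heartbeats cost 3 write transactions instead of 250, leaving more of the serial writer's time for claims.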

Sibling-pattern contrast

| Pattern | Allocation discipline | Throughput cost | Race window |
|---|---|---|---|
| Eventually-consistent KV (this pattern's "before" state) | Optimistic, retry on collision | Low | Wide (concepts/check-then-act-race amplified by KV TTL) |
| Transactional DB claim (this pattern) | Atomic, single statement | Medium (per-shard ceiling) | None |
| Lease-based with quorum | Lease on claim, expire on timeout | Medium-high | None (lease semantics) |
| Allocator-as-coordinator service | Single allocator process | Allocator throughput | None (single writer) |
