PATTERN Cited by 1 source
Transactional DB over eventually-consistent KV for claim¶
Problem¶
An exclusive-resource allocation hot path is backed by an eventually-consistent KV store. Under demand spikes, claims arrive faster than the store converges, so stale reads report resources as "available" after a concurrent requester has already claimed them. The result is race conditions and overallocation (concepts/eventual-consistency-too-slow-for-allocation).
Tightening the KV's cache TTL helps incrementally but doesn't close the gap: the convergence floor is set by the store's design (its edge-cache replication strategy), not by a setting that can be tuned arbitrarily low.
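The stale-read race can be reproduced with a toy model. The sketch below is not Workers KV: `StaleKV`, its `lag` parameter, and the tick-based clock are hypothetical stand-ins that only model "writes become visible to readers some time after they are issued".

```python
class StaleKV:
    """Toy eventually-consistent store: a write becomes readable `lag` ticks later."""

    def __init__(self, initial, lag):
        self.lag = lag
        self.now = 0
        self.visible = dict(initial)   # what readers currently see
        self.pending = []              # (visible_at_tick, key, value)

    def tick(self):
        """Advance time by one tick and apply writes that have converged."""
        self.now += 1
        still_pending = []
        for visible_at, key, value in self.pending:
            if visible_at <= self.now:
                self.visible[key] = value
            else:
                still_pending.append((visible_at, key, value))
        self.pending = still_pending

    def get(self, key):
        return self.visible[key]

    def put(self, key, value):
        self.pending.append((self.now + self.lag, key, value))


def claim(kv, requester):
    """Check-then-act claim: read 'available', then write the claim."""
    for key in sorted(kv.visible):
        if kv.get(key) == "available":
            kv.put(key, f"claimed-by-{requester}")
            return key
    return None


kv = StaleKV({"browser-1": "available", "browser-2": "available"}, lag=3)
first = claim(kv, "alice")
kv.tick()                      # one tick later, alice's write is not visible yet
second = claim(kv, "bob")      # bob still reads "available" for the same browser
double_allocated = first == second
```

Both requesters walk away holding `browser-1`. Lowering `lag` narrows the window but never removes it as long as a second claim can land inside it.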
Pattern¶
Migrate the allocation hot path from the eventually-consistent KV to a transactional database (typically SQLite or PostgreSQL) with a single atomic statement that:
- Selects a candidate pool (CTE / subquery).
- Claims one resource from the pool atomically (`UPDATE ... WHERE ... IN (SELECT ... LIMIT N) RETURNING`).
- Returns the resource's identifier and metadata in the same round-trip.
Two stores, two responsibility tiers, after migration:
     [allocation-claim path]          [global view path]
              (hot)                         (warm)
                |                              |
                v                              v
        [transactional DB]         [eventually-consistent KV]
     (D1 / Postgres / SQLite)     (Workers KV / Dynamo / etc.)
                ^                              ^
                |                              |
          atomic claim                   observability,
       exclusive resource             capacity dashboards,
                                 read-after-update lag tolerable
The KV may still play a role — capacity dashboards, observability, cross-region overview — but the claim itself moves to the transactional store.
Cloudflare canonical instance¶
From the 2026-05-13 Browser Run migration post (Source: sources/2026-05-13-cloudflare-browser-run-now-running-on-cloudflare-containers-its-faster):
"We previously stored each container state in KV. This meant that we could keep getting a minute old state due to cache TTL (recently KV changed the minimum cache TTL to 30 seconds, but even so that value is still too high). We decided to migrate the container state into D1 instances instead. D1's transactional nature is a good fit here. Once we assign a browser to a user, it's exclusively theirs. Browsers are not shared resources. SQLite transactions ensure atomic assignment and prevent race conditions where two requests might claim the same browser simultaneously."
The canonical SQL shape after migration (concepts/sqlite-transaction-for-atomic-resource-claim):
    WITH candidate_pool AS (
      -- candidate pool logic to pick based on latency and other rules
    )
    UPDATE containers
    SET status = 'picked'
    WHERE sessionId IN (
      SELECT sessionId
      FROM candidate_pool
      ORDER BY RANDOM()
      LIMIT ?5
    )
    RETURNING data
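A minimal runnable sketch of the statement shape above, using Python's standard `sqlite3` module against an in-memory database rather than D1. The table schema, the `make_pool` and `claim` helpers, and the trivial candidate-pool query (every row still marked `available`) are assumptions for illustration; the source elides the real pool logic. `RETURNING` needs SQLite 3.35 or newer.

```python
import sqlite3

CLAIM_SQL = """
WITH candidate_pool AS (
    -- Assumption: the simplest possible pool, every row still 'available'.
    SELECT sessionId FROM containers WHERE status = 'available'
)
UPDATE containers
SET status = 'picked'
WHERE sessionId IN (
    SELECT sessionId FROM candidate_pool ORDER BY RANDOM() LIMIT ?
)
RETURNING sessionId, data
"""


def make_pool(n):
    """Create an in-memory pool of n available containers (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE containers ("
        "sessionId TEXT PRIMARY KEY, status TEXT NOT NULL, data TEXT)"
    )
    conn.executemany(
        "INSERT INTO containers VALUES (?, 'available', ?)",
        [(f"s{i}", f"meta-{i}") for i in range(n)],
    )
    conn.commit()
    return conn


def claim(conn, count=1):
    """Select, mark, and return up to `count` containers in one statement."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(CLAIM_SQL, (count,)).fetchall()


conn = make_pool(3)
claimed = [claim(conn) for _ in range(5)]   # five requests against three containers
claimed_ids = [rows[0][0] for rows in claimed if rows]
```

With three containers and five requests, three calls each return a distinct `sessionId` and the remaining two return an empty list. An exhausted pool therefore shows up as "no rows", never as a duplicate claim.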
Preconditions¶
- Resource is exclusive: the allocation contract is "exactly one caller holds the resource at a time."
- Claim happens at hot-path frequency: claims arrive faster than the eventually-consistent store can converge.
- A transactional store with edge-deployment ergonomics is available: D1 (per-region SQLite as a Workers binding) in Cloudflare's case; per-region Postgres / serverless Aurora elsewhere.
- The allocation rate fits within the transactional store's per-shard write throughput: at roughly 1 ms per SQLite write, a single serialized writer tops out near 1,000 claims/sec/shard. For higher rates, shard the resource pool.
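The throughput precondition is simple arithmetic and can be sketched as a sizing helper. The function names are hypothetical, and the model assumes one serialized writer per shard with no headroom for non-claim writes, so treat the result as a lower bound.

```python
import math


def claims_per_sec_per_shard(write_latency_ms):
    """Ceiling for one serialized writer: one claim per write latency."""
    return 1000.0 / write_latency_ms


def shards_needed(target_claims_per_sec, write_latency_ms=1.0):
    """Smallest shard count whose combined ceiling covers the target rate."""
    per_shard = claims_per_sec_per_shard(write_latency_ms)
    return math.ceil(target_claims_per_sec / per_shard)


one_shard_ceiling = claims_per_sec_per_shard(1.0)   # the ~1 ms figure from the text
shards_for_2500 = shards_needed(2500)               # 2,500 claims/sec at 1 ms per write
shards_for_slow_writes = shards_needed(1000, write_latency_ms=2.0)
```

At 1 ms per write, 2,500 claims/sec needs at least three shards; doubling the write latency to 2 ms doubles the shard count for the same rate.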
When the pattern fits¶
- Edge-scale resource allocation — headless-browser instances, GPU sessions, pre-provisioned VM slots, agent-runtime sandboxes.
- Migration off an eventually-consistent store that's almost working — the workload is currently KV/Dynamo/Cassandra-backed and showing race-condition symptoms.
- Workloads where the failure mode is over-allocation, not under-allocation — duplicate claims must be eliminated; occasional missed-claim under retry is acceptable.
When the pattern doesn't fit¶
- Read-mostly workloads — eventually-consistent reads are usually fine. Don't migrate everything; just the claim.
- Per-shard claim rate exceeds the transactional store's ceiling — sharding helps but introduces cross-shard consistency questions. At very high rates the pattern needs a queue-coordinated allocator instead.
- Resource is not exclusive — if multiple callers can share the resource, eventually-consistent state is fine.
Failure modes¶
- Per-shard write throughput cap. SQLite's serial-writer semantics mean per-database claim throughput is bounded by per-write latency. Shard the pool to scale.
- Cross-shard claim is not atomic. A claim against shard A cannot reserve a resource in shard B. The pool topology must align with the claim topology.
- Migration coexistence is fiddly. During the migration window both stores are partially trusted; deciding which is authoritative needs explicit cutover planning.
- Background-update writes still need scaling. The non-claim writes (heartbeats, state updates) can swamp the transactional store's per-row write rate. patterns/queue-batching-amortizes-db-write-throughput is the canonical companion pattern (Browser Run uses Cloudflare Queues with `max_batch_size: 100` for this).
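The batching idea in the last bullet can be sketched locally: group non-claim writes and commit each group as one transaction. This uses Python's `sqlite3` and a plain list in place of Cloudflare Queues; the `last_seen` column and the `flush_in_batches` helper are hypothetical, and only the batch size of 100 comes from the text.

```python
import sqlite3

MAX_BATCH_SIZE = 100  # batch size quoted in the text


def flush_in_batches(conn, heartbeats, batch_size=MAX_BATCH_SIZE):
    """Apply (last_seen, sessionId) heartbeats, one transaction per batch."""
    transactions = 0
    for start in range(0, len(heartbeats), batch_size):
        batch = heartbeats[start:start + batch_size]
        with conn:  # a single commit covers the whole batch
            conn.executemany(
                "UPDATE containers SET last_seen = ? WHERE sessionId = ?",
                batch,
            )
        transactions += 1
    return transactions


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers (sessionId TEXT PRIMARY KEY, last_seen INTEGER)")
conn.executemany(
    "INSERT INTO containers VALUES (?, 0)",
    [(f"s{i}",) for i in range(250)],
)
conn.commit()

heartbeats = [(1000 + i, f"s{i}") for i in range(250)]
tx_count = flush_in_batches(conn, heartbeats)
```

Here 250 heartbeats cost three commits instead of 250, which leaves more of the shard's serialized write budget for the claim statement.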
Composes with¶
- concepts/sqlite-transaction-for-atomic-resource-claim — the SQL primitive at the heart of the post-migration claim.
- patterns/queue-batching-amortizes-db-write-throughput — the throughput-amortisation pattern for the non-claim writes (heartbeats, telemetry) that must coexist with the hot-path claim writes on the same DB shard.
Sibling-pattern contrast¶
| Pattern | Allocation discipline | Throughput cost | Race window |
|---|---|---|---|
| Eventually-consistent KV (this pattern's "before" state) | Optimistic, retry on collision | Low | Wide (concepts/check-then-act-race amplified by KV TTL) |
| Transactional DB claim (this pattern) | Atomic, single statement | Medium (per-shard ceiling) | None |
| Lease-based with quorum | Lease on claim, expire on timeout | Medium-high | None (lease semantics) |
| Allocator-as-coordinator service | Single allocator process | Allocator throughput | None (single writer) |
Seen in¶
- sources/2026-05-13-cloudflare-browser-run-now-running-on-cloudflare-containers-its-faster: canonical wiki instance. Browser Run migrated container-allocation state from Workers KV to D1 specifically to eliminate race-condition overallocation under demand spikes from AI agent builders.
Related¶
- systems/cloudflare-kv — pre-migration substrate.
- systems/cloudflare-d1 — post-migration substrate.
- concepts/eventual-consistency-too-slow-for-allocation — the motivating failure mode.
- concepts/sqlite-transaction-for-atomic-resource-claim — the load-bearing SQL primitive.
- concepts/check-then-act-race — the underlying concurrency hazard the pattern eliminates.
- concepts/eventual-consistency — the parent property.
- patterns/queue-batching-amortizes-db-write-throughput — companion pattern for the non-claim writes.
- companies/cloudflare — operator.