Queue batching amortizes DB write throughput¶
Problem¶
A high-frequency periodic write (heartbeat, state update, telemetry sample) lands directly on a transactional DB at a per-row cost (~1 ms/row). At fleet scale (thousands of writers, each writing every few seconds), the per-row write rate caps the number of writers at the DB's per-shard throughput limit. At ~1 ms per write a shard handles roughly 1,000 single-row writes per second, so with each writer updating every 5 seconds the ceiling is about 5,000 writers.
Naively scaled, the writer count maxes out at a few thousand. Once demand grows past that, the DB enters overload.
Pattern¶
Insert a batching queue between writers and the DB. The writer enqueues each update (cheap, async). A consumer Worker drains the queue in batches and applies them to the DB as one bulk write per batch. Modern transactional DBs commit a batch at near the same wall-clock cost as a single row, giving a per-row throughput multiplier ≈ batch-size for the same DB-shard load.
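Two knobs control batching: a maximum batch size and a maximum batch timeout. A batch is delivered as soon as either limit is reached. Under high volume the size limit fires first (throughput-bound draining); under low volume the timeout fires first, which bounds how long an update waits (latency-bound draining).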
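The "whichever fires first" rule can be sketched as a small in-memory batcher. This is an illustration only, not how Cloudflare Queues is implemented: the `Batcher` class, its method names, and the explicit `nowMs` clock argument are all hypothetical, chosen so the behaviour is deterministic.

```typescript
// Hypothetical in-memory batcher showing the size and timeout triggers.
class Batcher<T> {
  private buffer: T[] = [];
  private oldestAtMs: number | null = null;
  private maxBatchSize: number;
  private maxBatchTimeoutMs: number;

  constructor(maxBatchSize: number, maxBatchTimeoutMs: number) {
    this.maxBatchSize = maxBatchSize;
    this.maxBatchTimeoutMs = maxBatchTimeoutMs;
  }

  // Add one item at time nowMs. Returns a full batch when the size limit is hit.
  push(item: T, nowMs: number): T[] | null {
    if (this.buffer.length === 0) this.oldestAtMs = nowMs;
    this.buffer.push(item);
    return this.buffer.length >= this.maxBatchSize ? this.flush() : null;
  }

  // Called periodically. Returns a partial batch once the oldest item has
  // waited at least maxBatchTimeoutMs.
  tick(nowMs: number): T[] | null {
    if (this.oldestAtMs === null) return null;
    return nowMs - this.oldestAtMs >= this.maxBatchTimeoutMs ? this.flush() : null;
  }

  private flush(): T[] {
    const batch = this.buffer;
    this.buffer = [];
    this.oldestAtMs = null;
    return batch;
  }
}
```

With `new Batcher(100, 1000)`, a busy period flushes from `push` every 100 items, while a quiet period flushes from `tick` about one second after the first buffered item.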
Verbatim canonical articulation¶
From the 2026-05-13 Browser Run migration (Source: sources/2026-05-13-cloudflare-browser-run-now-running-on-cloudflare-containers-its-faster):
"We keep D1 shards per location and given that we may have several thousand containers running, and that each container needs to update its state every 5 seconds, we kept running into a problem: we would overload the database. For instance, if each write takes 1ms we can only write at most 1,000 times, which at one row per write would mean that we could only have 5,000 containers before overloading the database. However, if we batch those writes, we can get much higher values, because batch writes are not significantly longer than individual ones, so we can increase the throughput in orders of magnitude. In our case, we use 100 row batches, which means we can now update a maximum of 500,000 containers per location. This headroom means capacity planning is no longer a bottleneck."
The canonical config (preserved verbatim):
```json
{
  "queues": {
    "consumers": [
      {
        "queue": "production-core-containers-queue-weur",
        "max_batch_size": 100,
        "max_batch_timeout": 1,
        "max_retries": 1
      }
    ]
  }
}
```
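The post does not show the consumer code, so the following is only a sketch of what a consumer for this config might look like. The `container_state` table, its columns, and the `StateUpdate` message shape are hypothetical. The `Stmt`, `Db`, and `Batch` interfaces are minimal local stand-ins for the subset of the Workers runtime used here (`prepare`, `bind`, and `batch` on D1; `messages`, `ackAll`, and `retryAll` on the queue batch). Submitting one bound statement per message through a single `batch()` call is one plausible reading of "100 row batches"; a multi-row `INSERT` would be another.

```typescript
// Hypothetical message shape sent by each container.
interface StateUpdate { containerId: string; state: string; updatedAt: number }

// Minimal stand-ins for the D1 and Queues types a Worker would get from the runtime.
interface Stmt { bind(...values: unknown[]): Stmt }
interface Db { prepare(sql: string): Stmt; batch(stmts: Stmt[]): Promise<unknown[]> }
interface Batch<T> { messages: readonly { body: T }[]; ackAll(): void; retryAll(): void }

// Hypothetical table; assumes container_id is the primary key.
const UPSERT_SQL =
  "INSERT INTO container_state (container_id, state, updated_at) VALUES (?1, ?2, ?3) " +
  "ON CONFLICT (container_id) DO UPDATE SET state = excluded.state, updated_at = excluded.updated_at";

// Build one bound statement per update; they are submitted together below.
function toStatements(db: Db, updates: readonly StateUpdate[]): Stmt[] {
  return updates.map((u) => db.prepare(UPSERT_SQL).bind(u.containerId, u.state, u.updatedAt));
}

// In a Worker, this object would be the module's default export.
const consumer = {
  async queue(batch: Batch<StateUpdate>, env: { DB: Db }): Promise<void> {
    const stmts = toStatements(env.DB, batch.messages.map((m) => m.body));
    try {
      await env.DB.batch(stmts); // one call for up to max_batch_size rows
      batch.ackAll();
    } catch {
      batch.retryAll(); // redelivery is capped by max_retries
    }
  },
};
```

Acknowledging or retrying the whole batch keeps the handler simple. Per-message `ack()` and `retry()` would be the finer-grained alternative if partial failures need to be distinguished.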
The arithmetic, written out: 5,000 → 500,000 containers per location = 100× headroom. The post discloses a D1 batch-write P95 of 0.1 ms. Set against the illustrative 1 ms single-row write, a 100-row batch at that latency commits ~10× faster than one row, so the per-row cost falls from 1 ms to ~0.001 ms, a ~1,000× improvement. Steady-state queue lag: <2 seconds.
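The headroom figures can be reproduced with a few lines of arithmetic. The sketch assumes, as the quoted passage does, that commits run back to back at 1 ms each and that a batch costs about the same as a single-row write. The function name `maxWriters` is hypothetical.

```typescript
// Number of writers one shard can sustain when commits are issued back to back.
function maxWriters(commitCostMs: number, rowsPerCommit: number, writeIntervalSec: number): number {
  const commitsPerSec = 1000 / commitCostMs;        // 1 ms per commit gives 1,000 commits/s
  const rowsPerSec = commitsPerSec * rowsPerCommit; // each commit carries rowsPerCommit rows
  return rowsPerSec * writeIntervalSec;             // each writer needs one row per interval
}

const unbatched = maxWriters(1, 1, 5);   // 5000
const batched = maxWriters(1, 100, 5);   // 500000
```

The ratio of the two results is the batch size, which is where the 100× headroom comes from.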
Preconditions¶
- The writes are tolerant of bounded staleness. A 1–2 second delay between writer-emit and DB-visible is acceptable. (Fails for synchronous claim writes — see patterns/transactional-db-over-eventually-consistent-kv-for-claim which is the non-batched companion pattern for claim semantics.)
- The DB has efficient bulk-write semantics. A 100-row batch must commit at near the same wall-clock cost as a single row. SQLite (D1) and PostgreSQL bulk inserts/updates meet this; some KV stores don't.
- The queue substrate has tunable batch knobs. Batch size + batch timeout are the load-bearing primitives.
- Producer-side cost of enqueue is much lower than direct DB write. Cloudflare Queues enqueue cost ≪ 1 ms; the producer Worker doesn't pay the per-row DB write cost.
When the pattern fits¶
- High-frequency periodic state telemetry (Browser Run's per-container 5-second state heartbeat is canonical).
- Bulk-mutable workloads where the writer count grows faster than per-shard write throughput.
- Background-write sidetrack alongside a hot-path claim: the claim stays unbatched (sub-ms semantics required); the state-update path batches.
When the pattern doesn't fit¶
- Writes that need synchronous commit visibility — allocation claims, payment confirmations, anything where the producer needs the result before continuing. patterns/transactional-db-over-eventually-consistent-kv-for-claim is the right pattern there.
- Latency budget below the batch-timeout floor — at `max_batch_timeout: 1` (seconds) the post-write visibility gap is ~1 second worst-case at low volume. Workloads requiring sub-100 ms visibility need a different approach.
- Per-row business logic that can't be expressed as a bulk statement — if each row needs a side effect outside the DB, batching is harder.
- Order-sensitive writes — by default queue consumers may not preserve strict per-key ordering across batches.
Failure modes¶
- Queue backlog under burst — when producer rate exceeds consumer drain rate, the queue grows and lag exceeds staleness budget. Mitigated by patterns/region-fallback-on-queue-backlog: a backup region serves the read until primary catches up.
- Consumer Worker errors — `max_retries: 1` (Browser Run's config) means a failed batch is retried once and then given up: the messages go to a dead-letter queue if one is configured (the config shown does not declare one) and are otherwise dropped. A failing batch doesn't replay indefinitely, and the loss is tolerable because each container sends a fresh state update every 5 seconds.
- Batch boundary inconsistency — if a batch contains conflicting updates for the same key, the SQL semantics determine which wins (typically last-writer-wins by physical order in the batch). Application-level deduping at the consumer side may be needed.
- DB shard hot-spot — if all writers in a region target the same D1 shard, batching helps, but cross-region rebalancing is a separate problem. Browser Run's per-location queue naming (`-weur`, `-eeur`, …) implies one queue per region paired with one D1 shard per region.
- Batch-size tuning sensitivity — too small (100 → 10), the throughput multiplier shrinks proportionally. Too large (100 → 10,000), per-batch latency grows and individual-batch failures cost more.
Composes with¶
- patterns/transactional-db-over-eventually-consistent-kv-for-claim — the claim path stays unbatched (sub-ms latency required); this pattern handles the background telemetry path that shares the same DB shard.
- patterns/region-fallback-on-queue-backlog — when this pattern's queue is delayed past the staleness budget, the fallback pattern routes reads to a backup region.
- concepts/network-round-trip-cost — the parent arithmetic. Batching is the canonical way to amortise per-RPC overhead at any RPC altitude. This pattern is the Cloudflare-Queues-over-D1 instance.
Sibling-pattern contrast¶
| Pattern | Substrate | Batch unit | Trigger |
|---|---|---|---|
| Queue batching to DB (this pattern) | Queue + DB consumer | Queue messages | size/timeout |
| patterns/batch-over-network-to-broker | Producer → Kafka | Records | size/timeout |
| patterns/bulk-write-batch-optimization | Application → DB direct | Records | application-controlled |
Seen in¶
- sources/2026-05-13-cloudflare-browser-run-now-running-on-cloudflare-containers-its-faster — canonical wiki instance. Browser Run uses Cloudflare Queues at `max_batch_size: 100, max_batch_timeout: 1` to amortise per-container 5-second state-update writes against D1. The 100×-headroom math (5,000 → 500,000 containers per location) is the disclosed payoff; D1 batch-write P95 of 0.1 ms is the underlying enabler.
Related¶
- systems/cloudflare-queues — substrate.
- systems/cloudflare-d1 — downstream consumer.
- systems/cloudflare-browser-rendering — canonical consumer.
- concepts/network-round-trip-cost — parent latency arithmetic.
- concepts/batching-latency-tradeoff — the latency-vs-throughput trade.
- patterns/batch-over-network-to-broker — Kafka producer altitude sibling.
- patterns/bulk-write-batch-optimization — application-side altitude sibling.
- companies/cloudflare — operator.