
PATTERN Cited by 1 source

Queue batching amortizes DB write throughput

Problem

A high-frequency periodic write (heartbeat, state update, telemetry sample) lands directly on a transactional DB at a per-row cost (~1 ms/row). At fleet scale (thousands of writers, each writing every few seconds), that per-row cost caps the number of writers at the DB's per-shard throughput limit:

writers × writes/sec/writer  ≤  1000 / latency_ms  =  1,000 writes/sec at 1 ms/write

With each writer writing once every 5 seconds (0.2 writes/sec), the ceiling is 1,000 / 0.2 = 5,000 writers. Past that, the DB is overloaded.
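The ceiling can be checked with a few lines of arithmetic. This is a sketch assuming single-row writes on a shard are fully serialized (no concurrency), which is the same assumption the quoted arithmetic makes; `max_writers_unbatched` is a hypothetical helper name.

```python
def max_writers_unbatched(write_latency_ms: float, write_interval_s: float) -> int:
    # Serialized single-row writes: the shard completes at most 1000 / latency_ms per second.
    shard_writes_per_sec = 1000.0 / write_latency_ms
    # Each writer emits one row every write_interval_s seconds, so the shard
    # can sustain shard_writes_per_sec * write_interval_s writers.
    return round(shard_writes_per_sec * write_interval_s)

print(max_writers_unbatched(1.0, 5.0))  # 5000
```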

Pattern

Insert a batching queue between writers and the DB. Each writer enqueues its update (cheap, async). A consumer Worker drains the queue in batches and applies each batch to the DB as one bulk write. A transactional DB with efficient bulk writes commits a multi-row batch at close to the wall-clock cost of a single-row write, so for the same DB-shard load the per-row throughput multiplier approaches the batch size.
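A minimal, single-threaded sketch of the enqueue-then-drain flow, using Python's `queue.Queue` as a stand-in for the queue substrate. `drain_in_batches` and the `apply_batch` callback are hypothetical names; in a real consumer, `apply_batch` would issue one bulk DB write per batch.

```python
import queue

def drain_in_batches(q, max_batch_size, apply_batch):
    """Drain everything currently queued, calling apply_batch once per batch."""
    batches = 0
    while True:
        batch = []
        while len(batch) < max_batch_size:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return batches
        apply_batch(batch)  # one bulk DB write per batch, not one write per row
        batches += 1

q = queue.Queue()
for container_id in range(250):       # 250 writers each enqueue one state update
    q.put((container_id, "running"))  # cheap and non-blocking for the producer

sizes = []
n = drain_in_batches(q, 100, lambda b: sizes.append(len(b)))
print(n, sizes)  # 3 [100, 100, 50]
```

250 updates become 3 DB writes instead of 250.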

writers (N, each writing every 5s)  →  Queue  →  consumer drains 100/batch  →  D1 batch UPDATE

Two batching knobs control delivery: max batch size and max batch timeout; a batch is delivered as soon as either limit is reached. Under high volume, batches fill to the max batch size (throughput-bound draining). During low-volume periods, the timeout fires first (latency-bound draining).
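The "whichever fires first" rule can be simulated with an explicit clock. This is a sketch, not the Queues implementation: it assumes the timeout is measured from the arrival of the first pending message, and `Batcher`, `add`, and `tick` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Batcher:
    max_batch_size: int
    max_batch_timeout_s: float
    pending: list = field(default_factory=list)
    first_arrival: Optional[float] = None  # assumed start of the timeout window
    flushed: list = field(default_factory=list)  # (trigger, batch size) log

    def add(self, msg, now):
        if not self.pending:
            self.first_arrival = now
        self.pending.append(msg)
        if len(self.pending) >= self.max_batch_size:
            self._flush("size")  # throughput-bound: batch is full

    def tick(self, now):
        if self.pending and now - self.first_arrival >= self.max_batch_timeout_s:
            self._flush("timeout")  # latency-bound: batch waited long enough

    def _flush(self, trigger):
        self.flushed.append((trigger, len(self.pending)))
        self.pending = []
        self.first_arrival = None

b = Batcher(max_batch_size=100, max_batch_timeout_s=1.0)
for i in range(250):  # burst: 250 messages in ~10 ms
    t = i * 0.00004
    b.tick(t)
    b.add(i, t)
b.tick(2.0)  # quiet period: the 50 leftover messages go out on the timer
print(b.flushed)  # [('size', 100), ('size', 100), ('timeout', 50)]
```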

Verbatim canonical articulation

From the 2026-05-13 Browser Run migration (Source: sources/2026-05-13-cloudflare-browser-run-now-running-on-cloudflare-containers-its-faster):

"We keep D1 shards per location and given that we may have several thousand containers running, and that each container needs to update its state every 5 seconds, we kept running into a problem: we would overload the database. For instance, if each write takes 1ms we can only write at most 1,000 times, which at one row per write would mean that we could only have 5,000 containers before overloading the database. However, if we batch those writes, we can get much higher values, because batch writes are not significantly longer than individual ones, so we can increase the throughput in orders of magnitude. In our case, we use 100 row batches, which means we can now update a maximum of 500,000 containers per location. This headroom means capacity planning is no longer a bottleneck."

The canonical config (preserved verbatim):

{
    "queues": {
        "consumers": [
            {
                "queue": "production-core-containers-queue-weur",
                "max_batch_size": 100,
                "max_batch_timeout": 1,
                "max_retries": 1
            }
        ]
    }
}

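The arithmetic, written out: 5,000 → 500,000 containers per location is 100× headroom, equal to the batch size, which holds if a 100-row batch costs about as much as one single-row write. The post also discloses a D1 batch-write P95 of 0.1 ms. Measured against the illustrative 1 ms single-row write from the quote, that would make a 100-row batch ~10× faster than one single-row write, i.e. ~1,000× more rows per unit of DB time, so the 100× figure is conservative. Steady-state queue lag: <2 seconds.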
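Both readings can be reproduced with a sketch that assumes batches on a shard are serialized and every container updates every 5 seconds; `max_writers_batched` is a hypothetical helper, not something from the post.

```python
def max_writers_batched(batch_latency_ms: float, batch_size: int, write_interval_s: float) -> int:
    # Serialized batches: the shard commits 1000 / batch_latency_ms batches per second,
    # each carrying batch_size rows.
    rows_per_sec = (1000.0 / batch_latency_ms) * batch_size
    # Each container contributes one row every write_interval_s seconds.
    return round(rows_per_sec * write_interval_s)

# Conservative reading: a 100-row batch costs about one 1 ms single-row write.
print(max_writers_batched(1.0, 100, 5.0))  # 500000
# Reading with the disclosed 0.1 ms batch P95.
print(max_writers_batched(0.1, 100, 5.0))  # 5000000
```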

Preconditions

  1. The writes tolerate bounded staleness. A 1–2 second delay between a writer emitting an update and that update becoming visible in the DB is acceptable. (This fails for synchronous claim writes; see patterns/transactional-db-over-eventually-consistent-kv-for-claim, the non-batched companion pattern for claim semantics.)
  2. The DB has efficient bulk-write semantics. A 100-row batch must commit at near the same wall-clock cost as a single row. SQLite (D1) and PostgreSQL bulk inserts/updates meet this; some KV stores don't.
  3. The queue substrate has tunable batch knobs. Batch size + batch timeout are the load-bearing primitives.
  4. The producer-side cost of enqueueing is much lower than a direct DB write. Cloudflare Queues enqueue cost ≪ 1 ms, so the producer Worker doesn't pay the per-row DB write cost.
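For SQLite, the property in precondition 2 comes from running the whole batch inside one transaction, so the commit is paid once per batch rather than once per row. A sketch using Python's built-in `sqlite3` as a local stand-in for D1 (on D1 itself, the Workers binding's `batch()` call plays a similar role); the `containers` table and `apply_state_batch` are hypothetical.

```python
import sqlite3

def apply_state_batch(conn, batch):
    """Apply (state, updated_at, container_id) rows as ONE transaction."""
    with conn:  # commits once on success; rolls back the whole batch on error
        conn.executemany(
            "UPDATE containers SET state = ?, updated_at = ? WHERE id = ?",
            batch,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers (id INTEGER PRIMARY KEY, state TEXT, updated_at REAL)")
with conn:
    conn.executemany("INSERT INTO containers VALUES (?, 'starting', 0)", [(i,) for i in range(100)])

apply_state_batch(conn, [("running", 5.0, i) for i in range(100)])
print(conn.execute("SELECT COUNT(*) FROM containers WHERE state = 'running'").fetchone()[0])  # 100
```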

When the pattern fits

  • High-frequency periodic state telemetry (Browser Run's per-container 5-second state heartbeat is canonical).
  • Bulk-mutable workloads where the writer count grows faster than per-shard write throughput.
  • A background write path alongside a hot-path claim: the claim stays unbatched (it needs synchronous, sub-ms commit semantics); the state-update path batches.
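A sketch of that split, with `sqlite3` standing in for the transactional DB and `queue.Queue` for the batching queue. `claim_slot`, `report_state`, and the `claims` table are hypothetical; the point is only that the claim returns a committed result to the caller while the state update is handed off.

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (slot INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
state_updates = queue.Queue()  # stands in for the batching queue

def claim_slot(slot, owner):
    """Hot path: synchronous and unbatched; the caller needs the outcome now."""
    try:
        with conn:
            conn.execute("INSERT INTO claims (slot, owner) VALUES (?, ?)", (slot, owner))
        return True
    except sqlite3.IntegrityError:  # slot already claimed by someone else
        return False

def report_state(container_id, state):
    """Background path: enqueue and return; a consumer applies it in a later batch."""
    state_updates.put((container_id, state))

print(claim_slot(7, "a"), claim_slot(7, "b"))  # True False
report_state("a", "running")
print(state_updates.qsize())  # 1
```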

When the pattern doesn't fit

  • Writes that need synchronous commit visibility: allocation claims, payment confirmations, anything where the producer needs the result before continuing. patterns/transactional-db-over-eventually-consistent-kv-for-claim is the right pattern there.
  • Latency budgets below the batch-timeout floor: with max_batch_timeout: 1 (seconds), a write emitted during a low-volume period can wait up to ~1 second for its batch to be delivered, plus consumer processing time. Workloads requiring sub-100 ms visibility need a different approach.
  • Per-row business logic that can't be expressed as a bulk statement: if each row needs a side effect outside the DB, batching is harder.
  • Order-sensitive writes: by default, queue consumers may not preserve strict per-key ordering across batches.
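For last-state-wins data such as heartbeats (not for true event sequences), one common mitigation, not described in the post, is a guarded UPDATE: each update carries the time it was emitted, and the row only changes if the update is newer than what is stored. A sketch assuming writers stamp a monotonic emit time; the `emitted_at` column and `apply_guarded` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers (id TEXT PRIMARY KEY, state TEXT, emitted_at REAL)")
with conn:
    conn.execute("INSERT INTO containers VALUES ('c1', 'starting', 0)")

def apply_guarded(conn, batch):
    """Apply a batch; an update older than the stored row matches nothing and is skipped."""
    with conn:  # one transaction per batch
        conn.executemany(
            "UPDATE containers SET state = :state, emitted_at = :ts "
            "WHERE id = :id AND emitted_at < :ts",
            batch,
        )

def current_state(conn, container_id):
    return conn.execute("SELECT state FROM containers WHERE id = ?", (container_id,)).fetchone()[0]

apply_guarded(conn, [{"id": "c1", "state": "running", "ts": 10.0}])
# An older update delivered in a later batch is ignored.
apply_guarded(conn, [{"id": "c1", "state": "starting", "ts": 5.0}])
print(current_state(conn, "c1"))  # running
```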

Failure modes

  • Queue backlog under burst: when the producer rate exceeds the consumer drain rate, the queue grows and lag can exceed the staleness budget. Mitigated by patterns/region-fallback-on-queue-backlog: a backup region serves reads until the primary catches up.
  • Consumer Worker errors: max_retries: 1 (Browser Run's config) means one retry, after which the messages are discarded, or sent to a dead-letter queue if one is configured (the config above sets none). A failing batch doesn't replay forever; the resulting staleness is acceptable because the container's next periodic update supersedes the lost one.
  • Batch boundary inconsistency: if a batch contains conflicting updates for the same key, SQL semantics determine which wins (typically last-writer-wins by execution order within the batch). Application-level deduplication on the consumer side may be needed.
  • DB shard hot-spot: if all writers in a region target the same D1 shard, batching helps, but cross-region rebalancing is a separate problem. Browser Run's per-location queue naming (-weur, -eeur, …) implies one queue per region paired with one D1 shard per region.
  • Batch-size tuning sensitivity: if batches are too small (100 → 10), the throughput multiplier shrinks proportionally. If they are too large (100 → 10,000), per-batch latency grows and each failed batch affects more rows.
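Consumer-side deduplication can collapse a batch to one update per key before the bulk write, which also shrinks the write. A sketch assuming each message carries a key (`id`) and an emit time (`ts`); `dedupe_latest` and the message shape are hypothetical.

```python
def dedupe_latest(batch):
    """Keep one update per key: the one with the highest ts.

    On equal ts, the message appearing later in the batch wins.
    """
    latest = {}
    for msg in batch:
        prev = latest.get(msg["id"])
        if prev is None or msg["ts"] >= prev["ts"]:
            latest[msg["id"]] = msg
    return list(latest.values())

batch = [
    {"id": "c1", "state": "starting", "ts": 1.0},
    {"id": "c2", "state": "running", "ts": 1.5},
    {"id": "c1", "state": "running", "ts": 6.0},
    {"id": "c1", "state": "stopping", "ts": 4.0},  # older update delivered late
]
result = dedupe_latest(batch)
print(len(batch), "->", len(result))  # 4 -> 2
```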

Sibling-pattern contrast

Pattern | Substrate | Batch unit | Trigger
Queue batching to DB (this pattern) | Queue + DB consumer | Queue messages | size/timeout
patterns/batch-over-network-to-broker | Producer → Kafka | Records | size/timeout
patterns/bulk-write-batch-optimization | Application → DB direct | Records | application-controlled
