CONCEPT Cited by 1 source

Transaction commit delay¶

Definition¶

Transaction commit delay — also called commit queue delay or commit latency — is the time a transaction spends in the commit/flush queue waiting for its changes to be durably written to the transaction log (redo log in InnoDB, WAL in Postgres) before returning success to the client.

It is the queue wait time for the commit queue, analogous to how replication lag is the queue wait time for the changelog queue. See concepts/queueing-theory for the broader framing.

Why it's a useful throttling signal¶

Unlike threads_running (which conflates commit-queue backup with lock-wait backup with page-cache-miss backup), commit delay isolates the disk-flush queue:

High commit delay ⇒ disk / fsync / replication-ack is bottlenecked.
Low commit delay ⇒ even if threads_running is high, the cluster is keeping up on the write path.

Noach frames this as an improvement on threads_running:

"A major contributor to a spike in concurrent queries is their inability to complete. Normally, this means they're held back at commit time, i.e. they wait to be written to the transaction (redo) log. And that means they're in the transaction queue, and we can hence measure the transaction queue latency (aka queue delay)."

— Anatomy of a Throttler, part 1

Threshold setting is hardware-derived¶

Unlike threads_running (no stable threshold) and unlike replication lag (business-derived threshold), commit delay has a hardware-derived threshold:

Substrate	Typical commit latency
Local NVMe with `innodb_flush_log_at_trx_commit=1`	Sub-millisecond to low ms
AWS EBS via virtio	Few ms
Aurora with 6-node quorum commit	Reported to differ significantly from plain RDS

"Transaction commit delay is typically caused by disk write/flush time, and that changes dramatically across hardware. It is a matter of knowing your metrics."

This hardware dependency is good for a throttler: once the operator has calibrated their system's baseline, the threshold is stable against query-mix fluctuations that would throw off threads_running.

Group commit / batched fsync¶

Modern databases batch multiple concurrent commits into a single fsync to amortise disk-flush cost (InnoDB group commit, Postgres group commit). This decouples commit delay from transaction arrival rate at low load: a commit can wait up to innodb_flush_log_at_timeout (or similar) to be bundled, adding a fixed jitter floor. Throttlers that read commit delay must account for this.

Why it's still a symptom, not a cause¶

Even commit-delay isolation has limits: a spike can be caused by disk degradation or by a co-tenant workload saturating shared storage or by replication-ack slowdown in synchronous setups. The metric remains a symptom of one specific queue rather than a cause — but a narrower one than threads_running.

Seen in¶

sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki introduction. Noach frames it as the improvement over threads_running for throttling signals because it targets a specific queue with a hardware-stable threshold.

concepts/symptom-vs-cause-metric — why commit delay is still a symptom, just a narrower one.
concepts/queue-length-vs-wait-time — commit delay is wait time; threads_running is queue length.
concepts/threads-running-mysql — the broader metric that commit delay decomposes a slice of.
concepts/database-throttler — the use case.
concepts/queueing-theory — the framing; commit delay is the residence time in the commit queue.