CONCEPT Cited by 1 source
Transaction commit delay¶
Definition¶
Transaction commit delay — also called commit queue delay or commit latency — is the time a transaction spends in the commit/flush queue waiting for its changes to be durably written to the transaction log (redo log in InnoDB, WAL in Postgres) before returning success to the client.
It is the queue wait time for the commit queue, analogous to how replication lag is the queue wait time for the changelog queue. See concepts/queueing-theory for the broader framing.
Why it's a useful throttling signal¶
Unlike threads_running (which conflates commit-queue backup,
lock-wait backup, and page-cache-miss backup), commit delay
isolates the disk-flush queue:
- High commit delay ⇒ disk / fsync / replication-ack is bottlenecked.
- Low commit delay ⇒ even if threads_running is high, the cluster is keeping up on the write path.
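The two-way reading above can be sketched as a small classifier. This is a minimal illustration, not a throttler implementation: the `Sample` type, the `diagnose` function, the label strings, and the `busy_threads` default are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    threads_running: int    # concurrently executing queries (queue length)
    commit_delay_ms: float  # observed wait in the commit queue (wait time)


def diagnose(sample: Sample, commit_delay_threshold_ms: float,
             busy_threads: int = 50) -> str:
    """Classify a sample using commit delay first, threads_running second.

    busy_threads is an arbitrary illustrative cutoff, not a recommended value.
    """
    if sample.commit_delay_ms > commit_delay_threshold_ms:
        # Commits are queueing: disk, fsync, or replication ack is slow.
        return "write-path-bottleneck"
    if sample.threads_running >= busy_threads:
        # Many active queries, but commits still complete quickly.
        return "busy-but-write-path-ok"
    return "healthy"
```

Checking commit delay before threads_running reflects the point of the section: a high thread count alone does not say the write path is struggling.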
Noach frames this as an improvement on threads_running:
"A major contributor to a spike in concurrent queries is their inability to complete. Normally, this means they're held back at commit time, i.e. they wait to be written to the transaction (redo) log. And that means they're in the transaction queue, and we can hence measure the transaction queue latency (aka queue delay)."
Threshold setting is hardware-derived¶
Unlike threads_running (no stable threshold) and unlike replication
lag (business-derived threshold), commit delay has a
hardware-derived threshold:
| Substrate | Typical commit latency |
|---|---|
| Local NVMe with innodb_flush_log_at_trx_commit=1 | Sub-millisecond to low ms |
| AWS EBS via virtio | Few ms |
| Aurora with 6-node quorum commit | Reported to differ significantly from plain RDS |
"Transaction commit delay is typically caused by disk write/flush time, and that changes dramatically across hardware. It is a matter of knowing your metrics."
This hardware dependency is good for a throttler: once the
operator has calibrated their system's baseline, the threshold
is stable against query-mix fluctuations that would throw off
threads_running.
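One way an operator might turn a calibrated baseline into a threshold is sketched below. It assumes commit-delay samples (in milliseconds) have already been collected during normal operation; the function names, the choice of the 99th percentile, and the `headroom` multiplier are assumptions made for illustration, not values from the source.

```python
import statistics


def calibrate_threshold(baseline_ms: list[float], percentile: int = 99,
                        headroom: float = 3.0) -> float:
    """Derive a throttling threshold from baseline commit-delay samples.

    The threshold is a high percentile of the baseline multiplied by a
    headroom factor. Both knobs are illustrative assumptions.
    """
    if len(baseline_ms) < 2:
        raise ValueError("need at least two baseline samples")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = statistics.quantiles(baseline_ms, n=100, method="inclusive")
    return cuts[percentile - 1] * headroom


def should_throttle(current_ms: float, threshold_ms: float) -> bool:
    """Throttle only when the current commit delay exceeds the threshold."""
    return current_ms > threshold_ms
```

Because the baseline reflects the storage substrate rather than the query mix, the same threshold should stay meaningful as workloads shift, and would need recalibrating mainly when the hardware changes.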
Group commit / batched fsync¶
Modern databases batch multiple concurrent commits into a single
fsync to amortise disk-flush cost (group commit in both InnoDB and
Postgres). This partly decouples commit delay from transaction
arrival rate. Where a deliberate batching delay is configured (for
example binlog_group_commit_sync_delay in MySQL or commit_delay in
Postgres), a commit can wait up to that interval to be bundled,
which adds a jitter floor even at low load. Throttlers that read
commit delay must account for this floor when setting thresholds.
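The jitter floor can be seen in a toy model of batched flushing. This is a deliberately simplified sketch, not how any particular engine implements group commit: it assumes the first waiting commit opens a batch, the batch stays open for a fixed `window_ms`, and a single fsync of fixed cost `fsync_ms` then completes every commit in the batch. All names and parameters are hypothetical.

```python
def group_commit_delays(arrivals_ms: list[float], window_ms: float,
                        fsync_ms: float) -> list[float]:
    """Return each commit's delay under a fixed-window group-commit model.

    Delays are returned in arrival order. A commit's delay is the time
    from its arrival until the fsync covering its batch finishes.
    """
    arrivals = sorted(arrivals_ms)
    delays: list[float] = []
    disk_free_at = 0.0  # time at which the previous fsync finished
    i = 0
    while i < len(arrivals):
        # A batch opens when a commit is waiting and the disk is free.
        batch_open = max(arrivals[i], disk_free_at)
        flush_start = batch_open + window_ms
        flush_end = flush_start + fsync_ms
        # Every commit that arrives before the flush starts joins the batch.
        while i < len(arrivals) and arrivals[i] <= flush_start:
            delays.append(flush_end - arrivals[i])
            i += 1
        disk_free_at = flush_end
    return delays
```

In this model a lone commit always waits `window_ms + fsync_ms`, however idle the system is, so a throttler threshold set below that sum would trigger even with no load.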
Why it's still a symptom, not a cause¶
Even commit-delay isolation has limits: a spike can be caused by
disk degradation, by a co-tenant workload saturating shared
storage, or by replication-ack slowdown in synchronous setups.
The metric remains a symptom of one specific queue rather than a
cause, but a narrower one than threads_running.
Seen in¶
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1: canonical wiki introduction. Noach frames it as the improvement over threads_running for throttling signals because it targets a specific queue with a hardware-stable threshold.
Related¶
- concepts/symptom-vs-cause-metric: why commit delay is still a symptom, just a narrower one.
- concepts/queue-length-vs-wait-time: commit delay is wait time; threads_running is queue length.
- concepts/threads-running-mysql: the broader metric; commit delay isolates one slice of it.
- concepts/database-throttler: the use case.
- concepts/queueing-theory: the framing; commit delay is the residence time in the commit queue.