Queue length vs wait time
Definition
For any queue, there are two natural observables:
- Queue length = how many items are currently waiting.
- Wait time = how long an item spent in the queue before being served (or how long the head-of-queue has been waiting).
These are related by Little's Law (L = λW, where L is queue length, λ is arrival rate, and W is wait time), but they are not interchangeable for operator intuition or for throttler signal design.
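Plugging illustrative numbers into Little's Law makes the relationship concrete (the rates below are invented for illustration):

```python
# Little's Law: L = lambda * W. Given any two quantities, the third follows.
arrival_rate = 100.0    # lambda: items arriving per second (made-up figure)
mean_wait = 0.5         # W: average seconds an item spends waiting

# L: average number of items waiting at any instant
queue_length = arrival_rate * mean_wait
assert queue_length == 50.0

# Inverted: an operator who reads the gauge L = 50 and knows
# lambda = 100/s can estimate W = L / lambda = 0.5 s. Without
# knowing lambda, the same L = 50 is uninterpretable.
estimated_wait = queue_length / arrival_rate
assert estimated_wait == 0.5
```

The inversion in the last step is exactly the mental arithmetic an operator performs when only the gauge is available.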
The airport-queue analogy
"A long queue at the airport isn't in itself a bad thing — some queues move quite fast, and yet it's often a predictor to wait times. Where wait time is impossible or difficult to measure, queue length can be an alternative."
A 50-person TSA line that drains in 3 minutes is fine; a 10-person line stuck for 20 minutes is not. The customer's latency experience is dominated by wait time, not by queue length.
Wait time is the better signal
Wait time is what the user / client / downstream consumer actually cares about. It is the service-level metric. Throttlers, SLO monitors, and capacity planners ideally base decisions on wait time:
- Replication lag = wait time in the changelog queue.
- Commit delay = wait time in the commit queue.
- Run-queue latency = wait time for a CPU.
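All three of those metrics share one mechanical requirement: each item must be timestamped at enqueue so its wait can be read at dequeue. A minimal sketch of that instrumentation (the class name and interface are hypothetical, not from any particular library):

```python
import time
from collections import deque

class InstrumentedQueue:
    """Queue that timestamps every item at enqueue so dequeue can
    report how long that item waited. This per-item state is the
    cost that a bare length gauge avoids."""

    def __init__(self):
        self._items = deque()

    def put(self, item):
        # Store the item together with its enqueue time.
        self._items.append((item, time.monotonic()))

    def get(self):
        # Return the item and its measured wait time in seconds.
        item, enqueued_at = self._items.popleft()
        return item, time.monotonic() - enqueued_at

    def length(self):
        # The cheap fallback signal: one gauge read, no per-item state.
        return len(self._items)
```

Note that `length()` is free regardless of queue size, while the wait-time reading exists only because `put()` paid for a timestamp on every item.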
Queue length is the cheaper fallback
Wait time requires instrumenting every item's enqueue and dequeue moment — it costs tracking state per item. Queue length requires only a single gauge reading. When wait-time instrumentation is absent or expensive, queue length is the operator's fallback:
- threads_running (MySQL) = length of the running-query queue.
- Load average (Linux) = length of the runnable+D-state task queue (see concepts/load-average).
- Pending connections = length of the new-connection queue.
The cost of substituting length for wait time is that the operator has to carry context in their head about how fast the queue typically drains to interpret the number. A length of 50 is either fine or catastrophic depending on service rate.
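A toy calculation makes that ambiguity concrete (the drain rates here are invented for illustration):

```python
def expected_wait(queue_length, drain_rate):
    """Naive estimate of the wait for the item at the tail of the
    queue: items ahead of it divided by items served per second."""
    return queue_length / drain_rate

# The same gauge reading, two different drain-rate contexts:
fast = expected_wait(50, drain_rate=25.0)   # drains 25 items/s
slow = expected_wait(50, drain_rate=0.5)    # drains 0.5 items/s

assert fast == 2.0     # 2 s of wait: probably fine
assert slow == 100.0   # 100 s of wait: probably catastrophic
```

The gauge says "50" in both cases; only the drain-rate context the operator carries in their head distinguishes a 2-second backlog from a 100-second one.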
Design implication for throttlers
A throttler that uses a queue-length metric inherits the queue-drain-rate context as an implicit part of its threshold:
- Static-threshold-on-length (e.g. reject if length > N) only works if the drain rate is stable. It breaks when drain rate varies with time-of-day, query mix, or co-tenant workload.
- Static-threshold-on-wait-time (e.g. reject if p99_wait > T) is invariant under drain-rate changes: the client's latency experience is the same regardless of how long the queue is.
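The two threshold styles can be sketched side by side. The function names, thresholds, and drain rates are hypothetical, chosen only to show where the rules diverge:

```python
def length_throttle(queue_length, max_length=100):
    # The threshold N silently encodes an assumed drain rate;
    # the "right" max_length shifts whenever that rate shifts.
    return queue_length > max_length

def wait_throttle(p99_wait_seconds, max_wait=1.0):
    # Bounds the client-visible latency directly, independent of
    # how many items happen to produce that wait.
    return p99_wait_seconds > max_wait

# Same backlog of 150 items under two drain rates:
backlog = 150
assert length_throttle(backlog)            # rejects in both regimes

fast_wait = backlog / 150.0                # 150 items/s -> 1 s wait
slow_wait = backlog / 3.0                  # 3 items/s -> 50 s wait
assert not wait_throttle(fast_wait)        # 1 s wait: accept
assert wait_throttle(slow_wait)            # 50 s wait: reject
```

The length rule gives the same verdict for a 1-second and a 50-second latency experience; the wait rule distinguishes them without any retuning.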
This is why Noach's metric hierarchy puts commit delay (wait time on the commit queue) above threads_running (length of the running queue): the former has a stable threshold, the latter does not.
Seen in
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki articulation of the trade-off. Introduced in the course of explaining why threads_running + load average are less reliable throttling signals than replication lag + commit delay: the former two are queue lengths with unstable drain-rate context, the latter two are wait times.
Related
- concepts/queueing-theory — parent framing; Little's Law connects the two.
- concepts/database-throttler — the use case motivating the distinction.
- concepts/symptom-vs-cause-metric — both length and wait time are symptoms of the queue backup; neither alone identifies the cause.
- concepts/load-average — canonical "queue length as fallback when wait time is hard to measure" metric on Linux.
- concepts/run-queue-latency — the wait-time counterpart that eBPF now makes tractable.