CONCEPT Cited by 2 sources

Queue length vs wait time¶

Definition¶

For any queue, there are two natural observables:

Queue length = how many items are currently waiting.
Wait time = how long an item spent in the queue before being served (or how long the item at the head of the queue has been waiting).

These are related by Little's Law (L = λW, where L is the average queue length, λ is the average arrival rate, and W is the average wait time, all taken in steady state), but they are not interchangeable for operator intuition or for throttler signal design.

The airport-queue analogy¶

"A long queue at the airport isn't in itself a bad thing — some queues move quite fast, and yet it's often a predictor to wait times. Where wait time is impossible or difficult to measure, queue length can be an alternative."

— Shlomi Noach

A 50-person TSA line that drains in 3 minutes is fine; a 10-person line stuck for 20 minutes is not. The customer's latency experience is dominated by wait time, not by queue length.
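
The arithmetic behind those two lines can be sketched by rearranging Little's Law to W = L / λ. This is only an illustration: the function name and the drain rates are hypothetical, chosen to reproduce the 3-minute and 20-minute cases, and the estimate assumes a steady state in which the drain rate matches the arrival rate.

```python
def expected_wait_minutes(queue_length: int, drain_rate_per_minute: float) -> float:
    """Little's Law rearranged as W = L / rate (steady-state averages only)."""
    if drain_rate_per_minute <= 0:
        raise ValueError("drain rate must be positive")
    return queue_length / drain_rate_per_minute


# Hypothetical drain rates, picked to match the two lines described above.
fast_line = expected_wait_minutes(50, drain_rate_per_minute=50 / 3)  # about 3 minutes
slow_line = expected_wait_minutes(10, drain_rate_per_minute=0.5)     # 20 minutes

# The shorter line is the worse one once drain rate is taken into account.
shorter_line_is_slower = slow_line > fast_line
```

The same length reading maps to very different waits depending on the rate, which is why length alone is hard to interpret.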

Wait time is the better signal¶

Wait time is what the user / client / downstream consumer actually cares about. It is the service-level metric. Throttlers, SLO monitors, and capacity planners ideally base decisions on wait time:

Replication lag = wait time in the changelog queue.
Commit delay = wait time in the commit queue.
Run-queue latency = wait time for a CPU.

Queue length is the cheaper fallback¶

Wait time requires instrumenting every item's enqueue and dequeue moment — it costs tracking state per item. Queue length requires only a single gauge reading. When wait-time instrumentation is absent or expensive, queue length is the operator's fallback:

threads_running (MySQL) = length of the running-query queue.
Load average (Linux) = length of the runnable+D-state task queue (see concepts/load-average).
Pending connections = length of the new-connection queue.

The cost of substituting length for wait time is that the operator has to carry context in their head about how fast the queue typically drains to interpret the number. A length of 50 is either fine or catastrophic depending on service rate.
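
The instrumentation cost difference can be made concrete with a minimal sketch. `TimedQueue` is a hypothetical helper, not part of MySQL, Linux, or any throttler named here: it stores one timestamp per item to measure wait time, while the length gauge needs none of that state. The injectable `clock` parameter is an assumption added so the example is deterministic.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Tuple


class TimedQueue:
    """FIFO queue that records each item's enqueue time (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Deque[Tuple[float, Any]] = deque()

    def put(self, item: Any) -> None:
        # Per-item state: this timestamp is the extra cost of measuring wait time.
        self._items.append((self._clock(), item))

    def get(self) -> Tuple[Any, float]:
        """Pop the oldest item and return it with the time it spent waiting."""
        enqueued_at, item = self._items.popleft()
        return item, self._clock() - enqueued_at

    def length(self) -> int:
        # The cheap signal: a single gauge reading that ignores the timestamps.
        return len(self._items)

    def head_wait(self) -> float:
        """How long the head-of-queue item has been waiting (0.0 if empty)."""
        if not self._items:
            return 0.0
        return self._clock() - self._items[0][0]


# Usage with a fake clock so the numbers are reproducible.
now = [0.0]
q = TimedQueue(clock=lambda: now[0])
q.put("a")
now[0] = 2.0
q.put("b")
now[0] = 5.0
length_now = q.length()         # two items waiting
head_wait_now = q.head_wait()   # "a" has waited since t=0
item, waited = q.get()          # dequeue "a" and report its wait
```

When the timestamps cannot be stored or read cheaply, only `length()` remains available, which is the fallback situation described above.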

Design implication for throttlers¶

A throttler that uses a queue-length metric inherits the queue-drain-rate context as an implicit part of its threshold:

Static-threshold-on-length (e.g. reject if length > N) only works if the drain rate is stable. It breaks when drain rate varies with time-of-day, query mix, or co-tenant workload.
Static-threshold-on-wait-time (e.g. reject if p99_wait > T) is invariant under drain-rate changes — the client's latency experience is the same regardless of how long the queue is.

This is why Noach's metric hierarchy puts commit delay (wait time on the commit queue) above threads_running (length of the running queue): the former has a stable threshold, the latter does not.
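
A small sketch of the two threshold styles shows how the length rule silently depends on drain rate. All names and numbers are hypothetical, and wait time is approximated here as length divided by drain rate (a steady-state average) rather than a measured p99.

```python
def reject_on_length(queue_length: int, max_length: int) -> bool:
    """Static threshold on queue length."""
    return queue_length > max_length


def reject_on_wait(wait_seconds: float, max_wait_seconds: float) -> bool:
    """Static threshold on wait time."""
    return wait_seconds > max_wait_seconds


MAX_LENGTH = 40         # hypothetical, tuned while the queue drained quickly
MAX_WAIT_SECONDS = 1.0  # hypothetical latency budget
QUEUE_LENGTH = 30       # the same gauge reading in both scenarios

results = {}
for drain_rate_per_second in (100.0, 10.0):
    # Steady-state estimate of wait time from length and drain rate.
    wait_seconds = QUEUE_LENGTH / drain_rate_per_second
    results[drain_rate_per_second] = (
        reject_on_length(QUEUE_LENGTH, MAX_LENGTH),
        reject_on_wait(wait_seconds, MAX_WAIT_SECONDS),
    )
```

At the slower drain rate the length rule still admits traffic even though the estimated wait has grown to three seconds, while the wait-time rule rejects without any retuning.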

Seen in¶

sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki articulation of the trade-off. Introduced in the course of explaining why threads_running + load average are less reliable throttling signals than replication lag + commit delay: the former two are queue lengths with unstable drain-rate context, the latter two are wait times.
sources/2026-05-05-redpanda-littles-law-in-practice-with-cloud-topics — sibling application of the same axis on the latency-hiding-queue side. Cloud Topics' upload queue is sized for T × W, so steady-state queue length is expected, not a saturation alarm: wait-time-vs-length is the only signal that distinguishes "running as designed" from "saturation, applying backpressure." The Redpanda post applies Little's Law in its engineering form (Concurrency = Throughput × Latency); this concept's airport-queue framing is the operator-side interpretation of the same algebra.

Related¶

concepts/littles-law — the foundational law connecting length and wait time.
concepts/queueing-theory — parent framing.
concepts/database-throttler — the use case motivating the distinction.
concepts/symptom-vs-cause-metric — both length and wait time are symptoms of the queue backup; neither alone identifies the cause.
concepts/load-average — canonical "queue length as fallback when wait time is hard to measure" metric on Linux.
concepts/run-queue-latency — the wait-time counterpart that eBPF now makes tractable.
concepts/queue-depth-as-latency-hiding-mechanism — sibling axis: the queue's depth is deliberately large in steady state, which is the operational regime where the length-vs-wait-time distinction matters most.