CONCEPT Cited by 1 source

Queue length vs wait time

Definition

For any queue, there are two natural observables:

  • Queue length = how many items are currently waiting.
  • Wait time = how long an item spent in the queue before being served (or how long the head-of-queue has been waiting).

These are related by Little's Law (L = λW, where L is the average queue length, λ is the average arrival rate, and W is the average wait time), but they are not interchangeable for operator intuition or for throttler signal design.
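As a small worked illustration of why the two observables are not interchangeable, the sketch below rearranges Little's Law to W = L / λ. The function name and the numbers are hypothetical, and the law only holds for long-run averages of a stable queue.

```python
def mean_wait_seconds(mean_queue_length: float, arrival_rate_per_second: float) -> float:
    """Little's Law rearranged: W = L / lambda (long-run averages, stable queue)."""
    if arrival_rate_per_second <= 0:
        raise ValueError("arrival rate must be positive")
    return mean_queue_length / arrival_rate_per_second


# Same average queue length, different throughput, very different waits.
# Both arrival rates are made-up values for illustration.
fast_wait = mean_wait_seconds(50, arrival_rate_per_second=25.0)
slow_wait = mean_wait_seconds(50, arrival_rate_per_second=0.5)

print(fast_wait)  # 2.0
print(slow_wait)  # 100.0
```

A length of 50 maps to a 2-second wait in one system and a 100-second wait in the other, which is why a length reading alone does not tell an operator how items are experiencing the queue.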

The airport-queue analogy

"A long queue at the airport isn't in itself a bad thing — some queues move quite fast, and yet it's often a predictor to wait times. Where wait time is impossible or difficult to measure, queue length can be an alternative."

Shlomi Noach

A 50-person TSA line that drains in 3 minutes is fine; a 10-person line stuck for 20 minutes is not. The customer's latency experience is dominated by wait time, not by queue length.

Wait time is the better signal

Wait time is what the user / client / downstream consumer actually cares about. It is the service-level metric. Throttlers, SLO monitors, and capacity planners ideally base decisions on wait time:

  • Replication lag = wait time in the changelog queue.
  • Commit delay = wait time in the commit queue.
  • Run-queue latency = wait time for a CPU.

Queue length is the cheaper fallback

Wait time requires instrumenting every item's enqueue and dequeue moment, which means tracking state per item. Queue length requires only a single gauge reading. When wait-time instrumentation is absent or expensive, queue length is the operator's fallback:

  • threads_running (MySQL) = length of the running-query queue.
  • Load average (Linux) = a moving average of the length of the runnable + uninterruptible (D-state) task queue (see concepts/load-average).
  • Pending connections = length of the new-connection queue.

The cost of substituting length for wait time is that the operator has to carry context in their head about how fast the queue typically drains to interpret the number. A length of 50 is either fine or catastrophic depending on service rate.
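The difference in instrumentation cost can be sketched with a toy in-process FIFO queue. This is a minimal illustration, not how MySQL or the Linux kernel expose these metrics: `TimedQueue` and its methods are hypothetical names, and the injectable clock exists only to make the example deterministic.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Tuple


class TimedQueue:
    """FIFO queue exposing both a length gauge and wait-time observables."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Deque[Tuple[float, Any]] = deque()

    def put(self, item: Any) -> None:
        # Per-item state: the enqueue timestamp travels with the item.
        self._items.append((self._clock(), item))

    def get(self) -> Tuple[Any, float]:
        """Dequeue the oldest item and return it with the time it waited."""
        enqueued_at, item = self._items.popleft()
        return item, self._clock() - enqueued_at

    def length(self) -> int:
        # Cheap signal: a single gauge read, no timestamps required.
        return len(self._items)

    def head_wait(self) -> float:
        """How long the head-of-queue item has been waiting (0.0 if empty)."""
        if not self._items:
            return 0.0
        return self._clock() - self._items[0][0]


# Deterministic demo with a fake clock (seconds).
now = [0.0]
q = TimedQueue(clock=lambda: now[0])
q.put("a")
now[0] = 2.0
q.put("b")
now[0] = 5.0

print(q.length())     # 2
print(q.head_wait())  # 5.0
item, waited = q.get()
print(item, waited)   # a 5.0
```

`length()` would work even if `put` stored no timestamps at all, whereas `head_wait()` and the per-item wait returned by `get()` depend on the extra state recorded for every item.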

Design implication for throttlers

A throttler that uses a queue-length metric inherits the queue-drain-rate context as an implicit part of its threshold:

  • Static-threshold-on-length (e.g. reject if length > N) only works if the drain rate is stable. It breaks when drain rate varies with time-of-day, query mix, or co-tenant workload.
  • Static-threshold-on-wait-time (e.g. reject if p99_wait > T) stays valid when the drain rate changes: a given threshold T corresponds to the same client latency experience regardless of how long the queue is.

This is why Noach's metric hierarchy puts commit delay (wait time on the commit queue) above threads_running (length of the running queue): the former has a stable threshold, the latter does not.
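The contrast between the two static thresholds can be sketched as follows. All names and threshold values are hypothetical. The expected wait is derived here as length / drain rate only to keep the example deterministic; a real throttler would measure wait time directly (for example replication lag or commit delay) rather than compute it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSnapshot:
    length: int        # items currently waiting
    drain_rate: float  # items served per second; varies with workload

    @property
    def expected_wait(self) -> float:
        # Approximate time for a newly arrived item to reach the head (FIFO).
        return self.length / self.drain_rate


MAX_LENGTH = 40         # hypothetical static threshold on queue length
MAX_WAIT_SECONDS = 1.0  # hypothetical static threshold on wait time


def reject_on_length(s: QueueSnapshot) -> bool:
    return s.length > MAX_LENGTH


def reject_on_wait(s: QueueSnapshot) -> bool:
    return s.expected_wait > MAX_WAIT_SECONDS


# Long but fast-moving queue vs. short but slow-moving queue.
healthy = QueueSnapshot(length=50, drain_rate=100.0)  # expected wait 0.5 s
degraded = QueueSnapshot(length=30, drain_rate=10.0)  # expected wait 3.0 s

print(reject_on_length(healthy), reject_on_wait(healthy))    # True False
print(reject_on_length(degraded), reject_on_wait(degraded))  # False True
```

Under these assumed numbers the length rule rejects work on the healthy queue and admits work on the degraded one, while the wait rule does the opposite. The length threshold of 40 was only meaningful for one particular drain rate.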

Seen in

  • sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki articulation of the trade-off. Introduced in the course of explaining why threads_running + load average are less reliable throttling signals than replication lag + commit delay: the former two are queue lengths with unstable drain-rate context, the latter two are wait times.