PATTERN

LIFO queuing for tail latency¶

Problem¶

A latency-sensitive serving path queues requests under transient backend slowdowns. With a FIFO queue, every new request arriving during the slowdown waits behind the head-of-queue request that triggered the stall. A brief 100 ms backend latency spike cascades into 100 ms × queue- depth total delay for everything behind it. Time-bounded clients (short timeouts) abandon their requests, but the work has already been spent serving stale entries by the time the queue drains. See concepts/tail-latency-spike-during-queueing.

Pattern¶

Make the queue LIFO (last-in-first-out) instead of FIFO:

The newest-arriving request goes to the front of the queue.
Older queued requests get served only after the fresh ones are drained.
Under a slowdown, the old tail entries time out / get evicted; the new tail gets served promptly.

The argument: when clients are time-bounded anyway, it's better to deliver new requests quickly (the caller is still listening) than to deliver old requests slowly (the caller has already timed out).

Zalando applied this at two altitudes in PRAPI:

Load balancer queue — "switching to LIFO reduced long-tail latency spikes when queuing occurred."
DynamoDB client queue — paired with a two-client fallback (10 ms primary timeout, 100 ms fallback timeout). LIFO on the primary client's queue prevents a DynamoDB latency spike from cascading through every inflight request.

(Source: .)

Implementation notes¶

Pair with timeouts. LIFO without per-request timeouts can starve old entries indefinitely. The point is that stale entries should be abandoned, so an eviction/ expiry discipline for old queued entries is required.
Not all queues tolerate this. Protocol queues (TCP, Kafka partitions) are FIFO by contract; you can't swap. LIFO applies to work queues inside a single process — load-balancer local queues, DB client pools, request schedulers.
Work-stealing may dominate. Under steady-state load with no queue, FIFO vs. LIFO is irrelevant. This pattern's benefit is strictly at the queue-formed transient.

When it's wrong¶

Ordered or transactional work. Mutations that must happen in arrival order can't be LIFO'd.
Persistent work queues. LIFO on a durable queue means some work is effectively never processed; use FIFO + backpressure.
Unbounded queues with no eviction. Old entries sit forever. Either bound the queue or add a reaper; LIFO without either is worse than FIFO.

Adjacent tactics that compose¶

Concurrent primary + fallback clients with different timeouts. PRAPI's DynamoDB setup: 10 ms primary timeout, 100 ms fallback timeout for retries. The primary's queue is LIFO-disciplined; the fallback absorbs stragglers.
Hedged requests. Issue a second request after the primary's P99 elapses. Pairs naturally with LIFO — the hedged copy jumps any FIFO queue blocking the original.
Backpressure at admission. Reject new work upstream when queues are full, rather than queuing and LIFO- sorting them. Works better when callers can retry with exponential backoff.

Seen in¶

— canonical production instance across both the LB and the DynamoDB client, with the two-timeout primary/ fallback decomposition making the whole thing work.

concepts/tail-latency-spike-during-queueing — the pathology this pattern addresses
concepts/head-of-line-blocking — the general class
concepts/timeout · concepts/backpressure
systems/zalando-prapi