PATTERN Cited by 1 source

Hedged reads for tail latency

Pattern

Issue redundant read requests to multiple storage nodes; use the first response and discard the rest, mitigating single-node latency outliers (laggards).

Problem

In distributed storage, a single slow storage node contributes disproportionately to tail latency. For AI workloads where the SLO is a bounded pMax (maximum latency), any outlier read stalls all synchronized GPUs, so even one laggard node is unacceptable.
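To see why one laggard matters so much, consider a synchronized step that waits on reads from many nodes. The sketch below assumes each node is independently slow with a fixed probability (an idealization for illustration, not a figure from the source) and computes the chance that at least one read in the fan-out is slow. The name `prob_any_slow` is hypothetical.

```python
def prob_any_slow(p_slow: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent reads is slow.

    Assumes every read is slow with the same probability `p_slow` and that
    reads are independent of each other (an idealized model).
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be in [0, 1]")
    if fanout < 0:
        raise ValueError("fanout must be non-negative")
    # All reads are fast with probability (1 - p_slow) ** fanout;
    # the complement is the chance that some read is slow.
    return 1.0 - (1.0 - p_slow) ** fanout


if __name__ == "__main__":
    for fanout in (1, 10, 100):
        print(fanout, round(prob_any_slow(0.01, fanout), 3))
```

Under this model, a 1% chance of a slow read per node becomes roughly a 63% chance of a stalled step at a fan-out of 100, which is why the pattern targets the outliers rather than the average.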

Solution

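On the client side, if a read has not completed after a configurable hedge delay (or immediately, for critical reads), send the same read request to an alternative replica/shard while leaving the original request outstanding. Use whichever response arrives first and discard the others.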
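The client-side logic can be sketched with `asyncio`. This is a minimal illustration, not Meta's SDK code: `hedged_read`, the `replicas` list of zero-argument async read functions, and `hedge_delay` are all hypothetical names. It starts the read on the first replica, adds one more replica each time the hedge delay elapses (or a request fails), returns the first successful response, and cancels whatever is still in flight.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical shape of a read: a zero-argument coroutine function per replica.
ReadFn = Callable[[], Awaitable[bytes]]


async def hedged_read(replicas: Sequence[ReadFn], hedge_delay: float) -> bytes:
    """Return the first successful response among progressively hedged reads."""
    if not replicas:
        raise ValueError("need at least one replica")
    remaining = list(replicas[1:])
    pending = {asyncio.create_task(replicas[0]())}
    last_error: Optional[BaseException] = None
    try:
        while pending:
            # Wait with a deadline only while another replica is left to try.
            timeout = hedge_delay if remaining else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # first response wins
                last_error = task.exception()
            # No winner yet: the hedge delay elapsed or a request failed,
            # so send the same read to the next replica.
            if remaining:
                pending.add(asyncio.create_task(remaining.pop(0)()))
        assert last_error is not None  # every replica failed
        raise last_error
    finally:
        # Discard the losers so they stop consuming client resources.
        for task in pending:
            task.cancel()


async def _demo() -> None:
    async def laggard() -> bytes:
        await asyncio.sleep(0.5)  # simulated slow storage node
        return b"from laggard"

    async def healthy() -> bytes:
        await asyncio.sleep(0.01)
        return b"from healthy"

    print(await hedged_read([laggard, healthy], hedge_delay=0.05))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Cancelling the losing tasks only stops the client from waiting on them; whether the storage node also abandons the work depends on the protocol, which this sketch does not model.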

Trade-off

  • Benefit: dramatically reduces tail latency without changing the storage layer
  • Cost: increases total read I/O in proportion to the hedging rate, which is typically small because hedges are triggered only for outlier reads
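The cost side can be made concrete by tying the hedge delay to a latency percentile: if the hedge fires only after the delay that a fraction q of reads already beat, roughly 1 - q of reads issue a second request. "The Tail at Scale" gives the example of waiting for the 95th-percentile latency, which limits the extra load to about 5%. The sketch below picks a delay from observed latencies for a given extra-I/O budget; the function names and the sample data are hypothetical.

```python
import math
from typing import Sequence


def hedge_delay_for_budget(latencies_ms: Sequence[float], extra_io_budget: float) -> float:
    """Pick a hedge delay that at most `extra_io_budget` of the samples exceed."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0.0 <= extra_io_budget < 1.0:
        raise ValueError("extra_io_budget must be in [0, 1)")
    ordered = sorted(latencies_ms)
    # Number of samples that are allowed to be slower than the delay.
    allowed = math.floor(extra_io_budget * len(ordered))
    return ordered[len(ordered) - 1 - allowed]


def hedging_rate(latencies_ms: Sequence[float], hedge_delay_ms: float) -> float:
    """Fraction of reads that would trigger a hedge at the given delay."""
    slower = sum(1 for latency in latencies_ms if latency > hedge_delay_ms)
    return slower / len(latencies_ms)


if __name__ == "__main__":
    # Hypothetical sample: mostly fast reads with one laggard.
    samples = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 300.0]
    delay = hedge_delay_for_budget(samples, extra_io_budget=0.125)
    print(delay, hedging_rate(samples, delay))
```

A larger budget lowers the delay and trims more of the tail at the price of more duplicate reads; the right setting depends on how much spare read capacity the storage layer has.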

Relation to "The Tail at Scale"

This is the same hedged-request technique described by Jeffrey Dean and Luiz André Barroso in "The Tail at Scale" (2013). Meta applies it at the BLOB-storage SDK level so that storage-node laggards do not stall GPU training.

(Source: sources/2026-07-01-meta-ai-storage-blueprint-at-scale, "Protocol Optimizations" section)
