Hedged reads for tail latency
Pattern
Issue redundant copies of a read request to multiple storage nodes that hold the same data; use the first response and discard the rest. This mitigates latency outliers caused by individual slow nodes (laggards).
Problem
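In distributed storage, a single slow storage node contributes disproportionately to tail latency: a request that fans out across many nodes completes only when its slowest sub-request does. For AI workloads the SLO is a bounded maximum latency (pMax), since any outlier stalls every synchronized GPU, so a single laggard node is unacceptable.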
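A quick calculation shows why one laggard dominates. Assuming each of N sub-requests is independently slow with probability p, the fan-out as a whole is slow with probability 1 − (1 − p)^N. The numbers below reuse the example from "The Tail at Scale" (1% of requests slow, fan-out of 100). `prob_any_slow` is a hypothetical helper, and real node slowness is rarely fully independent.

```python
def prob_any_slow(p_slow: float, fanout: int) -> float:
    """Probability that at least one of `fanout` sub-requests is slow,
    assuming each is independently slow with probability `p_slow`."""
    return 1.0 - (1.0 - p_slow) ** fanout


# Example from "The Tail at Scale": 1% of requests are slow and each
# user request fans out to 100 servers.
print(f"{prob_any_slow(0.01, 100):.0%}")  # prints "63%"
```

In synchronized training the same arithmetic applies to every step: the step waits on the slowest read across all ranks, so even a rare laggard is hit often.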
Solution
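On the client side, send the read to one replica. If no response arrives within a configurable timeout (or immediately, for critical reads that are hedged proactively), send the same read request to an alternative replica or shard. Use whichever response arrives first and discard the others.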
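A minimal client-side sketch using Python's `asyncio`, assuming a hypothetical coroutine `read(replica)` and string replica identifiers; these names are illustrative and are not Meta's BLOB-storage SDK API. `hedged_read` sends the read to the first replica and, each time `hedge_delay` seconds pass without a successful response, hedges to the next replica. It returns the first successful result and cancels the remaining tasks. Setting `hedge_delay=0` approximates proactive hedging for critical reads. As a design choice in this sketch, a failed read triggers the next replica immediately instead of waiting out the delay.

```python
import asyncio
from typing import Any, Callable, Coroutine, Optional, Sequence, Set, TypeVar

T = TypeVar("T")


async def hedged_read(
    replicas: Sequence[str],
    read: Callable[[str], Coroutine[Any, Any, T]],
    hedge_delay: float,
) -> T:
    """Send `read` to replicas[0]; every `hedge_delay` seconds without a
    successful response, hedge by sending the same read to the next replica.
    Return the first successful result and cancel the outstanding ones."""
    if not replicas:
        raise ValueError("need at least one replica")
    pending: Set[asyncio.Task] = set()
    last_error: Optional[BaseException] = None
    next_idx = 0
    try:
        while True:
            if next_idx < len(replicas):
                pending.add(asyncio.create_task(read(replicas[next_idx])))
                next_idx += 1
            if not pending:
                # Every replica was tried and every attempt failed.
                assert last_error is not None
                raise last_error
            # Wait up to hedge_delay before hedging again; once all replicas
            # are in flight there is nothing left to send, so wait indefinitely.
            timeout = hedge_delay if next_idx < len(replicas) else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                # A failed read: remember the error and keep hedging.
                last_error = task.exception()
    finally:
        # Stop waiting on the losers; their results are discarded.
        for task in pending:
            task.cancel()


# Hypothetical stand-in for a storage-node read: "slow" is a laggard,
# "down" always fails.
DEMO_DELAYS = {"slow": 1.0, "fast": 0.01}


async def fake_read(replica: str) -> str:
    if replica == "down":
        raise ConnectionError(f"{replica} unavailable")
    await asyncio.sleep(DEMO_DELAYS[replica])
    return f"data from {replica}"


if __name__ == "__main__":
    # The primary is a laggard, so the hedge to "fast" wins after ~60 ms.
    print(asyncio.run(hedged_read(["slow", "fast"], fake_read, hedge_delay=0.05)))
```

Cancelling a losing task only stops the client from waiting on it; unless the protocol propagates cancellation, the storage node may still serve the redundant read, which is where the extra read I/O in the trade-off comes from.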
Trade-off
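- Benefit: sharply reduces tail latency without requiring changes to the storage layer
- Cost: increases total read I/O by the hedging rate, the fraction of reads that trigger a redundant request. This is typically small because timeout-based hedges fire only for outlier reads; proactively hedged critical reads each cost a full extra read.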
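One way to keep the hedging rate small, suggested in "The Tail at Scale", is to hedge only reads still outstanding after the 95th-percentile expected latency, which limits the extra load to roughly 5%. The sketch below derives that delay from observed latencies with `statistics.quantiles` and measures the resulting hedge fraction. The helpers `hedge_delay_ms` and `hedge_rate` are hypothetical, and the latency samples are synthetic.

```python
import statistics
from typing import Sequence


def hedge_delay_ms(latencies_ms: Sequence[float], percentile: int = 95) -> float:
    """Hedging delay = the given percentile of recently observed read latencies."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    return statistics.quantiles(latencies_ms, n=100)[percentile - 1]


def hedge_rate(latencies_ms: Sequence[float], delay_ms: float) -> float:
    """Fraction of reads still outstanding at delay_ms, i.e. reads that
    would trigger a hedge (extra I/O) under a fixed delay."""
    return sum(1 for x in latencies_ms if x > delay_ms) / len(latencies_ms)


# Illustrative samples only: 100 reads taking 1, 2, ..., 100 ms.
samples = [float(ms) for ms in range(1, 101)]
delay = hedge_delay_ms(samples)
print(delay, hedge_rate(samples, delay))  # prints "95.95 0.05"
```

Here the rate is measured on the same samples used to pick the delay; on live traffic the realized rate depends on how the latency distribution shifts between recomputations.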
Relation to "The Tail at Scale"
This is the hedged-request technique described by Jeff Dean and Luiz André Barroso in "The Tail at Scale" (Communications of the ACM, 2013). Meta applies it at the BLOB-storage SDK level to keep laggard storage nodes from stalling GPU training.
(Source: sources/2026-07-01-meta-ai-storage-blueprint-at-scale, "Protocol Optimizations" section)
Seen in
- sources/2026-07-01-meta-ai-storage-blueprint-at-scale — Meta BLOB storage tail-latency mitigation