Multi-path datacenter transport
Definition
A class of non-TCP transport protocols designed specifically to exploit the many parallel paths available through modern datacenter fabrics, rather than pinning each flow to a single path as commodity TCP does. Documented in the High Scalability Dec-2022 roundup via three parallel disclosures: Homa, AWS SRD, and Google's Snap/Aquila/CliqueMap work.
Why TCP is wrong inside the datacenter
TCP's design trade-offs are tuned for the wide-area Internet:
- Unknown topology between endpoints → single-path flows, no multipath.
- Long RTTs → retry timeouts measured in seconds, not microseconds.
- Diverse middleboxes → conservative congestion control, defensive ordering.
- Byte-stream abstraction → head-of-line blocking when a single packet is lost.
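The last point, head-of-line blocking, can be made concrete with a toy model (illustrative Python only; the timings and functions here are invented for the sketch, not any real stack's behavior):

```python
# Toy model of head-of-line blocking (illustrative only; not any real stack).
# Five one-packet messages arrive back to back; packet 2 is lost and its
# retransmission arrives much later.

def bytestream_delivery(arrivals, lost, retransmit_at):
    """TCP-like byte stream: bytes are released to the app strictly in
    order, so everything behind a lost packet waits for its retransmit."""
    out = {}
    for seq in sorted(arrivals):
        if seq < lost:
            out[seq] = arrivals[seq]
        elif seq == lost:
            out[seq] = retransmit_at
        else:
            out[seq] = max(arrivals[seq], retransmit_at)  # stuck behind the hole
    return out

def message_delivery(arrivals, lost, retransmit_at):
    """Message-oriented transport: messages are independent, so only the
    lost message itself waits for the retransmission."""
    return {seq: (retransmit_at if seq == lost else t)
            for seq, t in arrivals.items()}

arrivals = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}  # packet -> arrival time (µs)
print(bytestream_delivery(arrivals, lost=2, retransmit_at=50))
# → messages 3..5 are all delayed to t=50 by the single loss
print(message_delivery(arrivals, lost=2, retransmit_at=50))
# → only message 2 is delayed
```

One lost packet at t=2 stalls every later message in the byte-stream model until t=50, which is exactly the cost a message-oriented transport avoids.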
Inside a datacenter none of those assumptions hold:
- Topology is known (the fabric is under one operator's control).
- RTTs are microseconds.
- There are no hostile middleboxes.
- Workloads are message-oriented (RPCs, blocks, frames), not byte streams.
Disclosed production systems
- AWS SRD — Scalable Reliable Datagram, shipping on Nitro. Reduces EBS tail latency via multipath + microsecond retries.
- Homa — a research-stage, message-oriented, receiver-driven multipath transport from John Ousterhout's group at Stanford.
- Google Snap — microkernel host-networking stack.
- Google Aquila — unified low-latency datacenter fabric.
- Google CliqueMap — RMA-based distributed cache bypassing kernel I/O.
The unifying concept
Average latency doesn't matter; tail latency does. In any system where an application waits on a single I/O before proceeding (distributed locks, sequential block I/O, synchronous RPC chains), the P99.9 latency determines throughput. Multipath transports directly attack the tail by giving each packet multiple independent delivery paths.
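A back-of-envelope calculation makes both halves of that claim concrete, under the simplifying assumption that each I/O is independently slow (real fabrics are more correlated than this, so treat the numbers as illustrative):

```python
# Assumption: each I/O independently hits the latency tail with probability
# p_slow. Real datacenter latencies are correlated; this is a sketch.

def p_chain_hits_tail(p_slow, n_sequential):
    """Probability that a chain of n sequential I/Os contains at least one
    tail-latency event."""
    return 1 - (1 - p_slow) ** n_sequential

def p_slow_with_paths(p_slow, k_paths):
    """If a packet can be sent over k independent paths, it is only slow
    when every path is slow."""
    return p_slow ** k_paths

p = 0.001  # one I/O in a thousand hits the P99.9 tail...
# ...but a 100-step sequential RPC chain hits it roughly 10% of the time:
print(p_chain_hits_tail(p, 100))   # ≈ 0.095
# Two independent paths shrink the per-packet tail probability dramatically:
print(p_slow_with_paths(p, 2))     # ≈ 1e-06, one in a million
```

This is why the tail, not the average, bounds throughput for sequential workloads, and why even modest path diversity pays off disproportionately at the tail.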
What this means for wiki readers
- When reading about a new "high-performance" datacenter storage or RPC system, ask: does it go through the kernel's TCP stack, or something else?
- AWS SRD on Nitro, Google's Snap/Aquila/CliqueMap, and research Homa are the same trend approached from three angles: move datacenter transport off TCP, and off the host CPU.
- The old "networks are fast, CPUs are faster, so don't bother" heuristic has flipped: at microsecond RTTs the OS TCP stack is the bottleneck, not the wire.