
Multi-path datacenter transport

Definition

A class of non-TCP transport protocols designed to exploit the many parallel paths through modern datacenter fabrics, rather than pinning each flow to a single path, as TCP does on commodity networks. Documented in the High Scalability Dec-2022 roundup via three parallel disclosures: Homa, AWS SRD, and Google's Snap/Aquila/CliqueMap work.

Why TCP is wrong inside the datacenter

TCP's design trade-offs are tuned for the wide-area Internet:

  • Unknown topology between endpoints → single-path flows, no multipath.
  • Long RTTs → retry timeouts measured in seconds, not microseconds.
  • Diverse middleboxes → conservative congestion control, defensive ordering.
  • Byte-stream abstraction → head-of-line blocking when a single packet is lost.
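
The last bullet, head-of-line blocking, can be shown with a toy sketch. This is an illustration of the ordering semantics, not a real transport; the message names and single-drop scenario are invented for the example:

```python
# Five messages sent over one connection, with the second packet
# dropped and retransmitted a round later.
packets = ["msg1", "msg2", "msg3", "msg4", "msg5"]
lost = {1}  # index of the dropped packet

# Byte stream (TCP-like): delivery is strictly in-order, so every
# byte after the lost packet stalls until the retransmission lands.
delivered_stream = []
stalled = []
for i, p in enumerate(packets):
    if i in lost or stalled:
        stalled.append(p)          # head-of-line blocked
    else:
        delivered_stream.append(p)

# Message transport (Homa/SRD-style): messages are independent,
# so only the lost message itself waits for the retransmission.
delivered_msgs = [p for i, p in enumerate(packets) if i not in lost]

print("stream delivers now:  ", delivered_stream)   # only msg1
print("messages deliver now: ", delivered_msgs)     # all but msg2
```

One dropped packet costs the byte stream four messages of latency but costs the message transport only one.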

Inside a datacenter none of those assumptions hold:

  • Topology is known (the fabric is under one operator's control).
  • RTTs are microseconds.
  • There are no hostile middleboxes.
  • Workloads are message-oriented (RPCs, blocks, frames), not byte streams.

Disclosed production systems

  • AWS SRD — Scalable Reliable Datagram, shipping on Nitro. Reduces EBS tail latency via multipath + microsecond retries.
  • Homa — research-stage message-oriented, receiver-driven, multipath transport from Ousterhout's group at Stanford.
  • Google Snap — microkernel host-networking stack.
  • Google Aquila — unified low-latency datacenter fabric.
  • Google CliqueMap — RMA-based distributed cache bypassing kernel I/O.

The unifying concept

Average latency doesn't matter; tail latency does. In any system where an application waits on a single I/O before proceeding (distributed locks, sequential block I/O, synchronous RPC chains), the P99.9 latency determines throughput. Multipath transports directly attack the tail by giving each packet multiple independent delivery paths.
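
The order-statistics intuition behind that claim can be checked with a quick simulation. The latency distribution below is assumed (round numbers chosen for illustration, not measurements from any of the systems above); the point is only that with k independent paths, a transfer hits the tail only when all k paths do:

```python
import random

random.seed(42)

def path_latency_us():
    # Assumed toy model: most packets see ~10 us, but 1% hit a
    # congested queue and pay an extra ~0.5-1.5 ms.
    latency = random.uniform(8, 12)
    if random.random() < 0.01:
        latency += random.uniform(500, 1500)
    return latency

def p999(samples):
    return sorted(samples)[int(len(samples) * 0.999)]

N = 100_000

# Single path: each packet takes whatever its one path gives it.
single = [path_latency_us() for _ in range(N)]

# Multipath: the packet arrives via the fastest of 4 independent
# paths, so it sees the tail only when all 4 paths are congested
# at once (probability ~0.01**4).
multi = [min(path_latency_us() for _ in range(4)) for _ in range(N)]

print(f"single-path P99.9: {p999(single):7.1f} us")
print(f"4-path      P99.9: {p999(multi):7.1f} us")
```

With a 1% per-path chance of congestion, the single-path P99.9 lands deep in the millisecond tail while the 4-path P99.9 stays near the ~10 us median.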

What this means for wiki readers

  • When reading about a new "high-performance" datacenter storage or RPC system, ask: does it go through the kernel's TCP stack, or something else?
  • AWS SRD on Nitro, Google Snap, Google Aquila, research Homa — these are the same trend viewed from different angles: move datacenter transport off TCP, and off the host CPU.
  • The old "networks are fast, CPUs are faster, so don't bother" heuristic has flipped: at microsecond RTTs the OS TCP stack is the bottleneck, not the wire.
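
A back-of-envelope calculation makes the flip concrete. All figures here are assumed round numbers for illustration, not measurements:

```python
# Assumed round figures:
wan_rtt_us = 50_000   # ~50 ms cross-country WAN round trip
dc_rtt_us = 10        # ~10 us intra-datacenter round trip
stack_us = 5          # assumed per-message cost of a kernel TCP
                      # send/receive path (syscalls, copies, wakeups)

wan_fraction = stack_us / wan_rtt_us
dc_fraction = stack_us / dc_rtt_us

print(f"WAN: stack is {wan_fraction:.3%} of the RTT")  # rounding error
print(f"DC:  stack is {dc_fraction:.0%} of the RTT")   # the bottleneck
```

The same fixed software cost that vanishes into a 50 ms WAN round trip dominates a 10 us datacenter round trip, which is why these stacks move transport off the host CPU.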
