
Multi-path datacenter transport

Definition

A class of non-TCP transport protocols designed to exploit the many parallel paths through modern datacenter fabrics, rather than pinning each flow to a single path, as TCP does on commodity networks. Documented in the High Scalability Dec-2022 roundup via three parallel disclosures: Homa, AWS SRD, and Google's Snap/Aquila/CliqueMap work.

Why TCP is wrong inside the datacenter

TCP's design trade-offs are tuned for the wide-area Internet:

  • Unknown topology between endpoints → single-path flows, no multipath.
  • Long RTTs → retry timeouts measured in seconds, not microseconds.
  • Diverse middleboxes → conservative congestion control, defensive ordering.
  • Byte-stream abstraction → head-of-line blocking when a single packet is lost.
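
The last bullet, head-of-line blocking, can be shown with a toy sketch. This is an illustration of the ordering semantics, not a real transport; the message names and single-drop scenario are invented for the example:

```python
# Five messages sent over one connection, with the second packet
# dropped and retransmitted a round later.
packets = ["msg1", "msg2", "msg3", "msg4", "msg5"]
lost = {1}  # index of the dropped packet

# Byte stream (TCP-like): delivery is strictly in-order, so every
# byte after the lost packet stalls until the retransmission lands.
delivered_stream = []
stalled = []
for i, p in enumerate(packets):
    if i in lost or stalled:
        stalled.append(p)          # head-of-line blocked
    else:
        delivered_stream.append(p)

# Message transport (Homa/SRD-style): messages are independent,
# so only the lost message itself waits for the retransmission.
delivered_msgs = [p for i, p in enumerate(packets) if i not in lost]

print("stream delivers now:  ", delivered_stream)   # only msg1
print("messages deliver now: ", delivered_msgs)     # all but msg2
```

One dropped packet costs the byte stream four messages of latency but costs the message transport only one.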

Inside a datacenter none of those assumptions hold:

  • Topology is known (the fabric is under one operator's control).
  • RTTs are microseconds.
  • There are no hostile middleboxes.
  • Workloads are message-oriented (RPCs, blocks, frames), not byte streams.

Disclosed production systems

  • AWS SRD — Scalable Reliable Datagram, shipping on Nitro. Reduces EBS tail latency via multipath + microsecond retries.
  • Homa — research-stage message-oriented, receiver-driven, multipath transport from Ousterhout's group at Stanford.
  • Google Snap — microkernel host-networking stack.
  • Google Aquila — unified low-latency datacenter fabric.
  • Google CliqueMap — RMA-based distributed cache bypassing kernel I/O.

The unifying concept

Average latency doesn't matter; tail latency does. In any system where an application waits on a single I/O before proceeding (distributed locks, sequential block I/O, synchronous RPC chains), the P99.9 latency determines throughput. Multipath transports directly attack the tail by giving each packet multiple independent delivery paths.
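
The order-statistics intuition behind that claim can be checked with a quick simulation. The latency distribution below is assumed (round numbers chosen for illustration, not measurements from any of the systems above); the point is only that with k independent paths, a transfer hits the tail only when all k paths do:

```python
import random

random.seed(42)

def path_latency_us():
    # Assumed toy model: most packets see ~10 us, but 1% hit a
    # congested queue and pay an extra ~0.5-1.5 ms.
    latency = random.uniform(8, 12)
    if random.random() < 0.01:
        latency += random.uniform(500, 1500)
    return latency

def p999(samples):
    return sorted(samples)[int(len(samples) * 0.999)]

N = 100_000

# Single path: each packet takes whatever its one path gives it.
single = [path_latency_us() for _ in range(N)]

# Multipath: the packet arrives via the fastest of 4 independent
# paths, so it sees the tail only when all 4 paths are congested
# at once (probability ~0.01**4).
multi = [min(path_latency_us() for _ in range(4)) for _ in range(N)]

print(f"single-path P99.9: {p999(single):7.1f} us")
print(f"4-path      P99.9: {p999(multi):7.1f} us")
```

With a 1% per-path chance of congestion, the single-path P99.9 lands deep in the millisecond tail while the 4-path P99.9 stays near the ~10 us median.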

What this means for wiki readers

  • When reading about a new "high-performance" datacenter storage or RPC system, ask: does it go through the kernel's TCP stack, or something else?
  • AWS SRD on Nitro, Google Snap, Google Aquila, research Homa — these are the same trend viewed from different angles: move datacenter transport off TCP, and off the host CPU.
  • The old "networks are fast, CPUs are faster, so don't bother" heuristic has flipped: at microsecond RTTs the OS TCP stack is the bottleneck, not the wire.
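
A back-of-envelope calculation makes the flip concrete. All figures here are assumed round numbers for illustration, not measurements:

```python
# Assumed round figures:
wan_rtt_us = 50_000   # ~50 ms cross-country WAN round trip
dc_rtt_us = 10        # ~10 us intra-datacenter round trip
stack_us = 5          # assumed per-message cost of a kernel TCP
                      # send/receive path (syscalls, copies, wakeups)

wan_fraction = stack_us / wan_rtt_us
dc_fraction = stack_us / dc_rtt_us

print(f"WAN: stack is {wan_fraction:.3%} of the RTT")  # rounding error
print(f"DC:  stack is {dc_fraction:.0%} of the RTT")   # the bottleneck
```

The same fixed software cost that vanishes into a 50 ms WAN round trip dominates a 10 us datacenter round trip, which is why these stacks move transport off the host CPU.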
