PATTERN Cited by 2 sources

Multipart upload / ranged-GET parallelism

Pattern

To maximize throughput against a massively-parallel object store, push parallelism to the client:

  1. Many clients × many connections × many endpoints. A single client / connection / endpoint cannot saturate a distributed backend's aggregate throughput, no matter how fat the pipe. Any single cache/LB/frontend it lands on becomes the bottleneck.
  2. Parallelism within a single operation:
     • PUT: multipart upload. Split the object into parts, upload each part on a separate connection in parallel, then issue a CompleteMultipartUpload that stitches them together server-side.
     • GET: HTTP Range header. Split the read into byte ranges, fetch each range on a separate connection in parallel, and reassemble client-side.

The aggregate throughput of the result is roughly n× that of a single-connection GET/PUT, bounded by the client's network budget and the storage backend's per-tenant concurrency ceiling.
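The ranged-GET half of the pattern can be sketched in a few lines. This is an illustrative sketch, not any SDK's implementation: `fetch_range` is a hypothetical callable standing in for an HTTP GET carrying a `Range: bytes=start-end` header, so the fan-out logic can be shown (and run) without a live object store.

```python
import concurrent.futures

def split_ranges(size, part_size):
    """Split an object of `size` bytes into inclusive (start, end) byte ranges."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]

def parallel_get(fetch_range, size, part_size, max_workers=16):
    """Fetch all ranges concurrently and reassemble the object client-side.

    `fetch_range(start, end)` stands in for an HTTP GET with header
    `Range: bytes={start}-{end}`; any HTTP client can supply it.
    """
    ranges = split_ranges(size, part_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)

# Demo against an in-memory "object store" (no network needed):
blob = bytes(range(256)) * 40          # 10,240-byte object
data = parallel_get(lambda s, e: blob[s:e + 1], len(blob), part_size=4096)
assert data == blob
```

Each range lands on a separate connection in a real client, so a spread-placed backend serves the parts from different frontends and drives concurrently.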

Why it works

The storage backend is composed of many frontends, many connections, many drives (see patterns/data-placement-spreading). A single sequential request visits at most a small set of these resources — the others sit idle. Parallelising the request itself turns the backend's spread-placement into throughput available to the client.

From Kozlovski (2024):

"Instead of requesting all the files through one client with one connection to one S3 endpoint, users are encouraged to create as many [clients] as possible with as many parallel connections as possible. This utilizes many different endpoints of the distributed system, ensuring no single point in the infrastructure becomes too hot (e.g. caches)."

From Warfield (2025) on S3:

"Any customer should be entitled to use the entire performance capability of S3, as long as it didn't interfere with others" — but only the customer that parallelises per the best-practice guidance gets that throughput. GPU instances driving "hundreds of gigabits per second in and out of S3" all parallelise at the request level.

Why this works on S3 in particular

S3's design pairs with this pattern tightly:

  • Spread placement means a single bucket's objects live on many drives — so a multi-connection burst hits many drives rather than one.
  • Aggregate demand smoothing (concepts/aggregate-demand-smoothing) means one customer's burst is a vanishing fraction of per-drive load, so the customer can burst hard without harming others.
  • Many frontend endpoints behind the S3 hostname ensure multiple connections don't terminate on one box.

The library version: AWS Common Runtime

Rather than leaving this discipline to every client implementation, AWS baked it into the Common Runtime (CRT) library. The CRT implements the multipart / ranged-GET / parallel-connection / backoff-and-retry strategy once, and exposes it to every SDK (Python, Java, Go, Rust, C++, …). Applications link CRT and get S3's best-practice throughput shape automatically — no bespoke client code.

This is an instance of: turn a best-practice into a library, not documentation.
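For comparison, the pre-CRT boto3 transfer layer already exposes the same knobs as configuration. The sketch below uses real `TransferConfig` parameters (`multipart_threshold`, `multipart_chunksize`, `max_concurrency`, `use_threads`); it assumes boto3 is installed and credentials are configured, and the bucket/key/file names are made up for illustration.

```python
import boto3
from boto3.s3.transfer import TransferConfig

# The best-practice shape as library configuration, not application code:
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # single PUT below ~100 MB
    multipart_chunksize=16 * 1024 * 1024,   # each part well above the 5 MiB floor
    max_concurrency=16,                     # parallel connections per transfer
    use_threads=True,
)

s3 = boto3.client("s3")
s3.upload_file("big-object.bin", "my-bucket", "big-object.bin", Config=config)
s3.download_file("my-bucket", "big-object.bin", "copy.bin", Config=config)
```

The application calls `upload_file`/`download_file`; the splitting, parallel connections, and retries happen inside the library.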

Generalization

The pattern isn't S3-specific — it applies to any distributed storage system whose aggregate throughput exceeds what any one connection can carry.

  Store                                     Multipart analog     Range analog
  AWS S3 / GCS / Azure Blob                 Multipart upload     HTTP Range header
  Ceph RADOS                                Stripe + scatter     RADOS read offset/length
  Cassandra / DynamoDB                      Batch write spread   Parallel scan segments
  SQLite on object store (Litestream VFS)   (write-side n/a)     patterns/vfs-range-get-from-object-store

Byte-range GET as a primitive — not only a parallelism technique — shows up in, for example, the Litestream VFS on S3 (see sources/2025-12-11-flyio-litestream-vfs). There, each SQLite page read resolves to a byte-range GET against an LTX file in S3; the same HTTP verb that powers client-side parallelism for big objects also powers page-granular reads for small objects. The range primitive carries two very different workloads.
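The page-read workload reduces to arithmetic over the same header. The helper below is hypothetical — real LTX files carry their own framing and metadata, so actual offsets differ — but it shows the range-per-page idea with SQLite's 1-based page numbering:

```python
def page_range_header(pgno, page_size=4096):
    """Map a 1-based SQLite page number to an HTTP Range header value.

    Hypothetical helper: it assumes pages are packed back-to-back from
    offset 0, which real LTX framing does not guarantee.
    """
    start = (pgno - 1) * page_size
    return f"bytes={start}-{start + page_size - 1}"

assert page_range_header(1) == "bytes=0-4095"
assert page_range_header(3, page_size=512) == "bytes=1024-1535"
```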

Caveats

  • Small objects pay overhead. A 50 KB multipart upload is slower than a simple PUT because of the extra initiate/complete roundtrips. Use a single PUT below a threshold (AWS guidance: roughly 100 MB as the break-even).
  • Part size has a minimum. S3 requires parts of at least 5 MiB (except the last) — clients that split too aggressively get their parts rejected.
  • You need to handle partial failures. If one part fails, the client must retry that part; CompleteMultipartUpload only succeeds once all parts are uploaded.
  • Connection count has diminishing returns. More than ~50 parallel connections per client typically stops helping — the client's own CPU and TCP stack become the bottleneck.
  • The pattern is only correct with a spread-placed backend. If the backend places all of your data on one drive, 16 parallel GETs just queue 16 reads on the same drive.
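The first two sizing caveats can be encoded in a small planning function. A sketch, assuming the thresholds stated above (`plan_upload` and its defaults are illustrative names, not an SDK API):

```python
MIN_PART = 5 * 1024 * 1024             # S3 minimum part size (last part exempt)
SINGLE_PUT_CUTOFF = 100 * 1024 * 1024  # below this, a simple PUT wins

def plan_upload(size, part_size=16 * 1024 * 1024):
    """Return None for a single PUT, else a list of (offset, length) parts.

    Encodes the sizing caveats: small objects skip multipart entirely
    (the roundtrip tax), and no part other than the last may be split
    below the 5 MiB floor.
    """
    if size <= SINGLE_PUT_CUTOFF:
        return None                         # multipart not worth the overhead
    part_size = max(part_size, MIN_PART)    # clamp too-aggressive splits
    return [(off, min(part_size, size - off))
            for off in range(0, size, part_size)]

assert plan_upload(50 * 1024) is None                   # 50 KB → simple PUT
parts = plan_upload(200 * 1024 * 1024, part_size=1024)  # tiny part_size clamped
assert all(length >= MIN_PART for off, length in parts[:-1])
```

The partial-failure caveat then falls out naturally: each `(offset, length)` entry is an independently retryable unit, and CompleteMultipartUpload is attempted only once every entry has succeeded.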

Seen in
