CONCEPT
Aggregate demand smoothing (multi-tenant scale)¶
Definition¶
Aggregate demand smoothing names the observation that, in a system shared by a very large number of independent tenants, the sum of bursty per-tenant workloads is much smoother than any individual workload. Past a certain aggregation threshold, no single tenant's burst can move the aggregate curve.
This is the law of large numbers applied to capacity planning, and it is why multi-tenant cloud storage can provide each individual tenant with burst performance that would be cost-prohibitive to provision stand-alone.
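The law-of-large-numbers claim can be made concrete: for N independent tenants, the aggregate mean grows like N while the fluctuation around it grows like √N, so the peak-to-mean ratio of the aggregate approaches 1 as N grows. A minimal simulation sketch (not from the post; tenant counts, burst probability, and burst size are illustrative assumptions) of mostly-idle, bursty tenants:

```python
import numpy as np

rng = np.random.default_rng(42)
steps = 1000  # timesteps in the simulated trace

def aggregate_peak_to_mean(n_tenants, burst_prob=0.01, burst_load=100.0):
    """Peak-to-mean ratio of the aggregate load of n_tenants bursty tenants.

    Each tenant is idle at each timestep with probability 1 - burst_prob,
    so the number of tenants bursting at a given step is Binomial(n, p)
    and the aggregate load is burst_load times that count.
    """
    bursting = rng.binomial(n_tenants, burst_prob, size=steps)
    aggregate = bursting * burst_load
    return aggregate.max() / aggregate.mean()

for n in (1, 10, 1000, 100_000):
    print(f"{n:>7} tenants: peak/mean = {aggregate_peak_to_mean(n):.2f}")
```

A single tenant's trace has a peak far above its mean (it is almost always idle); at a hundred thousand tenants the aggregate peak sits only a few percent above the mean, so no single tenant's burst can move it.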
The S3 formulation (Warfield, 2025)¶
The more workloads we run on S3, the more that individual requests to objects become decorrelated with one another. Individual storage workloads tend to be really bursty — in fact, most storage workloads are completely idle most of the time and then experience sudden load peaks when data is accessed. That peak demand is much higher than the mean. But as we aggregate millions of workloads a really, really cool thing happens: the aggregate demand smooths and it becomes way more predictable. In fact — and I found this to be a really intuitive observation once I saw it at scale — once you aggregate to a certain scale you hit a point where it is difficult or impossible for any given workload to really influence the aggregate peak at all.
Why this matters¶
- Placement stops being a prediction problem. At small scale, you must predict which workload will be hot to place it well. At large scale, aggregate demand is already smooth — the problem reduces to translating smooth aggregate into smooth per-drive load via placement (see concepts/heat-management).
- Customer bursts are essentially free. If individual tenants can't move the aggregate, they can burst hard without causing service-wide capacity pressure.
- The marketing claim "use all of S3" has teeth. The post's example of a Lambda-parallel genomics burst is served from over a million individual disks. The aggregate-scale customer pool is what makes this economical.
- Scale becomes a quality lever, not just a cost lever. The usual "scale → lower unit cost" framing undersells this: scale also flattens variance, which is itself a service-quality property.
Boundary conditions¶
- Correlated workloads break it. If many tenants share a time-correlated trigger (e.g., everyone refreshes at the same NTP second), the "independent" assumption fails and a correlated peak spikes through the smoothed aggregate. Operationally, this surfaces as synchronized-request storms.
- Single large tenants erode it. If one tenant becomes a large enough fraction of aggregate demand, it can move the aggregate peak. S3's multi-decade customer acquisition has kept aggregation broad enough that no one customer dominates.
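The correlation boundary condition is easy to see in the same kind of simulation (again an illustrative sketch, not from the post: tenant counts, burst parameters, and the number of synchronized trigger steps are assumptions). Independent bursts wash out in the aggregate; a shared trigger that fires every tenant at once does not:

```python
import numpy as np

rng = np.random.default_rng(7)
steps, n_tenants = 1000, 100_000
burst_prob, burst_load = 0.01, 100.0

# Independent tenants: bursting count per timestep is Binomial(n, p),
# so the aggregate is smooth relative to its mean.
independent = rng.binomial(n_tenants, burst_prob, size=steps) * burst_load

# Correlated trigger: on a few shared timesteps (e.g. a synchronized
# refresh), every tenant bursts simultaneously.
correlated = independent.copy()
trigger_steps = rng.choice(steps, size=3, replace=False)
correlated[trigger_steps] = n_tenants * burst_load

for name, series in (("independent", independent), ("correlated", correlated)):
    print(f"{name:>11}: peak/mean = {series.max() / series.mean():.2f}")
```

The handful of synchronized steps drives the peak-to-mean ratio up by orders of magnitude, even though they barely change the mean: this is the capacity-planning cost of a synchronized-request storm.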
Contrast with single-tenant IO prediction¶
Warfield contrasts this with his earlier research and Coho Data work:
Before joining Amazon, I spent time doing research and building systems that tried to predict and manage this I/O heat at much smaller scales — like local hard drives or enterprise storage arrays — and it was basically impossible to do a good job of.
At enterprise scale, you can't aggregate your way out. At S3 scale, you can.
Seen in¶
- sources/2025-02-25-allthingsdistributed-building-and-operating-s3 — Warfield's observation; foundation under S3's concepts/heat-management model.