CONCEPT Cited by 3 sources

Elasticity

Elasticity is the property that a service's capacity and performance expand and contract with customer demand without requiring the customer to forecast, provision, or negotiate — and without imposing sharp edges (quotas, rate-cliffs, separate tiers) that the developer has to architect around.

It is the architectural lever behind most "developer experience" improvements on foundational cloud services: making elasticity real is what lets the developer stop thinking about the service.

Two dimensions (per S3's framing, 2025)

Andy Warfield separates elasticity into capacity and performance:

  • Capacity elasticity: "On S3, you never have to do up front provisioning of capacity or performance, and you don't worry about running out of space." No upfront sizing; no per-bucket capacity ceiling.
  • Performance elasticity: "Any customer should be entitled to use the entire performance capability of S3, as long as it didn't interfere with others." Implemented via (a) transparent docs on request shape; (b) that shape baked into systems/aws-crt; (c) latency-class unlock via systems/s3-express-one-zone.

Both dimensions become invisible when they work — and both have a failure mode: "when we have aspects of the system that require extra work from developers, the lack of simplicity is distracting and time consuming."
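
One way to "bake the request shape into the client" is to have the library split large reads into ranged parts and fetch them in parallel, so the developer never writes custom parallelization code. A minimal sketch, assuming a hypothetical `fetch_range(start, end)` transport call and a made-up part size — not the actual aws-crt implementation:

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 8 * 1024 * 1024  # hypothetical part size; a real client tunes this


def split_ranges(object_size: int, part_size: int = PART_SIZE):
    """Split an object into inclusive byte ranges for parallel ranged GETs."""
    return [(start, min(start + part_size, object_size) - 1)
            for start in range(0, object_size, part_size)]


def parallel_get(object_size: int, fetch_range, workers: int = 8) -> bytes:
    """Issue ranged GETs in parallel and reassemble the parts in order.

    `fetch_range(start, end)` stands in for the transport call; a real
    client would also retry each part independently on transient errors.
    """
    ranges = split_ranges(object_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)
```

The point of the sketch is the division of labour: the request shape (part size, parallelism, per-part retry) lives in the library, so the application code stays a single `get` call.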

(Source: sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes)

Elasticity on compute (Lambda framing)

The Lambda PR/FAQ's version: scale range includes zero (concepts/scale-to-zero) and the same code path handles "one application invocation per month and 1,000 per second." Elasticity for compute implies no warm capacity assumption, which in turn forces multi-tenant packing (see concepts/micro-vm-isolation) so capacity is actually recyclable at low utilisation.
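
Why scale-to-zero forces multi-tenant packing can be shown with a toy first-fit placement: a function with no live invocations demands zero and occupies no host at all, so its capacity is recyclable for other tenants. A minimal sketch with made-up memory units, not Lambda's actual placement algorithm:

```python
def first_fit_pack(demands: dict[str, int], host_capacity: int) -> list[dict[str, int]]:
    """First-fit packing of per-function memory demands onto shared hosts.

    Scale-to-zero shows up as demand 0: the function is simply not placed,
    and the capacity it would have pinned is free for other tenants.
    """
    hosts: list[dict[str, int]] = []
    for fn, mem in demands.items():
        if mem == 0:  # no live invocations: nothing to place
            continue
        for host in hosts:
            if sum(host.values()) + mem <= host_capacity:
                host[fn] = mem  # reuse an existing host's spare capacity
                break
        else:
            hosts.append({fn: mem})  # open a new host only when needed
    return hosts
```

Without isolation cheap enough to pack many tenants per host (the micro-VM point), low-utilisation functions would each pin warm capacity and the economics of "one invocation per month" would not work.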

(Source: sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years)

What erodes elasticity

  • Explicit quotas surfaced to the customer (S3's 100-buckets/account cap, later raised to 1M/account).
  • Provisioning steps (pre-configure throughput, pre-declare concurrency).
  • Tier cliffs where a workload exceeds one storage/latency class and must be migrated.
  • Performance gotchas in the client that require custom retry / parallelization code. Fix: move the shape into a library.

The S3 post treats each of those as a simplification debt — the feature wasn't simple enough at launch and is being paid down over time. See concepts/simplicity-vs-velocity.

Predictive vs. reactive realisations

The elasticity ideal — "capacity perfectly matches demand at every moment" — can be approached from two directions in auto-scaling systems:

  • Reactive: watch observed utilisation, act after it crosses a threshold. Always pays the detection-plus-action latency as observed tail.
  • Predictive: forecast future demand, act before the load arrives. Hides scaling latency from the observed tail when the forecast is correct.
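
The two trigger policies can be sketched side by side. This is an illustrative toy, not MongoDB's algorithm: the reactive policy fires at the step where utilisation crosses the threshold (and then still pays the scaling latency as observed tail), while the predictive policy fires `lead` steps earlier, so the new capacity is ready when the load arrives — if the forecast is right:

```python
def reactive(utilisation: list[float], threshold: float = 0.8) -> list[int]:
    """Scale-up decisions emitted only after observed utilisation crosses
    the threshold; detection-plus-action latency lands in the tail."""
    return [t for t, u in enumerate(utilisation) if u > threshold]


def predictive(forecast: list[float], threshold: float = 0.8,
               lead: int = 2) -> list[int]:
    """Scale-up decisions emitted `lead` steps before the forecast crosses
    the threshold, hiding scaling latency when the forecast is correct."""
    return [max(t - lead, 0) for t, u in enumerate(forecast) if u > threshold]
```

On the same trace, `reactive` fires at the breach while `predictive` fires earlier; when the forecast is wrong, the reactive policy is the backstop.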

MongoDB's 2026-04-07 framing of the "imaginary perfect auto-scaling algorithm" is an articulation of elasticity as the theoretical target — "anticipate each customer's needs and perfectly scale their servers up and down, according to their changing demands." The predictive mechanism is the engineering approximation; the reactive mechanism is the backstop when the forecast fails. Together they move elasticity from "eventually right-sized" toward "right-sized throughout the cycle" — including at cost level: "save our customers money and reduce our carbon emissions" (Source: sources/2026-04-07-mongodb-predictive-auto-scaling-an-experiment).

On the tier-based managed-database axis (the MongoDB Atlas M10/M20/.../M60 catalog), elasticity has a lower ceiling than on serverless substrates: tiers are discrete, not continuous. But predictive scaling moves between tiers with less observed-latency cost and can skip intermediate tiers on sharp demand shifts — a tighter approximation of elasticity inside a discrete-catalog abstraction.
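
The "skip intermediate tiers" move is just target selection against the forecast rather than stepwise promotion. A minimal sketch with hypothetical capacity units attached to the tier names (the units are made up; only the discrete catalog shape comes from the source):

```python
# Hypothetical (tier, capacity-units) catalog; real tiers differ.
TIERS = [("M10", 10), ("M20", 20), ("M30", 40),
         ("M40", 80), ("M50", 160), ("M60", 320)]


def target_tier(predicted_demand: float) -> str:
    """Pick the smallest tier covering predicted demand. A sharp forecast
    jump selects the final tier directly, skipping intermediates, instead
    of promoting one tier per reactive breach."""
    for name, capacity in TIERS:
        if capacity >= predicted_demand:
            return name
    return TIERS[-1][0]  # demand exceeds catalog: cap at the largest tier
```

A reactive loop would step M10 → M20 → M30 → ... on a sharp spike, paying a migration each time; forecasting the endpoint pays one.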
