CONCEPT

Fine-Grained Billing

Fine-grained billing is the practice of charging customers at a very small unit of actual consumption (milliseconds of CPU, bytes of memory × time) rather than at the granularity of a provisioned unit (per hour, per reserved instance).

Why it's an architectural commitment, not a pricing choice

In the 2014 Lambda PR/FAQ, fine-grained duration-based pricing was tenet #4: "Our service will target fine-grained pay-for-use; developers will not pay for idle time. We will own the problem of application placement so that developers never experience waste through underutilized hosts. We will seek to minimize both costs and billing granularity."

Key alignment claim: "Our pricing approach ensures that customers cannot overprovision or underutilize by design: customers utilize 100% of the computing power they're paying for when they run an application."

(Source: sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years)

Lambda's granularity evolution

Era                Granularity
2014 PR/FAQ        250ms
Nov 2014 launch    100ms
Today (2024)       1ms, no minimum

Billing granularity is a customer-alignment lever, tightened over time: each shrink incentivises further customer-side optimisation, because saved milliseconds now translate one-to-one into saved dollars.
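The effect of granularity on this incentive can be made concrete with a small cost calculation. This is a minimal sketch, not AWS's billing logic; the per-GB-second rate is illustrative, not a published price:

```python
import math

def billed_cost(duration_ms, memory_gb, granularity_ms, rate_per_gb_s):
    """Round the measured duration up to the billing granularity, then price
    it as memory * time (GB-seconds)."""
    billed_ms = math.ceil(duration_ms / granularity_ms) * granularity_ms
    return (billed_ms / 1000) * memory_gb * rate_per_gb_s

RATE = 0.0000166667  # illustrative per-GB-second rate, not an official price

# A 27 ms invocation at 128 MB (0.125 GB):
coarse = billed_cost(27, 0.125, granularity_ms=100, rate_per_gb_s=RATE)  # billed as 100 ms
fine   = billed_cost(27, 0.125, granularity_ms=1,   rate_per_gb_s=RATE)  # billed as 27 ms
```

At 100ms granularity the customer pays for 100ms regardless; at 1ms granularity the same invocation costs 27/100 as much, so shaving further milliseconds off the code path now shows up directly on the bill.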

What it forces the provider to build

  • Placement compaction. Per-account scheduling that minimises the instance count used for that account's workload; spiky and heterogeneous jobs can be packed without user effort (see concepts/scale-to-zero).
  • Cheap multi-tenant isolation. An isolation primitive inexpensive enough that the provider doesn't lose money running idle containers for sparse workloads. See concepts/micro-vm-isolation.
  • Precise metering infrastructure. Per-ms CPU + per-MB memory × duration for potentially billions of invocations per day.
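The metering requirement above boils down to aggregating memory × duration per function from raw invocation records. A minimal sketch, assuming a toy record shape of (function name, duration in ms, memory in MB); none of this reflects Lambda's internal schema:

```python
from collections import defaultdict

def aggregate_usage(records):
    """Sum metered GB-milliseconds per function from raw invocation records.

    Each record is (function_name, duration_ms, memory_mb); the field names
    are illustrative, not Lambda's actual metering format.
    """
    usage_gb_ms = defaultdict(float)
    for fn, duration_ms, memory_mb in records:
        usage_gb_ms[fn] += duration_ms * (memory_mb / 1024)
    return dict(usage_gb_ms)

records = [
    ("fn-a", 27, 128),     # 27 ms at 128 MB
    ("fn-a", 3, 128),      # even a 3 ms invocation is metered
    ("fn-b", 1200, 1024),  # 1.2 s at 1 GB
]
usage = aggregate_usage(records)
```

The hard part in production is not this arithmetic but doing it reliably at billions of invocations per day; the sketch only shows what each record must carry for per-ms billing to be possible.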

Trade-off

Very fine granularity shifts the incentive to reduce code-path latency onto the customer: "Developers can lower costs and improve performance by minimizing startup and shutdown overhead, ensuring that each time an application is invoked it immediately begins doing useful work." The provider, in exchange, has to keep cold-start latency low enough that this game is winnable.
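The customer-side half of this bargain is the familiar pattern of hoisting expensive setup out of the per-invocation path so it runs once per sandbox rather than inside every billed invocation. A sketch, where `load_model` is a hypothetical stand-in for any slow initialisation (database clients, config fetches):

```python
import json

def load_model():
    # Hypothetical expensive setup; in a real function this might open
    # connections or load configuration.
    return {"weights": [0.1, 0.2, 0.3]}

# Runs once at sandbox start (during the cold start), not per invocation,
# so its latency is not repeated in every billed duration.
MODEL = load_model()

def handler(event, context=None):
    # Only the useful work sits inside the billed invocation path.
    score = sum(MODEL["weights"]) * event.get("x", 1)
    return {"statusCode": 200, "body": json.dumps({"score": score})}
```

The pattern only pays off because per-ms billing makes the shortened invocation path directly cheaper; under coarse per-hour billing the same restructuring would change nothing on the bill.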
