
CONCEPT Cited by 1 source

Per-request billing

Definition

A cloud billing model where the unit of charge is the individual request or invocation, rather than provisioned capacity (vCPU-hours, GB-hours of RAM). At zero traffic the cost floor is zero, apart from minor storage or metadata fees. AWS Lambda is the canonical example: "$1 per 6M invocations" (Brian LeRoux, via sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022).
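A minimal sketch of the two billing shapes, assuming only the invocation price quoted above. It ignores Lambda's separate duration (GB-second) charge and any storage or metadata fees. The function names, the 730-hour month, and the `hourly_rate_usd` value in the example are hypothetical, chosen for illustration.

```python
# Sketch: per-request vs provisioned-capacity billing (illustrative only).
# Assumption: invocations are priced at the quoted "$1 per 6M invocations";
# duration (GB-second) charges and storage/metadata fees are ignored.
PRICE_PER_INVOCATION_USD = 1 / 6_000_000

HOURS_PER_MONTH = 730  # assumption: average month, 8760 h / 12


def per_request_bill(invocations: int) -> float:
    """Bill grows linearly with traffic and is exactly zero at zero traffic."""
    if invocations < 0:
        raise ValueError("invocations must be non-negative")
    return invocations * PRICE_PER_INVOCATION_USD


def provisioned_bill(servers: int, hourly_rate_usd: float) -> float:
    """Bill is fixed by capacity: the same at 0 requests as at peak."""
    return servers * hourly_rate_usd * HOURS_PER_MONTH


if __name__ == "__main__":
    for n in (0, 6_000_000, 60_000_000):
        print(f"{n:>11,} invocations -> ${per_request_bill(n):.2f}")
    # hypothetical: 10 servers at a hypothetical $0.10/hour each
    print(f"provisioned: ${provisioned_bill(10, 0.10):.2f} regardless of traffic")
```

The per-request bill prints $0.00, $1.00 and $10.00 for the three traffic levels, while the provisioned bill stays at $730.00 whether or not any requests arrive.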

Why it dominates provisioned-capacity billing for bursty workloads

A traditional cluster sized for peak is idle most of the time:
- A 10-server cluster running a 2-RPS service is ~98% over-provisioned (thdxr: "if the average req takes 20ms to complete, you're only using 1.2s of that minute").
- You still pay for the 8 idle servers 24/7.
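The utilization arithmetic can be reproduced with a small helper: busy time per minute is requests per minute times average latency. This is a sketch that assumes a constant average latency and servers that each process one request at a time. The helper names are hypothetical.

```python
# Sketch: how much of a provisioned server's time is spent doing work.
# Assumptions: constant average latency, one request at a time per server.


def busy_seconds_per_minute(rps: float, avg_latency_s: float) -> float:
    """Seconds of request work done per wall-clock minute."""
    return rps * 60 * avg_latency_s


def cluster_utilization(rps: float, avg_latency_s: float, servers: int) -> float:
    """Fraction of available server-seconds actually used."""
    available = servers * 60  # server-seconds per minute
    return busy_seconds_per_minute(rps, avg_latency_s) / available


if __name__ == "__main__":
    # 60 requests per minute at 20 ms each -> about 1.2 s of work per minute
    print(round(busy_seconds_per_minute(1, 0.020), 3))
    util = cluster_utilization(1, 0.020, servers=1)
    print(f"utilized {util:.2%}, idle {1 - util:.2%}")
```

On one server, 1.2 busy seconds out of 60 is 2% utilization, or 98% idle. Every additional server in the cluster lowers utilization further, because the work done stays the same while the paid-for capacity grows.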

Per-request billing reduces that over-provisioning tax to zero: you pay only for the 1.2 seconds of work done in each minute. thdxr reports going from $4,000/mo on EKS to $70/mo on Lambda for an identical workload, a ~57x cost reduction. Joe Emison's serverless stack at Branch costs $10K/mo in total, including per-developer staging AWS accounts, for a company that grew 15x YoY twice.
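The reported saving reduces to simple arithmetic on the two monthly bills. Below is a minimal sketch using the thdxr figures quoted above. The helper name is hypothetical.

```python
def cost_reduction(before_usd: float, after_usd: float) -> tuple[float, float]:
    """Return (reduction factor, fraction of the original bill saved)."""
    if before_usd <= 0 or after_usd <= 0:
        raise ValueError("both bills must be positive")
    factor = before_usd / after_usd
    saved = 1 - after_usd / before_usd
    return factor, saved


if __name__ == "__main__":
    factor, saved = cost_reduction(4_000, 70)  # EKS -> Lambda, $/month
    print(f"~{factor:.0f}x cheaper, {saved:.2%} of the bill saved")
```

$4,000 / $70 ≈ 57.1, so the bill shrinks about 57x, and 98.25% of the original spend disappears.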

Counterexample: always-on workloads

If a service genuinely runs 24/7 at high utilization, per-request billing stops being cheaper: at a high enough sustained RPS, a provisioned cluster wins on unit economics. AWS Lambda at a constant 1000 RPS is roughly at break-even with a right-sized EC2 deployment. Chris Munns's observation (Source: sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022) is that "99% of apps never break 1000 rps", which is why per-request billing wins on cost for most real apps.
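Here is a break-even sketch. A pure per-request bill grows linearly with sustained RPS, while a provisioned bill is flat, so the crossover is the rate at which they are equal. All inputs are hypothetical placeholders: a 30-day month, an all-in price per request that folds in any duration charge, and a flat monthly provisioned cost. The sketch does not attempt to reproduce the ~1000 RPS figure above. Substitute real quotes before drawing conclusions.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def per_request_monthly_usd(rps: float, usd_per_request: float) -> float:
    """Monthly bill at a constant request rate (all-in price per request)."""
    return rps * SECONDS_PER_MONTH * usd_per_request


def break_even_rps(provisioned_monthly_usd: float, usd_per_request: float) -> float:
    """Sustained RPS above which the flat provisioned bill is cheaper."""
    return provisioned_monthly_usd / (SECONDS_PER_MONTH * usd_per_request)


if __name__ == "__main__":
    # hypothetical inputs: $1,000/month of EC2, $0.000001 all-in per request
    price = 0.000001
    rps = break_even_rps(1_000, price)
    print(f"break-even at ~{rps:.0f} sustained RPS")
    for load in (10, rps, 10 * rps):
        print(f"{load:>8.0f} RPS -> ${per_request_monthly_usd(load, price):,.0f}/month")
```

With these placeholder inputs the crossover is about 386 RPS. At 10 RPS the per-request bill is about $26/month, far below the flat $1,000. At ten times the break-even rate it reaches $10,000/month, and the provisioned option wins.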
