CONCEPT Cited by 3 sources
Scale to Zero¶
Scale-to-zero is a service-design property in which an application consumes no capacity and accrues no charge when it has no traffic, yet can serve the next request without customer-initiated capacity actions.
Design commitment¶
Lambda elevated scale-to-zero from an implementation detail to a stated tenet in its 2014 PR/FAQ: "Our service will scale customer applications without changes to their code or configuration. We will architect such that one application invocation per month and 1,000 per second are both well supported."
Operationally: "Lambda imposes no warm-up or cool-down periods or charges… Infrequent or periodic jobs are cost effective, sharing capacity with other users and only charging for actual execution time."
(Source: sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years)
What this requires the provider to own¶
- Multi-tenant placement. Idle customers can't keep hosts warm; the provider has to pack active work onto shared capacity. See systems/firecracker for Lambda's isolation primitive.
- Fast cold start. The wait on the first request is the price of zero idle cost. See concepts/cold-start and SnapStart for how Lambda has attacked this over the years.
- Fine-grained billing granularity. Without per-ms billing, a "scales to zero" service would still charge customers for partial execution quanta. See concepts/fine-grained-billing.
- Placement engine that compacts per-account workloads. From the PR/FAQ: "Each new request is placed with respect to minimizing the number of instances dedicated to that account… Spiky workloads, heterogeneous workloads, and short-lived jobs such as cron or batch applications all use capacity efficiently."
Related trade-off¶
Scale-to-zero is the direct counterpart to provisioned / always-warm compute. The two are often offered together (Lambda Provisioned Concurrency, container always-on, etc.) so users can choose per-workload: zero cold starts vs. zero idle cost.
Seen in¶
- sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — tenet-level scale-to-zero commitment in the Lambda PR/FAQ.
- sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git — extended to versioned-storage-at-agent-scale: Artifacts repos cost $0.50/GB-month storage + $0.15/1 k ops with no hot/cold tier and no per-repo idle fee. "Unused (or rarely used) repos shouldn't be a drag" — a hibernated DO pays only for stored bytes. Sibling to Lambda scale-to-zero at the storage tier rather than the compute tier. See systems/cloudflare-artifacts.
- sources/2024-09-24-flyio-ai-gpu-clusters-from-your-laptop-with-livebook — FLAME-driven GPU cluster on Fly Machines, terminated on Livebook disconnect: per-node idle shutdown + whole-cluster tear-down give pure scale-to-zero economics over ephemeral GPU capacity. Requires seconds-scale cluster boot to make the re-up cost trivial.
- sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description
— Proxy-managed single-Machine GPU scale-to-zero on Fly.io
(sibling shape to the Livebook/FLAME cluster scale-to-zero
above). A single GPU Fly Machine running Ollama + LLaVA sits
behind Flycast; Fly
Proxy autostart/autostop stops the Machine after ~minutes of
idle silence and starts it on the next internal request from
the PocketBase app tier — canonical
patterns/proxy-autostop-for-gpu-cost-control. The tail is
tagged here with a concrete number:
~45 seconds on
a100-40gb+ LLaVA-34b — the explicit cost of pure GPU scale-to-zero at the per-request-autostart tier. - sources/2026-04-01-cloudflare-emdash-wordpress-spiritual-successor — CMS-as-serverless-workload. Cloudflare's EmDash post frames scale-to-zero as the counter-primitive to WordPress hosting: WordPress requires provisioning + managing + scaling servers, "there's no avoiding the need to pre-provision instances and run some amount of idle compute, or share resources in ways that limit performance." EmDash is built for V8 isolates on workerd; "the Workers runtime instantly spins up an isolate to execute code and serve a response. It scales back down to zero if there are no requests. And it only bills for CPU time." Per-request isolate = per-request scale-to-zero at the CMS tier. Host millions of EmDash sites via Cloudflare for Platforms without per-site idle compute cost.