
CONCEPT Cited by 8 sources

Scale to Zero

Scale-to-zero is a service-design property in which an application consumes no capacity and accrues no charge when it has no traffic, yet can serve the next request without customer-initiated capacity actions.

Design commitment

Lambda elevated scale-to-zero from an implementation detail to a stated tenet in its 2014 PR/FAQ: "Our service will scale customer applications without changes to their code or configuration. We will architect such that one application invocation per month and 1,000 per second are both well supported."

Operationally: "Lambda imposes no warm-up or cool-down periods or charges… Infrequent or periodic jobs are cost effective, sharing capacity with other users and only charging for actual execution time."

(Source: sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years)
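
To make "only charging for actual execution time" concrete, the sketch below shows how the size of the billing quantum changes the billed duration of a short invocation. The function name `billed_ms` and the two sample quanta are illustrative assumptions, not a description of Lambda's billing implementation.

```python
import math


def billed_ms(duration_ms: float, quantum_ms: int) -> int:
    """Round an execution duration up to the next billing quantum.

    Hypothetical model: the provider bills in whole multiples of quantum_ms.
    """
    if duration_ms < 0 or quantum_ms <= 0:
        raise ValueError("duration must be >= 0 and quantum must be > 0")
    return math.ceil(duration_ms / quantum_ms) * quantum_ms


# A 12 ms invocation under two assumed billing quanta.
print(billed_ms(12, 1))    # 12  (billed for what actually ran)
print(billed_ms(12, 100))  # 100 (billed for 88 ms that never ran)
```

The coarser the quantum, the larger the share of the bill that pays for time in which nothing executed, which is the gap fine-grained billing closes.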

What this requires the provider to own

  • Multi-tenant placement. Idle customers can't keep hosts warm; the provider has to pack active work onto shared capacity. See systems/firecracker for Lambda's isolation primitive.
  • Fast cold start. The wait on the first request is the price of zero idle cost. See concepts/cold-start and SnapStart for how Lambda has attacked this over the years.
  • Fine-grained billing. Without per-millisecond billing, a "scales to zero" service would still charge customers for the unused remainder of each coarse billing quantum. See concepts/fine-grained-billing.
  • Placement engine that compacts per-account workloads. From the PR/FAQ: "Each new request is placed with respect to minimizing the number of instances dedicated to that account… Spiky workloads, heterogeneous workloads, and short-lived jobs such as cron or batch applications all use capacity efficiently."
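
The placement rule quoted in the last bullet can be sketched as a greedy packer. This is a minimal illustration under stated assumptions, not Lambda's placement engine: the `Placer` and `Instance` names, the fixed per-instance capacity, and the "fill the fullest instance first" heuristic are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity so list.remove() targets this exact object
class Instance:
    account: str
    capacity: int
    in_flight: int = 0


class Placer:
    """Greedy placer that tries to minimize instances dedicated to each account."""

    def __init__(self, capacity_per_instance: int = 4):
        self.capacity = capacity_per_instance  # assumed fixed slot count
        self.instances = []

    def place(self, account: str) -> Instance:
        # Prefer an instance the account already occupies that still has room.
        candidates = [
            i for i in self.instances
            if i.account == account and i.in_flight < i.capacity
        ]
        if candidates:
            # Fill the fullest one first so lightly used instances can drain.
            target = max(candidates, key=lambda i: i.in_flight)
        else:
            target = Instance(account, self.capacity)
            self.instances.append(target)
        target.in_flight += 1
        return target

    def finish(self, inst: Instance) -> None:
        inst.in_flight -= 1
        if inst.in_flight == 0:
            # Nothing left to run: release the instance (scale to zero).
            self.instances.remove(inst)

    def instances_for(self, account: str) -> int:
        return sum(1 for i in self.instances if i.account == account)


placer = Placer(capacity_per_instance=2)
placed = [placer.place("acct-a") for _ in range(3)]
print(placer.instances_for("acct-a"))  # 2
for inst in placed:
    placer.finish(inst)
print(placer.instances_for("acct-a"))  # 0
```

Three concurrent requests occupy two instances rather than three, and once all of them finish the account holds no capacity at all.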

Scale-to-zero is the direct counterpart to provisioned / always-warm compute. The two are often offered together (Lambda Provisioned Concurrency, container always-on, etc.) so users can choose per-workload: zero cold starts vs. zero idle cost.
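
The per-workload choice comes down to a break-even calculation on the cost side. The sketch below uses made-up integer prices in "micro-units" to keep the arithmetic exact; none of the numbers are real provider prices, and the function names are hypothetical. It ignores cold-start latency, which is the non-monetary side of the trade.

```python
def scale_to_zero_cost(requests: int, avg_ms: int, price_per_ms: int) -> int:
    """Pay only while code runs: zero requests means zero cost."""
    return requests * avg_ms * price_per_ms


def always_warm_cost(hours: int, price_per_hour: int) -> int:
    """Pay for provisioned capacity whether or not traffic arrives."""
    return hours * price_per_hour


def break_even_requests(avg_ms: int, price_per_ms: int,
                        hours: int, price_per_hour: int) -> int:
    """Smallest request count at which pay-per-use costs at least as much as always-warm."""
    per_request = avg_ms * price_per_ms
    warm = always_warm_cost(hours, price_per_hour)
    return -(-warm // per_request)  # integer ceiling division


# Assumed prices: 2 micro-units per ms, 50,000 micro-units per warm hour,
# a 720-hour month, and 100 ms average execution time.
print(break_even_requests(avg_ms=100, price_per_ms=2,
                          hours=720, price_per_hour=50_000))  # 180000
```

Below the break-even volume the scale-to-zero option is cheaper; above it, always-warm capacity wins on cost and also removes cold starts.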

Seen in

  • sources/2026-05-20-databricks-marketing-campaigns-with-lakebase: Scale-to-zero + sub-second scale-down as bursty-OLTP enabler. Concrete production sizing disclosed for a Deichmann marketing-campaign workload on Lakebase Autoscaling: scale-to-0 minimum, 16 CU (~32 GB RAM) maximum. The load-bearing architectural justification: "Lakebase autoscaling speed and reactivity eliminate the risk of resource underutilization, which lowers TCO." The earlier Lakebase-disclosed property "scales down when idle in less than a second" is what makes generous max-cap sizing safe here: the underutilisation risk is bounded by autoscaling reactivity, not by the gap between min and max. Generalises the scale-to-zero shape from agent-provisioned databases (Stripe Projects framing) and durable-execution workloads (Lambda framing) to operational-OLTP read-bursts driven by scheduled business events (campaigns, batch triggers, business-hour traffic). The reframe: storage-compute separation is what enables OLTP scale-to-zero. Without it, scaling compute to zero would mean losing the page cache; with it, the persistent state lives in Pageserver and the ephemeral compute layer can come and go. Composes with concepts/bursty-query-pattern (the workload shape) and concepts/compute-storage-separation (the architectural precondition).

  • sources/2026-04-29-databricks-and-stripe-projects-infrastructure-built-for-agents: Scale-to-zero as agent-lifecycle enabler. First canonical wiki articulation of scale-to-zero framed as the economic property that makes agent-provisioned databases viable: not just operationally tolerable, but economically first-class. Verbatim: "Traditional databases require manual provisioning and 'always-on' costs. Lakebase compute resources dynamically adjust to match traffic spikes in real-time and automatically scale to zero when idle. For agents, this means they can spin up production-ready environments without worrying about capacity planning or wasted spend." Pairs with the disclosed <350 ms provisioning-time number as the sub-second-up side of the lifecycle envelope. Extends the concept's axis list: scale-to-zero is now canonicalised as (1) Lambda idle-cost elimination at the request tier, (2) Fly Sprites durable-substrate at the per-user tier, (3) Cloudflare Artifacts storage-tier scale-to-zero, (4) LangGuard bursty-governance-workload OLTP tier, (5) agent-speculative-provisioning economic enabler. The fifth axis is structurally new because the workload shape is speculatively-created-then-abandoned-in-bulk: agents create branches for tests / migrations / prompt experiments and discard most of them, which avoids a crushing operational cost liability only if idle databases cost nothing. See concepts/agent-provisioned-database for the resource-tier primitive this economic property underpins.

  • sources/2026-04-27-databricks-inside-one-of-the-first-production-deployments-of-lakebase-langguard: Bursty-agentic-workload → serverless-OLTP fit. First canonical wiki articulation of scale-to-zero specifically as the answer to the bursty security-telemetry workload shape that the LangGuard team (former IBM QRadar SIEM engineers) previously had to provision for at peak in a coupled-compute/storage Postgres world. Verbatim framing: "Traditional databases that couple compute and storage force you to provision for peak load and pay for that capacity around the clock. Lakebase's serverless model, which fully decouples compute from storage and scales to zero between bursts, was the answer we had always needed but didn't have access to when we were building QRadar." Lakebase's specific payoff: "spinning up a new compute instance requires no data movement. It simply attaches to the existing database history and begins serving queries immediately". Because durable state lives in the Pageserver/Safekeeper storage tier, scale-up from zero doesn't introduce a data-warm-up cost on top of the compute-VM cold start. See systems/lakebase + systems/langguard.

  • sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — tenet-level scale-to-zero commitment in the Lambda PR/FAQ.
  • sources/2026-01-14-flyio-the-design-implementation-of-sprites: economic precondition for keeping dozens of durable VMs alive per user. Sprites' "use dozens, casually create hundreds" posture only works because idle Sprites auto-sleep and stop metering. The implementation piece: the object-store-rooted disk removes the attached-storage-anchoring cost that forced Fly Volumes to keep a worker association even while the VM was stopped. Scale-to-zero of durable-by-default VMs is only cheap because storage isn't pinned to a physical worker. Billing aligns: "Sprites bill only for what you actually use (in particular: only for storage blocks you actually write, not the full 100GB capacity)."
  • sources/2026-04-16-cloudflare-artifacts-versioned-storage-that-speaks-git: extended to versioned-storage-at-agent-scale. Artifacts repos cost $0.50/GB-month storage + $0.15 per 1k ops with no hot/cold tier and no per-repo idle fee. "Unused (or rarely used) repos shouldn't be a drag": a hibernated DO pays only for stored bytes. Sibling to Lambda scale-to-zero at the storage tier rather than the compute tier. See systems/cloudflare-artifacts.
  • sources/2024-09-24-flyio-ai-gpu-clusters-from-your-laptop-with-livebook: FLAME-driven GPU cluster on Fly Machines, terminated on Livebook disconnect. Per-node idle shutdown + whole-cluster tear-down give pure scale-to-zero economics over ephemeral GPU capacity. Requires seconds-scale cluster boot to make the re-up cost trivial.
  • sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description: Proxy-managed single-Machine GPU scale-to-zero on Fly.io (sibling shape to the Livebook/FLAME cluster scale-to-zero above). A single GPU Fly Machine running Ollama + LLaVA sits behind Flycast; Fly Proxy autostart/autostop stops the Machine after ~minutes of idleness and starts it on the next internal request from the PocketBase app tier. This is the canonical patterns/proxy-autostop-for-gpu-cost-control. The cold-start tail is tagged here with a concrete number, ~45 seconds on a100-40gb + LLaVA-34b: the explicit cost of pure GPU scale-to-zero at the per-request-autostart tier.
  • sources/2026-04-01-cloudflare-emdash-wordpress-spiritual-successor: CMS-as-serverless-workload. Cloudflare's EmDash post frames scale-to-zero as the counter-primitive to WordPress hosting: WordPress requires provisioning + managing + scaling servers, "there's no avoiding the need to pre-provision instances and run some amount of idle compute, or share resources in ways that limit performance." EmDash is built for V8 isolates on workerd; "the Workers runtime instantly spins up an isolate to execute code and serve a response. It scales back down to zero if there are no requests. And it only bills for CPU time." Per-request isolate = per-request scale-to-zero at the CMS tier. This makes it possible to host millions of EmDash sites via Cloudflare for Platforms without per-site idle compute cost.
  • sources/2026-01-09-flyio-code-and-let-live: Scale-to-zero applied to a durable substrate. Fly.io Sprites "go idle and stop metering automatically", but the VM is not de-allocated, just de-metered. On next access the Sprite's disk state is still there, ffmpeg still installed, checkpoint history intact. This is the first wiki scale-to-zero instance on a persistent-per-user substrate (vs. Lambda's per-invocation ephemerality and Fly-Proxy-autostop's per-session Fly Machine shape where "stop" means "destroy the ephemeral VM"). Enables the "I use dozens" / "casually create hundreds of them" user-side economics Ptacek describes: durable-VM-at-rest has to cost ~0 for users to keep many alive.
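
The proxy-managed autostart/autostop shape that recurs in the Fly.io entries above can be sketched as a small idle-timeout controller. This is a simulation under stated assumptions, not Fly Proxy's implementation: the `AutostopProxy` class, its method names, and the 300-second idle timeout are hypothetical, while the 45-second cold start echoes the figure quoted for a100-40gb + LLaVA-34b. Time is passed in explicitly so the example is deterministic.

```python
class AutostopProxy:
    """Simulated proxy that stops an idle backend and restarts it on demand."""

    def __init__(self, idle_timeout_s: int, cold_start_s: int):
        self.idle_timeout_s = idle_timeout_s  # assumed idle window before stop
        self.cold_start_s = cold_start_s      # wait paid by the first request
        self.running = False
        self.started_at = None
        self.last_request_at = None
        self.metered_s = 0  # total seconds the backend was billed

    def _stop_if_idle(self, now: int) -> None:
        if not self.running:
            return
        stop_at = self.last_request_at + self.idle_timeout_s
        if now >= stop_at:
            # Metering ends at the moment of the stop, not at the next request.
            self.metered_s += stop_at - self.started_at
            self.running = False

    def handle(self, now: int) -> int:
        """Route a request arriving at `now`; return the extra wait in seconds."""
        self._stop_if_idle(now)
        wait = 0
        if not self.running:
            self.running = True
            self.started_at = now
            wait = self.cold_start_s  # the price of zero idle cost
        self.last_request_at = now + wait
        return wait


proxy = AutostopProxy(idle_timeout_s=300, cold_start_s=45)
waits = [proxy.handle(t) for t in (0, 60, 4000)]
print(waits)            # [45, 0, 45]
print(proxy.metered_s)  # 360
```

Of the roughly 4,000 simulated seconds, only 360 are metered for the first run; the cost is that the first request after each idle period absorbs the full cold start.
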