CONCEPT Cited by 3 sources
Cold Start¶
Cold start is the extra latency a serverless / scale-to-zero service incurs when it has to allocate execution capacity for a request whose target has no warm instance — VM boot, runtime init, user-code init.
Why it exists¶
Cold start is the direct counterpart of scale-to-zero: if an application consumes no idle capacity when unused, the first request after a quiet period has to pay the bring-up cost. See concepts/scale-to-zero.
How Lambda framed it, day one¶
The 2014 PR/FAQ was explicit about the shape of the latency curve: "Applications in steady use have typical latencies in the range of 20-50ms, determined by timing a simple 'echo' application from a client hosted in Amazon EC2. Latency will be higher the first time an application is deployed and when an application has not been used recently." The team also committed to an internal measurement of "latency of process invocation to execution of customer code" as a dimension to optimise.
(Source: sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years)
10-year evolution of the attack surface¶
- Single-tenant EC2 instances (launch) — slow to spin up, expensive to keep warm.
- Firecracker micro-VMs — startup in milliseconds; enables dense packing so warm capacity is cheap to keep around.
- Container image support (2020) — up to 10 GB images; solved via on-demand block-level image loading (Marc Brooker, USENIX ATC '23) so pulling a 10 GB image isn't a 10 GB cold start.
- SnapStart (2022) — pre-initialized Firecracker VM snapshots, restored on demand; "reduced cold start latency — especially for Java functions — by up to 90%."
What "fast cold start" actually means mechanically¶
The levers are consistent across providers:
- Smaller / snapshotted isolation units (micro-VMs, not full OS boots).
- Lazy / on-demand loading of code and dependencies.
- Aggressive caching of the runtime + user init past the first request.
- Pre-warmed capacity pools (Lambda Provisioned Concurrency, container always-on minimums) — trade idle cost back for zero cold starts.
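The scale-to-zero/cold-start trade can be sketched as a toy model. Everything here is illustrative — the class name, the latency numbers, and the idle-TTL expiry are assumptions, not any provider's actual implementation — but it shows the shape: a warm hit is cheap, a miss pays init, and a provisioned floor buys back idle cost to avoid the miss.

```python
class ServerlessPool:
    """Toy scale-to-zero pool: warm instances serve fast, a miss pays init.

    All numbers are hypothetical defaults for illustration only.
    """

    def __init__(self, provisioned=0, init_ms=800, warm_ms=25, idle_ttl_s=300):
        self.provisioned = provisioned   # always-on floor (Provisioned-Concurrency-style)
        self.warm = provisioned          # currently warm instances
        self.init_ms = init_ms           # boot + runtime init + user-code init
        self.warm_ms = warm_ms           # steady-state invoke latency
        self.idle_ttl_s = idle_ttl_s     # idle period after which capacity is reclaimed
        self.last_used = None

    def invoke(self, now_s):
        """Return simulated latency (ms) for a request arriving at now_s."""
        # Scale-to-zero: reclaim warm capacity above the floor after a quiet period.
        if self.last_used is not None and now_s - self.last_used > self.idle_ttl_s:
            self.warm = self.provisioned
        self.last_used = now_s
        if self.warm > 0:
            return self.warm_ms                 # warm hit
        self.warm = 1                           # bring up one instance
        return self.init_ms + self.warm_ms      # cold start: pay full bring-up


pool = ServerlessPool()
first = pool.invoke(0)     # cold start: 800 + 25 = 825 ms
second = pool.invoke(1)    # warm: 25 ms
late = pool.invoke(500)    # idle > TTL, scaled back to zero: cold again, 825 ms

pre = ServerlessPool(provisioned=1)
floor = pre.invoke(0)      # provisioned floor: no cold start, 25 ms
```

Note the knob being turned in each lever above: shrinking `init_ms` (micro-VMs, snapshots, lazy loading) versus raising `provisioned` (paying idle cost to skip the miss entirely).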
Seen in¶
- sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — cold start flagged from the day-one PR/FAQ; the 10-year annotations walk through how Lambda has chipped at it (Firecracker → on-demand container loading → SnapStart).
- sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks — Cloudflare Workers' V8-isolate cold-start mitigation via warm-isolate routing: the heuristic that routes requests to warm isolates was tuned for I/O-bound workloads, so under CPU-bound bursts the resulting queueing looked like slow CPU. The 2025-10 fix biases detection toward sustained CPU use and spins up new isolates faster, while keeping the I/O-bound coalescing property.
- sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description — GPU-inference cold-start tail with a real number. On Fly.io's a100-40gb preset with LLaVA-34b, cold start from a fully stopped Machine is ~45 seconds, decomposed as seconds of Machine boot + tens of seconds to load weights into GPU RAM + seconds for the first response. The dominant stage differs from CPU/serverless cold starts (where runtime init dominates): on GPU inference, loading the model into GPU RAM dominates. See concepts/gpu-scale-to-zero-cold-start for the GPU-specific three-stage framing.
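The GPU three-stage decomposition above can be made concrete with a small sketch. The per-stage figures below are assumptions chosen only to match the rough shape of the Fly.io example (seconds of boot, tens of seconds of weight load, seconds of first inference, ~45 s total); they are not measured values.

```python
def gpu_cold_start_ms(machine_boot_ms=3_000,
                      weight_load_ms=38_000,
                      first_inference_ms=4_000):
    """Toy three-stage GPU cold-start decomposition (illustrative numbers).

    Returns (total_ms, dominant_stage) for a request hitting a fully
    stopped GPU Machine.
    """
    stages = {
        "machine_boot": machine_boot_ms,        # seconds: VM/Machine start
        "weight_load": weight_load_ms,          # tens of seconds: model -> GPU RAM
        "first_inference": first_inference_ms,  # seconds: first response
    }
    total_ms = sum(stages.values())
    dominant = max(stages, key=stages.get)
    return total_ms, dominant


total_ms, dominant = gpu_cold_start_ms()
# total_ms == 45_000 (~45 s); dominant == "weight_load", unlike
# CPU/serverless cold starts where runtime init is the dominant stage
```

The point of the decomposition is which stage to attack: snapshotting helps the boot stage, but on GPU inference the weight-load stage dwarfs it, so the effective levers are different.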