
CONCEPT Cited by 10 sources

Cold Start

Cold start names two distinct phenomena that share a name because both describe "the first time around is slow / hard":

  1. Serverless / scale-to-zero cold start: the extra latency a serverless or scale-to-zero service incurs when a request has no warm instance to land on. That latency decomposes into VM boot + runtime init + user-code init. This is the dominant meaning in this wiki's serverless-infra posts.
  2. Recsys cold start: the quality problem of serving recommendations for newly-launched content, newly-joined users, or newly-onboarded distributions (a new partner site, a new region, a new tenant) when the recommender has no user-interaction data yet. This is a modelling / training-data problem, not a latency problem.

The two live on the same page here because they share a name, but "cold" refers to a different kind of scarcity in each: serverless cold start is a system-capacity concept (insufficient warm instances); recsys cold start is a signal-availability concept (insufficient engagement data).

Why it exists

Cold start is the direct counterpart of scale-to-zero: if an application consumes no idle capacity when unused, the first request after a quiet period has to pay the bring-up cost. See concepts/scale-to-zero.
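
The relationship between scale-to-zero and cold start can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real platform: the class name `ScaleToZeroService` and the numbers for `bring_up_s`, `warm_s`, and `idle_ttl_s` are all hypothetical, chosen only to show that the first request after a quiet period pays the bring-up cost while requests in steady use do not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScaleToZeroService:
    """Toy model: a single instance that is reclaimed after `idle_ttl_s` of silence."""

    bring_up_s: float  # hypothetical cost of VM boot + runtime init + user-code init
    warm_s: float      # hypothetical latency when a warm instance is available
    idle_ttl_s: float  # hypothetical idle time after which the instance is reclaimed
    last_finished_at: Optional[float] = None  # None means no instance exists yet

    def handle(self, arrival_s: float) -> float:
        """Return the latency seen by a request arriving at time `arrival_s`."""
        cold = (
            self.last_finished_at is None
            or arrival_s - self.last_finished_at > self.idle_ttl_s
        )
        latency = self.warm_s + (self.bring_up_s if cold else 0.0)
        self.last_finished_at = arrival_s + latency
        return latency


service = ScaleToZeroService(bring_up_s=1.0, warm_s=0.05, idle_ttl_s=300.0)
arrivals = [0.0, 10.0, 20.0, 1000.0]  # the last request follows a long quiet period
latencies = [service.handle(t) for t in arrivals]
```

In this sketch the first and last requests are cold and the two in between are warm. Raising `idle_ttl_s` trades idle capacity for fewer cold requests, which is the trade-off the scale-to-zero framing describes.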

How Lambda framed it, day one

The 2014 PR/FAQ was explicit about the shape of the latency curve: "Applications in steady use have typical latencies in the range of 20-50ms, determined by timing a simple 'echo' application from a client hosted in Amazon EC2. Latency will be higher the first time an application is deployed and when an application has not been used recently." The team also committed to an internal measurement of "latency of process invocation to execution of customer code" as a dimension to optimise.

(Source: sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years)

10-year evolution of the attack surface

  • Single-tenant EC2 instances (launch) — slow to spin up, expensive to keep warm.
  • Firecracker micro-VMs — startup in milliseconds; enables dense packing so warm capacity is cheap to keep around.
  • Container image support (2020) — up to 10 GB images; solved via on-demand block-level image loading (Marc Brooker, USENIX ATC '23) so pulling a 10 GB image isn't a 10 GB cold start.
  • SnapStart (2022) — pre-initialized Firecracker VM snapshots, restored on demand; "reduced cold start latency — especially for Java functions — by up to 90%."

What "fast cold start" actually means mechanically

The levers are consistent across providers:

  • Smaller / snapshotted isolation units (micro-VMs, not full OS boots).
  • Lazy / on-demand loading of code and dependencies.
  • Aggressive caching of the runtime + user init past the first request.
  • Pre-warmed capacity pools (Lambda Provisioned Concurrency, container always-on minimums) — trades idle cost back for zero cold-start.
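
Two of the levers above, lazy loading and caching init past the first request, can be sketched at the application level. This is a minimal illustration under stated assumptions: `get_client` and `handler` are hypothetical names, the `time.sleep` call stands in for real initialisation work, and nothing here is specific to any provider's runtime API.

```python
import time
from functools import lru_cache

INIT_CALLS = 0  # counts how many times the expensive init actually ran


@lru_cache(maxsize=1)
def get_client() -> dict:
    """Hypothetical expensive initialisation (config parsing, connection setup)."""
    global INIT_CALLS
    INIT_CALLS += 1
    time.sleep(0.05)  # stand-in for real init work
    return {"ready": True}


def handler(event: dict) -> dict:
    """Hypothetical request handler: pays for init only on the first call per instance."""
    started = time.perf_counter()
    client = get_client()  # lazy: nothing runs until a request first needs it
    return {
        "ok": client["ready"],
        "echo": event,
        "elapsed_s": time.perf_counter() - started,
    }


first = handler({"n": 1})   # cold path: runs the init
second = handler({"n": 2})  # warm path: reuses the cached client
```

The cached result lives only as long as the instance does, so this shortens warm requests but does nothing for the first request on a fresh instance. That remaining cost is what snapshotting and pre-warmed pools target.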

Recsys cold start (the other meaning)

In recommendation systems, cold start is the problem of making useful recommendations for entities with insufficient interaction data. Three distinct sub-cases:

  • New item cold start — a newly-launched title / product / listing has no clicks, plays, purchases, or watch-history yet. Collaborative-filtering models built on user-item interaction matrices can't place the new item because it has no row of signal.
  • New user cold start — a newly-joined user has no watch / purchase history. The model can't personalise because it has no history to personalise from.
  • New domain / new tenant / new partner cold start — a whole distribution arrives with insufficient interaction history: a new partner site onboarded onto a multi-tenant ML serving platform, a new region, a new product surface. The model can't converge on the new distribution because it has no distribution-specific data, while a model trained on a related data-rich domain misses the new distribution's nuances. This is the Domain Adaptive Learning regime; see Carrot Ads for the canonical instance.

The dominant mitigation for new-item cold start is content-derived representation: learn an embedding for the item itself from its content (text, image, audio, video), not from user interactions. A brand-new title has a usable embedding from the moment its content is available, even with zero user interactions. User cold-start gets mirrored mitigations (profile questions, demographic priors, contextual bandits).
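
The idea of a content-derived representation can be sketched with a deliberately simple encoder. The bag-of-words vector below is only a stand-in for a learned content model (it is not how MediaFM or any production system works), and the catalog titles and descriptions are invented for illustration. The point it shows is that a new item gets a usable vector, and therefore neighbours, from its content alone.

```python
import math
from collections import Counter


def content_embedding(text: str) -> dict[str, float]:
    """L2-normalised bag-of-words vector (a stand-in for a learned content encoder)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


# Hypothetical existing catalog with content descriptions.
catalog = {
    "space-heist": "crew plans heist aboard orbital station",
    "baking-show": "amateur bakers compete weekly baking challenge",
    "robot-drama": "lonely robot learns friendship on orbital station",
}

# A brand-new title: zero plays, zero clicks, but its description is available.
new_item = content_embedding("rookie pilot joins heist crew on orbital station")

scores = {
    title: cosine(new_item, content_embedding(text))
    for title, text in catalog.items()
}
nearest = max(scores, key=scores.get)
```

Here `nearest` is the catalog title whose description overlaps most with the new item's, so the new title can be placed next to it without any interaction data.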

The dominant mitigation for new-domain / new-partner cold start is transfer learning — specifically domain-adaptive learning — where pre-trained embeddings + dense representations from a data-rich source domain are reused as a warm-start for the new target domain, with target-specific fine-tuning on the limited data the target does have. See patterns/cross-domain-warm-start-via-shared-embeddings for the canonical pattern.
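
The warm-start idea can be illustrated with a tiny linear model. This is a sketch under strong simplifying assumptions, not the Carrot Ads architecture: the weights, the two target examples, the learning rate, and the step count are all invented, and a plain linear regressor stands in for shared embeddings plus target-specific layers. It compares three options on a small target dataset: deploying the source weights unchanged, fine-tuning from the source weights, and training from scratch with the same budget.

```python
Sample = tuple[tuple[float, ...], float]


def predict(w: list[float], x: tuple[float, ...]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w: list[float], data: list[Sample]) -> float:
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(w_init: list[float], data: list[Sample],
              lr: float = 0.1, steps: int = 5) -> list[float]:
    """A few steps of full-batch gradient descent on mean squared error."""
    w = list(w_init)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Hypothetical weights learned on the data-rich source domain.
source_weights = [2.0, -1.0]
# Hypothetical, very small target dataset whose optimum is close to, but not
# the same as, the source weights.
target_data: list[Sample] = [((1.0, 0.0), 2.2), ((0.0, 1.0), -0.8)]

as_is_loss = mse(source_weights, target_data)                       # deploy source model unchanged
warm_loss = mse(fine_tune(source_weights, target_data), target_data)  # warm start + fine-tune
cold_loss = mse(fine_tune([0.0, 0.0], target_data), target_data)      # from scratch, same budget
```

With these made-up numbers the ordering is `warm_loss < as_is_loss < cold_loss`. The sketch does not model negative transfer, which can reverse the ordering when the source and target domains are poorly aligned.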

Canonical wiki instances of recsys cold start:

  • New-item cold start. Netflix's MediaFM (2026-02-23) is a tri-modal (audio + video + text) foundation model producing contextual shot-level embeddings for every title in the catalog. The blog explicitly names "effective cold start of newly launching titles in recommendations" as one of the primary motivations for the model — content-derived embeddings mean a new season / film has a representation for the recsys pipeline on launch day, with no user-engagement data required. (Source: sources/2026-02-23-netflix-mediafm-the-multimodal-ai-foundation-for-media-understanding)

  • New-domain / new-partner cold start. Instacart's Carrot Ads (2026-05-04) applies Domain Adaptive Learning to bootstrap a wide-and-deep pCTR model for each newly-onboarded retailer partner site, treating Instacart Marketplace as the source domain and the partner site as the target. Quote: "onboarding a new partner onto Carrot Ads introduces a key challenge: the 'cold start' problem, where limited historical interactions make it difficult to predict user behavior accurately... we developed a Domain Adaptive Learning approach that transfers knowledge from Instacart's data-rich environment to new partner environments." Counter-intuitive property: DAL outperforms from-scratch training even when the target partner has enough data, because the source-domain first-party data contributes signal the target lacks. Distinct from new-item / new-user cold start because a whole distribution is being bootstrapped, not an individual entity within an existing distribution. (Source: sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning)

Adjacent wiki framings of these underlying ideas:

  • Content-derived ranking features for new items: an item's intrinsic attributes (text, image, genre, tags) provide enough signal for a ranker to place it before collaborative signal accumulates.
  • Two-tower retrieval with a content-tower for items + a user-history-tower for users: if the item tower is trained on content, new items get embeddings without requiring interaction data for them.
  • Knowledge distillation / teacher-student models — a strong content-derived teacher can label new items for a faster student recsys model.
  • Shared pre-trained embeddings + per-target fine-tune — for new-domain cold start, pre-train shared layers on the source domain and fine-tune per target. See patterns/cross-domain-warm-start-via-shared-embeddings.

The two "cold start" meanings don't interact in practice: serverless cold start is addressed at infra + request-handling layers; recsys cold start is addressed at representation-learning + feature-engineering layers. No shared mitigation technique applies to both.

Seen in

  • sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — cold start flagged from the day-one PR/FAQ; the 10-year annotations walk through how Lambda has chipped at it (Firecracker → on-demand container loading → SnapStart).
  • sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks: Cloudflare Workers' V8-isolate cold-start mitigation via warm-isolate routing. The heuristic that routes to warm isolates was tuned for I/O-bound workloads; under CPU-bound bursts the resulting queueing looked like slow CPU. The 2025-10 fix biases the heuristic toward detecting sustained CPU-bound work and spins up new isolates faster, while keeping the coalescing property for I/O-bound workloads.
  • sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description: GPU-inference cold-start tail with a real number. On Fly.io's a100-40gb preset with LLaVA-34b, cold start from a fully-stopped Machine is ~45 seconds, decomposed as seconds of Machine boot + tens of seconds to load weights into GPU RAM + seconds for the first response. The dominant stage differs from CPU/serverless cold starts (where runtime init dominates): on GPU inference, loading the model into GPU RAM dominates. See concepts/gpu-scale-to-zero-cold-start for the GPU-specific three-stage framing.

  • sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network: canonical decomposition of Lambda's VPC-mode cold start and the decade-long campaign to compress each stage. Overall arc: Firecracker migration (2019) cut Lambda cold-start overhead from >10 s → <1 s; VPC-mode functions still paid ~300 ms on top of that for Geneve tunnel setup + DHCP. The 2026-04-22 post discloses the Geneve portion compressed to 200 μs (150 ms → 200 μs, ~750×) via an eBPF-based header-rewrite trick (patterns/ebpf-header-rewrite-on-egress): pre-create tunnels with dummy VNIs, rewrite to the real VNI once function init provides it, reverse on ingress. DHCP remains open, a multi-phase effort the team is currently working through. The latency win opened architectural headroom: it relaxed the density constraint that blocked 4,000-slot packing per worker, reduced CPU heat during cold-start bursts, and improved cross-AZ evacuation behaviour (latency optimization as a side-effect lever). The post names the network-setup portion of cold start as a first-class optimization target: the CPU spent on Geneve + iptables + NAT during cold-start bursts wasn't just per-invocation latency but a platform absorb-capacity constraint.

  • sources/2026-02-23-netflix-mediafm-the-multimodal-ai-foundation-for-media-understanding: recsys cold start, not serverless cold start. Netflix's MediaFM is positioned as an enabler of "effective cold start of newly launching titles in recommendations": the content-derived foundation model provides a usable item embedding at launch, before any user interaction signal is available. Canonical wiki instance of the recommendation-systems meaning of "cold start", distinct from the serverless / latency meaning above.

  • application-runtime cold-start datum for PHP-on-Lambda via Bref. Matthieu Napoli (2023-05-03) discloses: "The cold starts usually have a much slower execution time (one second instead of 75ms). However, we do not see them in the p50 or p95 metrics because they only impacted 1% of the requests in the first minute." Canonical wiki datum: ~1 s PHP+Laravel cold start vs 75 ms warm p50, but at 50-concurrent fan-out only the first ~50 invocations (out of 3,800+ in the first minute) hit the cold path, roughly 1% of total requests. Different regime from the AWS-disclosed VPC-mode network cold-start decomposition in sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network: this is application-runtime cold start (PHP interpreter + Laravel service container + route tree boot), not network cold start. Compounds with concepts/shared-nothing-php-request-model: PHP's shared-nothing model means every invocation is effectively cold relative to application state, even if the Lambda execution context is warm (fixable via Laravel Octane).

  • sources/2026-05-08-databricks-how-superhuman-and-databricks-built-a-200k-qps-inference-platform-together: container-image cold start at the GPU-pod tier, minutes → seconds via a lazy-loading block-device image. The 2026-05-08 Databricks / Superhuman post canonicalises a fourth serverless cold-start regime distinct from CPU-runtime-init (Lambda), V8-isolate-warm-routing (Cloudflare Workers), and GPU-weights-loading (Fly.io): container image pull as the dominant cold-start stage when an autoscaler adds dozens of GPU pods during a traffic ramp. Quote: "This lazy-loading container filesystem eliminates the need to download the entire container image before starting the application, reducing time to start container from several minutes to just a few seconds." Mechanism (see patterns/block-device-container-image-for-lazy-loading): build-time conversion of the standard gzip image to a block-device-based format with 4 MB sectors; pull-time retrieval of only metadata (directory structure + file names + permissions); virtual block device mounted into the container so the application starts immediately; per-block lazy loading on first read via an image-fetcher callback to the registry, with a local block cache to prevent repeated network round trips. Adopted from Databricks' prior "Booting Databricks VMs 7× faster" serverless-compute work and "fits well for the relatively small models we served for Superhuman". Distinct from other cold-start regimes because the bottleneck is container image bytes, not runtime init or GPU-RAM weight load, and the mitigation is pull-time elimination via block-device lazy loading, not warm-pool maintenance or micro-VM snapshotting. Caveat from the post: the technique scales well "for the relatively small models we served"; multi-hundred-GB foundation models where weight loading itself dominates startup may have different characteristics.

  • sources/2026-05-04-instacart-empowering-carrot-ads-with-domain-adaptive-learning: canonical wiki instance of new-domain / new-partner cold-start in recsys, distinct from the new-item case (Netflix MediaFM) and the new-user case. Instacart's Carrot Ads applies Domain Adaptive Learning to bootstrap a wide-and-deep pCTR model for each newly-onboarded retailer partner. Mechanism: shared shopping-context-pre-trained embedding layers + per-partner taxonomy alignment + fine-tuning of partner-specific layers + per-partner feature trimming. Quote framing the problem: "onboarding a new partner onto Carrot Ads introduces a key challenge: the 'cold start' problem, where limited historical interactions make it difficult to predict user behavior accurately. ... Training a model from scratch for a new domain is data hungry. Conversely, directly deploying Instacart's existing Marketplace model often fails to capture the nuances of the partner's specific inventory and user base." Counter-intuitive disclosed property: DAL outperforms from-scratch training even when the target partner has enough data, because Instacart's first-party data contributes signal the target structurally lacks. Gating risk: negative transfer, currently guarded by HITL schema mapping + alignment verification; a future Domain Adaptation Platform is planned to automate domain-shift detection. The pattern that fits: see patterns/cross-domain-warm-start-via-shared-embeddings.

  • sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart: canonical wiki instance of new-item cold-start solved via Semantic ID codebook coverage, a third axis distinct from MediaFM's content-derived embeddings (new-item, content-features) and Carrot Ads' Domain Adaptive Learning (new-domain, transfer-learning). Instacart's prior BERT-based CR scoring model hit a structural cold-start ceiling: "this occasionally caused it to memorize co-occurrences instead of learning generalized associations based on the user's intent. This resulted in the model favoring high-frequency items over newer products which are more aligned with the user's context." The successor, generative ads retrieval over Semantic IDs from an RQ-VAE codebook, dissolves the new-item cold-start problem at the vocabulary-substrate level: "SIDs provide coverage to every item in the catalog, regardless of whether it has a historical purchase history. A new product entering the catalog is added to one of the existing SIDs and is visible to the model from day one." Mechanism: the codebook is fixed, every new product is encoded into existing codewords, and the generative retriever can produce that SID's prefix path on day 1 without any transaction history. Operational evidence: brand-diversity wins disproportionately in dense categories where the prior CR model's scoring architecture had systematically suppressed emerging brands: +421% Alcohol, +396% Beverages, +229% Healthcare diversity lift in the retrieved candidate set. The pattern that fits: see patterns/rq-vae-codebook-as-product-vocabulary; the broader paradigm: see concepts/generative-retrieval.
Three-axis canonicalisation across the wiki corpus: (1) new-domain cold-start = transfer learning + domain adaptation (Carrot Ads pCTR ranker, 2026-05-04); (2) new-product cold-start at retrieval = Semantic ID codebook coverage (2026-06-02, this source); (3) new-item cold-start at content side = content-derived embeddings (MediaFM, Netflix 2026-02-23). The three axes operate at different altitudes (domain for Carrot DAL, retrieval vocabulary for Instacart SIDs, item-content embeddings for Netflix MediaFM) and compose rather than substitute.

  • sources/2026-06-02-instacart-semantic-ids-product-understanding-at-scale: deep-companion training-methodology disclosure for the cold-start-via-codebook-coverage axis above. This 2026-06-02 paired post (the preceding entry covers the consumer side) discloses the SID generation methodology: a contrastive loss term using the catalog taxonomy as graded supervision is the architectural reason SIDs cover cold-start products. Quote: "using our catalog taxonomy as the supervision signal rather than engagement data (which isn't available for cold-start products)." The catalog tree exists on day 1 for every product including cold-start ones; engagement data does not. The result is a codebook with coverage of every catalog item, regardless of purchase history: the cold-start-fixing property is structurally derived from the supervision-signal choice, not just an emergent property of the codebook compression. Disclosed cardinality: ~2,000 codeword tokens for Instacart's entire catalog. The post also introduces a separate but related concern: tail-category coverage, where products in sparse categories lack the interaction data to surface, addressed by the same feature-derived (not engagement-derived) representation. The three-failure-mode framing (cold start + tail category coverage + catalog quality at scale) is the canonical wiki framing of the recsys failure modes the SID substrate addresses.

  • sources/2026-06-10-databricks-ai-serving-platform-that-adapts-to-your-model: warm-node-pool cold-start mitigation at the model-serving tier. Databricks Custom Model Serving discloses a three-layer cold-start strategy: (1) warm node pool, where a predictive algorithm maintains pre-provisioned nodes with the base image pre-pulled; (2) parallel model download from a hot cache in cloud storage; (3) provisioned concurrency (min-replica floor). Combined, these compress cold start to "only the model download + init remain." Production observation: "You cannot optimize cold starts away. [...] Physics has a floor: bringing a pod up takes time that grows with model size, minutes for large GPU models." Fifth serverless / infra cold-start regime on this page, distinct from Lambda VM/runtime-init, Workers isolate-warm-routing, Fly GPU weight-load, and Databricks block-device lazy-loading (Superhuman). The warm-pool approach complements (not replaces) lazy-loading: here the whole base image is pre-pulled, vs lazy-loading individual blocks on demand. See concepts/warm-node-pool + patterns/warm-node-pool-for-cold-start-reduction.
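
The lazy-loading mechanism described in the Databricks / Superhuman entry in the list above (fetch blocks on first read, keep a local block cache) can be sketched in a few lines. This is an illustrative toy, not the Databricks implementation: `LazyBlockImage`, `fetch_from_registry`, and the in-memory `REMOTE` bytes are hypothetical, and the block size is 4 bytes here purely for readability where the post describes 4 MB.

```python
from typing import Callable

BLOCK_SIZE = 4  # bytes, for readability only; the post describes 4 MB blocks


class LazyBlockImage:
    """Virtual image whose blocks are fetched on first read and then cached locally."""

    def __init__(self, size: int, fetch_block: Callable[[int], bytes]):
        self.size = size                 # known up front from image metadata
        self._fetch_block = fetch_block  # hypothetical callback to the registry
        self._cache: dict[int, bytes] = {}
        self.fetches = 0                 # number of remote round trips made

    def _block(self, index: int) -> bytes:
        if index not in self._cache:     # only a cache miss goes to the network
            self._cache[index] = self._fetch_block(index)
            self.fetches += 1
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, BLOCK_SIZE)
            take = min(BLOCK_SIZE - start, end - pos)
            out += self._block(index)[start:start + take]
            pos += take
        return bytes(out)


# Hypothetical remote image contents; only part of it is needed at startup.
REMOTE = b"entrypoint.sh|model.bin|unused-layer-data"


def fetch_from_registry(index: int) -> bytes:
    """Stand-in for a network fetch of one block."""
    return REMOTE[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]


image = LazyBlockImage(len(REMOTE), fetch_from_registry)
startup_bytes = image.read(0, 13)   # the application can start after this read
fetches_after_start = image.fetches
image.read(0, 13)                   # repeated read is served from the local cache
fetches_after_repeat = image.fetches
```

In this sketch the startup read touches only the blocks it needs, and the repeated read triggers no further fetches. Bytes that are never read are never downloaded, which is why startup time stops scaling with total image size.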
