Warm isolate routing¶
Definition¶
Warm isolate routing is the scheduling policy used by V8-isolate-based serverless runtimes (canonically Cloudflare Workers, but the pattern is general) to prefer sending a request to an isolate that already has the target code loaded — skipping cold-start cost — rather than minting a fresh isolate.
It's the isolate-granularity sibling of VM-warm-pool routing in AWS Lambda / Firecracker (which has a slightly different shape because isolates are lighter and denser than micro-VMs).
The tuning axis: workload shape¶
A single warm-isolate-routing heuristic can't be right for every workload. Two opposite extremes:
I/O-bound workloads¶
Typical case: the isolate makes a fetch call, then awaits the response for tens to hundreds of milliseconds. The isolate's CPU is near-idle during the wait. Coalescing more requests onto one warm isolate is essentially free — none of them compete for CPU, and each avoids cold-start cost.
CPU-bound workloads¶
Typical case: the isolate spends all wall-clock time executing JS. Coalescing requests onto one warm isolate blocks them serially — request 2 waits until request 1 finishes CPU work. Time spent waiting isn't billed as CPU (Workers billing is CPU-time, not wall-clock), but client-observed latency inflates dramatically.
A heuristic tuned for the I/O-bound case silently underperforms on bursts of CPU-bound traffic.
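The two extremes can be made concrete with a toy queueing model (an illustration only, not Cloudflare's scheduler; all names and numbers here are assumptions):

```typescript
// Toy model: client-observed latency for n simultaneous requests, either
// coalesced onto one warm isolate or fanned out to fresh isolates.
// cpuMs: JS execution time (serialized within one isolate)
// ioMs:  awaited I/O (overlaps freely across requests)
// coldMs: cold-start cost of minting a fresh isolate
function observedLatencies(
  n: number, cpuMs: number, ioMs: number, coldMs: number
): { coalesced: number[]; fanout: number[] } {
  // One warm isolate: request k queues behind k earlier requests' CPU work.
  const coalesced = Array.from({ length: n }, (_, k) => (k + 1) * cpuMs + ioMs);
  // Fresh isolate per request: CPU runs in parallel, but each pays cold start.
  const fanout = Array.from({ length: n }, () => coldMs + cpuMs + ioMs);
  return { coalesced, fanout };
}

// I/O-bound burst: coalescing wins, since cold start dominates the latency.
console.log(observedLatencies(4, 1, 100, 50));
// CPU-bound burst: coalescing serializes the requests and the tail explodes.
console.log(observedLatencies(4, 100, 1, 50));
```

In the model, the last request of a four-request CPU-bound burst waits through three predecessors' CPU work, while fanning out costs only one cold start per request — which is exactly the regime where the I/O-tuned heuristic picks the wrong side.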
The 2025-10 Cloudflare rebalance¶
Cloudflare originally optimized warm-isolate routing for latency and throughput across billions of requests — which for the median Workers customer means I/O-bound. The heuristic was designed to send more traffic to warm isolates to reduce cold starts for frameworks with heavy initialization (e.g., Next.js).
When Theo Browne's cf-vs-vercel-bench generated bursts of expensive CPU-bound traffic from a single client, the heuristic queued the later requests behind the long-running earlier ones. The system's auto-scaling fallback — detect queueing and spin up more isolates — did kick in, but not fast enough for a short, sharp CPU burst.
"As a result of this problem, the benchmark was not really measuring CPU time. Pricing on the Workers platform is based on CPU time — that is, time spent actually executing JavaScript code, as opposed to time waiting for things. Time spent waiting for the isolate to become available makes the request take longer, but is not billed as CPU time against the waiting request." (Source: sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks)
Cloudflare's fix:
- Detect sustained CPU-heavy work earlier and distinguish it from I/O-bound workloads.
- Bias routing so new isolates spin up faster specifically for CPU-bound patterns.
- I/O-bound workloads still coalesce onto warm isolates (the original optimization target is preserved).
- Rolled out globally, automatic for all Workers.
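Cloudflare has not published the heuristic itself; the fix above can be sketched as a router that tracks recent per-isolate CPU load and refuses to coalesce onto isolates showing sustained CPU-heavy work. Every name and threshold below (`pickIsolate`, `CPU_SUSTAIN_MS`, the window size) is an assumption for illustration:

```typescript
// Hypothetical sketch of workload-aware warm-isolate routing, not Cloudflare's code.
interface Isolate {
  id: number;
  recentCpuMs: number[]; // sliding window of CPU-time samples per request
}

const CPU_SUSTAIN_MS = 50; // assumed threshold for "sustained CPU-heavy"
const WINDOW = 5;          // assumed sample-window length

function meanCpu(iso: Isolate): number {
  if (iso.recentCpuMs.length === 0) return 0;
  return iso.recentCpuMs.reduce((a, b) => a + b, 0) / iso.recentCpuMs.length;
}

function pickIsolate(warm: Isolate[], mint: () => Isolate): Isolate {
  // I/O-bound path: coalesce onto a warm isolate that is not CPU-saturated,
  // preserving the original cold-start optimization.
  const idle = warm.find((iso) => meanCpu(iso) < CPU_SUSTAIN_MS);
  if (idle) return idle;
  // CPU-bound path: every warm isolate is busy executing JS, so bias toward
  // spinning up a fresh isolate rather than queueing behind them.
  return mint();
}

function recordCpu(iso: Isolate, cpuMs: number): void {
  iso.recentCpuMs.push(cpuMs);
  if (iso.recentCpuMs.length > WINDOW) iso.recentCpuMs.shift();
}
```

The design choice the sketch captures: the signal is historical per-isolate CPU load, not anything in the request itself, so I/O-bound traffic keeps coalescing while a CPU burst trips the threshold after a few samples.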
Pricing vs latency decoupling (Workers-specific)¶
Because Workers bills CPU-time rather than wall-clock, isolate-queue wait doesn't show up on the customer's bill. A benchmark measuring wall-clock latency conflates CPU execution and isolate-queue wait into one number, while the workload's actual billable CPU is unchanged by the routing heuristic. The benchmark thus reported what looked like a CPU-speed gap but was actually queueing delay.
This is an important detail for serverless billing models where the accounting unit isn't wall-clock.
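The decoupling is simple arithmetic (an illustration; the numbers are made up):

```typescript
// Billable CPU vs. client-observed wall clock for one request that spends
// queueMs waiting for the warm isolate to free up.
function bill(cpuMs: number, queueMs: number, ioMs: number) {
  return {
    billedCpuMs: cpuMs,                 // only JS execution time is billed
    observedMs: queueMs + cpuMs + ioMs, // what a wall-clock benchmark measures
  };
}

// 100ms of actual CPU work stuck behind 300ms of queue: the bill stays at
// 100ms while the benchmark's stopwatch reads 401ms.
console.log(bill(100, 300, 1));
```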
Related primitives¶
- concepts/cold-start — the latency class warm-isolate routing is designed to mitigate. Warm routing trades CPU bottleneck risk (the cost observed here) for cold-start latency.
- concepts/workload-aware-routing — the general L7 routing pattern of inspecting request shape to pick a backend. Warm isolate routing is a special case where the "backends" are warm-vs-cold isolate instances within one fleet and the inspected signal is historical per-isolate CPU load.
- concepts/hot-key — the sibling failure mode at the sharded-data-store layer. A hot isolate = hot key; both show the same "one heavy consumer serially blocks the rest" pathology.
- concepts/tail-latency-at-scale — queueing behind a slow predecessor is the classic tail-latency driver; warm isolate routing amplifies it under CPU-bound bursts.
Seen in¶
- sources/2025-10-14-cloudflare-unpacking-cloudflare-workers-cpu-performance-benchmarks — canonical wiki instance: Workers' warm-isolate heuristic tuned for I/O-bound workloads, visible as a CPU-looking benchmark gap when hit with CPU-bound bursts, re-tuned to detect CPU sustain and spin up isolates faster.
Related¶
- systems/cloudflare-workers — the runtime that instantiates this routing.
- concepts/cold-start — the sibling latency class.
- concepts/workload-aware-routing — the general shape-aware-routing category.
- concepts/hot-key — sharded-storage-layer analog.
- concepts/tail-latency-at-scale — the tail amplification mechanism when queueing kicks in.