
GOOGLE 2025-10-17 Tier 1


Google Research — Solving virtual machine puzzles: How AI is optimizing cloud computing

Summary

Google Research post (2025-10-17) introducing a trio of lifetime-aware VM-allocation algorithms — NILAS (Non-Invasive Lifetime Aware Scoring), LAVA (Lifetime-Aware VM Allocation), and LARS (Lifetime-Aware Rescheduling) — for the online bin-packing problem at the heart of cloud scheduling. Core framing: VM allocation is Tetris-with-disappearing-pieces, and the disappearing times are unknown at placement. Pure ML VM-lifetime prediction at creation time has a structural failure mode — a single misprediction can tie up an entire host for an extended period, degrading efficiency — so the LAVA family replaces the one-shot guess with continuous reprediction: the expected remaining lifetime is updated constantly as a VM continues to run. The problem framing also promotes two named operational metrics the scheduler optimizes against: resource stranding (remaining per-host resources too small or unbalanced to host new VMs) and empty hosts (needed for system updates and for provisioning large resource-intensive VMs). Canonical wiki instance of online bin-packing with lifetime uncertainty, and the second Google Research ML-for-systems proof point on Borg-adjacent workloads after the 2025-07-29 Regression Language Model post — but at a different insertion point (placement + re-placement policy, not solver-output prediction).

⚠️ Raw-file scope note. The local raw scrape (raw/google/2025-10-17-solving-virtual-machine-puzzles-how-ai-is-optimizing-cloud-c-1ff642a3.md, 17 lines) captures only the intro framing through the algorithm-naming paragraph — the architectural detail of NILAS scoring, LAVA's distribution handling, LARS's rescheduling trigger logic, measured production savings, and Borg integration specifics are in the linked arXiv paper but not in the raw capture. Wiki pages are scoped to what the raw verifiably contains; deeper detail marked as gaps.

Key takeaways

  1. Cloud VM allocation is online bin-packing with unknown disappearance times. The opening analogy — Tetris with rapidly-falling pieces, some of which fit perfectly and others don't, "pieces appear and disappear, some with a lifespan of only minutes, and others, days" — frames the problem class the paper addresses. If lifetimes were known, allocation would be "clearly much better"; they aren't, and the scheduler must commit to a placement anyway (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

  2. Two named failure modes dominate the economics: resource stranding and loss of empty hosts. Poor VM allocation leads to "resource stranding, where a server's remaining resources are too small or unbalanced to host new VMs, effectively wasting capacity" and "reduces the number of empty hosts, which are essential for tasks like system updates and provisioning large, resource-intensive VMs". These are the two operational metrics the LAVA family is designed to improve simultaneously — not just "pack tighter" (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

  3. At data-center scale, efficiency improvements are simultaneously an economics and a sustainability lever. "At the scale of large data centers, efficient resource use is especially critical for both economic and environmental reasons." The post frames the work as dual-motivated — the argument that underwrites ML-for-systems investment in Google's cluster layer (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

  4. Single-shot lifetime prediction has a structural hazard: one bad guess can poison a host for a long time. "AI can help with this problem by using learned models to predict VM lifetimes. However, this often relies on a single prediction at the VM's creation. The challenge with this approach is that a single misprediction can tie up an entire host for an extended period, degrading efficiency." This is the wiki's canonical articulation of why point-prediction alone is insufficient for online bin-packing — the cost of a single misprediction is bounded only by the VM's actual lifetime, which can be days (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

  5. Continuous reprediction is the LAVA family's answer. Not a single lifetime guess at VM creation but "the model constantly and automatically updates its prediction for a VM's expected remaining lifetime as the VM continues to run." This changes the information structure of the scheduler's decision: misprediction at t=0 is recoverable at t=k because the estimate sharpens as the VM's trajectory is observed. Adjacent to (but distinct from) control-theoretic online re-estimation; the paper pitches it as the load-bearing primitive (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

  6. The algorithms are a trio with distinct insertion points. NILAS (Non-Invasive Lifetime Aware Scoring) — scoring layer for placement; LAVA (Lifetime-Aware VM Allocation) — allocation algorithm with learned distributions and misprediction adaptation; LARS (Lifetime-Aware Rescheduling) — post-placement rescheduling when the picture changes. The names encode the insertion points: scoring (read-only, easy to deploy), allocation (changes the placement decision), rescheduling (changes already-placed VMs). The raw capture doesn't describe the internals of each; the arXiv paper is the canonical reference (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

  7. The post pairs with 2025-07-29 RLM/Borg as a second ML-for-systems proof point — at a different layer. The 2025-07-29 work predicts the bin-packer's output (MIPS per GCU) so the scheduler's inner loop can short-circuit bad candidates. The 2025-10-17 work changes the bin-packer's policy — the scheduler itself becomes lifetime-aware via continuous reprediction. Both sit on Borg-adjacent infrastructure; both use ML to reshape scheduling economics; they're orthogonal along the insertion-point axis.
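The problem class in takeaway 1 is easy to make concrete. Below is a minimal, hypothetical first-fit simulator (a toy model, not Google's allocator) in which the scheduler sees each VM's size at arrival but learns its lifetime only when the departure actually happens:

```python
import heapq

def first_fit_online(events, host_capacity):
    """Place VMs first-fit as they arrive; free capacity when they depart.

    `events` is a list of (arrive_time, lifetime, size) tuples. The scheduler
    sees `size` at arrival but, crucially, not `lifetime` — it only observes
    departures after the fact. Returns the number of hosts ever opened.
    Hypothetical toy model for the online bin-packing framing.
    """
    hosts = []       # free capacity per host
    departures = []  # min-heap of (depart_time, host_idx, size)
    for t, lifetime, size in sorted(events):
        # Release capacity for VMs that finished before now.
        while departures and departures[0][0] <= t:
            _, h, s = heapq.heappop(departures)
            hosts[h] += s
        # First-fit: first host with enough free capacity, else open a new one.
        for h, free in enumerate(hosts):
            if free >= size:
                hosts[h] -= size
                break
        else:
            hosts.append(host_capacity - size)
            h = len(hosts) - 1
        heapq.heappush(departures, (t + lifetime, h, size))
    return len(hosts)
```

With lifetimes hidden at placement time, first-fit can open a second host for a VM that a slightly delayed or better-informed placement would have co-located; lifetime-aware scoring exists to close exactly that gap.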
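Takeaway 5's continuous reprediction has a simple statistical core: conditioning a lifetime distribution on survival so far. A toy sketch (the post does not specify the actual update rule, and the bimodal pmf below is purely illustrative):

```python
def expected_remaining(lifetime_pmf, age):
    """Expected remaining lifetime given the VM has already survived `age`.

    `lifetime_pmf` maps lifetime -> probability. Continuous reprediction in
    its simplest form: condition the (learned) distribution on survival,
    i.e. E[L - t | L > t]. Illustrative only; LAVA's real update mechanism
    is in the arXiv paper, not the captured intro.
    """
    mass = sum(p for life, p in lifetime_pmf.items() if life > age)
    if mass == 0:
        return 0.0
    return sum((life - age) * p
               for life, p in lifetime_pmf.items() if life > age) / mass
```

For a pmf of 1 day with probability 0.8 and 100 days with probability 0.2, the expectation at creation is 20.8 days; once the VM survives past day 2, it jumps to 98 days. That is the recoverability claim in miniature: the t=0 estimate is badly wrong for a long-lived VM, but the observed trajectory sharpens it.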

Systems introduced

  • systems/lava-vm-scheduler — the algorithmic-family wiki page covering NILAS + LAVA + LARS. Stub-grade: the raw capture names the three algorithms and the continuous-reprediction primitive; internals defer to the arXiv paper.
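The raw capture gives none of NILAS's internals, but a non-invasive lifetime-aware scoring layer could plausibly look like the following hypothetical sketch, which prefers hosts whose resident VMs are predicted to exit no later than the candidate, so hosts drain as a unit (this is an assumed scoring rule for illustration, not NILAS's actual formula):

```python
def lifetime_alignment_score(host_vm_exits, candidate_exit):
    """Score a candidate host for a new VM: 0.0 is best, more negative is worse.

    Penalises placements that push the host's predicted drain-out time past
    the latest resident exit, since that delays the host returning to the
    empty pool. Hypothetical rule in the spirit of 'non-invasive
    lifetime-aware scoring'; the real NILAS formula is in the arXiv paper.
    """
    if not host_vm_exits:
        return 0.0  # empty host: neutral here; a real policy may preserve it
    latest = max(host_vm_exits)
    return -max(0.0, candidate_exit - latest)
```

The "non-invasive" property in the name suggests a read-only layer: such a score can be folded into an existing scheduler's host-ranking step without changing the placement machinery itself.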

Systems extended

  • systems/borg — the post's production target for VM allocation. New recurring-angle: VM-scheduling-policy intervention via continuous reprediction, complementary to the 2025-07-29 RLM work at the bin-packer-output-prediction layer.

Concepts introduced

  • concepts/resource-stranding — named failure mode where a server's remaining resources are too small or unbalanced to host any candidate VM, effectively wasting capacity. Load-bearing wiki concept for cluster-level efficiency discussion.
  • concepts/empty-host — named operational primitive (fully-empty physical server) the scheduler must preserve for (a) system updates / planned maintenance, (b) provisioning large resource-intensive VMs that don't fit on partially-occupied hosts.
  • concepts/vm-lifetime-prediction — the problem class of predicting how long a VM will run, as an input to placement scoring. Subclass of performance prediction but with a distinct hazard profile — single bad predictions have disproportionate cost because the decision they drive (placement) is expensive to reverse.
  • concepts/continuous-reprediction — online policy of re-running a predictor as new observations arrive, replacing a single commit-at-creation prediction. Enables recovery from early-stage misprediction; requires that the downstream consumer (scheduler, re-scheduler) can act on updated estimates, not just the initial one.
  • concepts/learned-lifetime-distribution — the unit the LAVA family predicts: a distribution over remaining lifetime rather than a point estimate. Enables risk-aware placement (e.g. avoid placing a likely-short-lived VM next to a likely-long-lived one if the scheduler wants to preserve the empty-host property).
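The resource-stranding definition above lends itself to a toy metric: free capacity counts as stranded when a host's remainder is too small or too unbalanced on some dimension to fit even the smallest VM shape. The thresholding below is an illustrative assumption; the post gives the definition but no formula:

```python
def stranded_fraction(hosts, min_vm_shape):
    """Fraction of total free capacity that is stranded.

    `hosts` is a list of (free_cpu, free_ram) pairs; `min_vm_shape` is the
    smallest schedulable (cpu, ram) VM. A host's free capacity is stranded
    if either dimension falls below that minimum — e.g. plenty of CPU left
    but almost no RAM. Toy metric illustrating the post's definition.
    """
    min_cpu, min_ram = min_vm_shape
    free_total = stranded = 0.0
    for free_cpu, free_ram in hosts:
        free = free_cpu + free_ram
        free_total += free
        if free_cpu < min_cpu or free_ram < min_ram:
            stranded += free
    return stranded / free_total if free_total else 0.0
```

The unbalanced case is the one worth noticing: a host with 4 spare cores but half a GB of spare RAM shows up as "free capacity" in naive utilisation numbers while being unable to accept any real VM.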

Concepts extended

  • concepts/bin-packing — adds VM-allocation-as-online-bin-packing framing; Google's LAVA family is canonical wiki instance of lifetime-aware online bin-packing with learned distributions. Contrasts with the static / batch bin-packing variants already on the wiki.
  • concepts/performance-prediction — the 2025-10-17 post extends the lineage: performance prediction was framed around MIPS per GCU in 2025-07-29; here it's lifetime prediction per VM. Both are ML-for-systems cheap approximators for scheduler-internal decisions.

Patterns introduced

  • patterns/lifetime-aware-rescheduling — LARS instantiates this pattern: after initial placement, continue tracking the lifetime distribution and move VMs when the current placement becomes inefficient relative to the updated picture. Trades placement-stability for packing-efficiency; requires live-migration primitive (or equivalent) to be cheap enough to pay off.
  • patterns/learned-distribution-over-point-prediction — when downstream decisions are cost-asymmetric in the prediction error (a too-long lifetime estimate that holds a host open is not the same cost as a too-short one that forces eviction), emit a distribution from the predictor, not a point. Lets the consumer reason about tail risk rather than mean error.
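A minimal illustration of the distribution-over-point pattern: a discrete quantile helper that lets the consumer act on tail risk rather than the mean (an illustrative helper, not an API from the paper):

```python
def quantile(pmf, q):
    """q-th quantile of a discrete lifetime distribution.

    `pmf` maps lifetime -> probability. Walking the support in order and
    accumulating mass gives the smallest value whose CDF reaches q — the
    tail statistic a cost-asymmetric consumer wants, as opposed to the
    mean a point predictor would emit.
    """
    total = 0.0
    for value in sorted(pmf):
        total += pmf[value]
        if total >= q:
            return value
    return max(pmf)
```

For a pmf of {1: 0.8, 100: 0.2}, the mean lifetime is 20.8 but the 90th percentile is 100: a scheduler trying to preserve the empty-host property cares about the latter, since the tail is what determines how long a misprediction can hold a host hostage.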

Patterns extended

  • patterns/cheap-approximator-with-expensive-fallback — sibling framing: LAVA's continuous reprediction is not a cheap/expensive split, it's a coarse-then-refined split on the same predictor. But the family shares the calibrated-uncertainty-as-load-bearing discipline: rescheduling only fires when the update is confident enough to justify the migration cost. The wiki notes this as a distinction, not a duplication.

Operational numbers

None disclosed in the captured post intro. The raw scrape ends at the algorithm-naming paragraph; percent-improvement figures (stranded-capacity reduction, empty-host preservation, measured-vs-baseline efficiency) are in the linked arXiv paper (2412.09840v1) but not in the raw this page is built from. Flagged as a gap.

Caveats

  • Intro-only raw capture. Internals of NILAS scoring, LAVA's distribution handling, LARS's rescheduling trigger logic, measured production savings, and Borg integration specifics are not in the raw.
  • Production deployment status not disclosed in intro. The post frames the algorithms as Google's work on the problem; whether they're in production on Google's public cloud, on internal Borg workloads, or paper-only is not in the raw scope.
  • No baseline comparison numbers in raw. The post presumably compares against non-lifetime-aware placement, and against single-shot lifetime-prediction baselines — but those numbers live in the arXiv paper, not the raw intro.
  • Relationship to the 2025-07-29 RLM work not explicit. Both target cluster efficiency on Borg-adjacent infrastructure but operate at different layers; the post doesn't cross-reference. Wiki pages note the relationship as editorial framing.

Source

sources/2025-10-17-google-solving-virtual-machine-puzzles-lava