PATTERN
Lifetime-aware rescheduling¶
After initial placement, continue tracking the workload's lifetime distribution and move the workload when the current placement becomes inefficient relative to the updated picture. Named LARS (Lifetime-Aware Rescheduling) in Google Research's 2025-10-17 LAVA post — the rescheduling-layer sibling of the NILAS scoring / LAVA allocation layers in the same algorithmic family (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).
Intent¶
Traditional schedulers place a workload once and hold the placement until the workload exits. That's fine when the scheduler has full information at placement time. When the scheduler's view is imperfect — e.g. lifetimes are unknown, as in VM allocation — the initial placement can become suboptimal as the world reveals itself. Lifetime-aware rescheduling closes the loop: observed-trajectory evidence updates the lifetime picture, and when the update is large enough, the scheduler migrates workloads to reclaim the efficiency lost to the stale placement.
Mechanism¶
- Continuous prediction. Maintain a continuously repredicted lifetime distribution per workload, refreshed as the workload runs.
- Rebalance-worth test. Compare the current placement efficiency (under the updated prediction) against the best alternative placement. The test includes the migration cost — it's only worth moving a workload if the rebalance's expected benefit, over the workload's expected remaining lifetime, exceeds the migration cost plus the risk the updated prediction is wrong.
- Migrate / restart / evict. Execute the move using whichever mechanism the substrate supports: live migration (preferred, low disruption), suspend-and-resume, or restart-in-place.
- Update cluster state; re-enter the loop. The new placement is just another placement — it's subject to the same continuous-prediction + rebalance-worth-test loop if the picture changes again.
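The loop above can be sketched as a single predicate. This is a minimal illustration, not the LARS implementation; `Placement`, the efficiency scores, and the confidence discount are assumptions introduced here for clarity.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    host: str
    efficiency: float  # expected packing efficiency under the current prediction, 0..1


def rebalance_worth(current: Placement, best_alt: Placement,
                    expected_remaining_hours: float,
                    migration_cost: float,
                    prediction_confidence: float) -> bool:
    """Rebalance-worth test (sketch): move only if the expected efficiency
    gain, accumulated over the workload's expected remaining lifetime and
    discounted by confidence that the updated prediction is right, exceeds
    the cost of the migration itself."""
    gain_per_hour = best_alt.efficiency - current.efficiency
    expected_benefit = gain_per_hour * expected_remaining_hours * prediction_confidence
    return expected_benefit > migration_cost
```

For example, a workload expected to run another 100 hours with a 0.3 efficiency gain available and 0.8 confidence clears a migration cost of 5; the same workload with only 2 hours left does not — which is exactly the amortisation asymmetry the pattern depends on.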
Why it's structurally different from first-fit rescheduling¶
Classic schedulers include rescheduling (bin-packing compaction, defragmentation) but trigger it from state ("this host is too empty, coalesce"), not from prediction update ("our estimate of this VM's remaining lifetime just changed"). Lifetime-aware rescheduling is prediction-triggered — the signal is a shift in the predicted distribution, not a directly-observed utilisation metric.
This matters because utilisation-triggered rebalancing is reactive (acts on symptoms) while prediction-triggered rebalancing is proactive (acts on expected future state). In a system with expensive migration, proactive moves that avoid future inefficiency are more valuable than reactive moves that patch current inefficiency.
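The two trigger shapes can be contrasted directly. Both functions below are illustrative sketches (names and thresholds are assumptions, not from the source): the reactive trigger reads an observed state metric, the proactive one reads a shift in the predicted distribution.

```python
def utilisation_trigger(host_utilisation: float,
                        low_watermark: float = 0.3) -> bool:
    """Reactive: fire when an observed state metric crosses a watermark
    ('this host is too empty, coalesce')."""
    return host_utilisation < low_watermark


def prediction_trigger(prev_expected_remaining_hours: float,
                       new_expected_remaining_hours: float,
                       rel_shift_threshold: float = 0.5) -> bool:
    """Proactive: fire when the predicted remaining-lifetime estimate
    shifts materially, regardless of current utilisation."""
    shift = abs(new_expected_remaining_hours - prev_expected_remaining_hours)
    return shift > rel_shift_threshold * max(prev_expected_remaining_hours, 1e-9)
```

Note the proactive trigger can fire on a fully packed host: the signal is that the future changed, not that the present looks bad.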
The migration-cost threshold¶
The load-bearing operational parameter is how confident the prediction update must be, and how large the efficiency delta must be, before a migration fires. Set the threshold too low → migration churn, oscillation, workload disruption. Set it too high → the rescheduler rarely fires, and the framework degenerates to single-shot allocation + occasional defragmentation.
Tuning this threshold is the practical challenge; calibrated learned lifetime distributions make it tractable — you can set the trigger on "P(current placement is optimal) < X%" rather than on a heuristic.
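A probabilistic trigger of that shape can be sketched by Monte-Carlo sampling the learned remaining-lifetime distribution. The function names, the breakeven framing, and the default X are assumptions for illustration; the only idea taken from the text is firing on "P(current placement is optimal) < X%".

```python
def p_current_optimal(sample_remaining_hours, gain_per_hour: float,
                      migration_cost: float, n: int = 10_000) -> float:
    """Monte-Carlo estimate of P(current placement is still optimal):
    the fraction of sampled remaining lifetimes too short for the best
    alternative's efficiency gain to repay the migration cost.
    sample_remaining_hours draws from the learned, continuously
    repredicted remaining-lifetime distribution."""
    optimal = sum(1 for _ in range(n)
                  if sample_remaining_hours() * gain_per_hour <= migration_cost)
    return optimal / n


def migration_trigger(sample_remaining_hours, gain_per_hour: float,
                      migration_cost: float, x: float = 0.2) -> bool:
    """Fire the rescheduler when P(current placement optimal) < X%."""
    return p_current_optimal(sample_remaining_hours, gain_per_hour,
                             migration_cost) < x
```

In practice `sample_remaining_hours` would wrap the learned distribution (e.g. `lambda: random.expovariate(1 / mean_hours)` for an exponential fit); the calibrated distribution is what makes X a probability rather than a heuristic knob.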
Contrast with¶
cheap-approximator-with-expensive-fallback
Both patterns use prediction uncertainty as a control signal. Different concretisations:
| Aspect | Cheap-approximator fallback | Lifetime-aware rescheduling |
|---|---|---|
| What uncertainty triggers | Run the slow solver | Migrate the workload |
| Trigger frequency | Per query | Per workload, continuously |
| Reversal cost | Cheap (next query is fresh) | Expensive (migrate) |
| Cost symmetry | Symmetric (slow path always possible) | Asymmetric (migration isn't free) |
Both share the discipline that calibrated uncertainty / distribution width is the load-bearing signal. They're sibling patterns at different insertion points in the ML-for-systems stack.
When it's the right shape¶
- Placement decision is expensive to reverse, but migration is cheap relative to the efficiency loss of stale placements.
- Prediction evolves materially over the workload's lifetime (continuous reprediction provides real new information).
- The workload is long-lived enough that rescheduling cost amortises (doesn't pay off for minute-scale workloads).
- The substrate supports low-disruption migration (live migration, pause-resume, fast checkpoint-restart).
When it's the wrong shape¶
- Migration is expensive or disruptive (e.g. stateful database workload, workload with hard affinity to a specific node).
- Workloads are short enough that creation-time prediction is good enough.
- Prediction updates are rare or small (reprediction adds no new information).
- Rescheduling cost exceeds expected efficiency gain.
Seen in¶
- sources/2025-10-17-google-solving-virtual-machine-puzzles-lava — canonical wiki instance; LARS (Lifetime-Aware Rescheduling) is the rescheduling-layer component of Google's lifetime-aware VM allocation family alongside NILAS scoring and LAVA allocation.
Related¶
- concepts/continuous-reprediction
- concepts/vm-lifetime-prediction
- concepts/learned-lifetime-distribution
- concepts/bin-packing
- concepts/resource-stranding
- concepts/empty-host
- systems/lava-vm-scheduler
- systems/borg
- patterns/cheap-approximator-with-expensive-fallback — sibling pattern; uses uncertainty as control signal at per-query granularity vs per-workload-continuously here.
- patterns/learned-distribution-over-point-prediction