
PATTERN Cited by 1 source

Lifetime-aware rescheduling

After initial placement, continue tracking the workload's lifetime distribution and move the workload when the current placement becomes inefficient relative to the updated picture. Named LARS (Lifetime-Aware Rescheduling) in Google Research's 2025-10-17 LAVA post — the rescheduling-layer sibling of the NILAS scoring / LAVA allocation layers in the same algorithmic family (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

Intent

Traditional schedulers place a workload once and hold the placement until the workload exits. That's fine when the scheduler had full information at placement time. When the scheduler's view is imperfect — e.g. lifetimes were unknown, as in VM allocation — the initial placement can become suboptimal as the world reveals itself. Lifetime-aware rescheduling closes the loop: observed-trajectory evidence updates the lifetime picture, and when the update is large enough, the scheduler migrates workloads to reclaim the efficiency the stale placement has lost.

Mechanism

  1. Continuous prediction. Maintain a per-workload lifetime distribution that is repredicted continuously as the workload runs, so the estimate sharpens with observed age.
  2. Rebalance-worth test. Compare the current placement efficiency (under the updated prediction) against the best alternative placement. The test includes the migration cost — it's only worth moving a workload if the rebalance's expected benefit, over the workload's expected remaining lifetime, exceeds the migration cost plus the risk the updated prediction is wrong.
  3. Migrate / restart / evict. Execute the move using whichever mechanism the substrate supports: live migration (preferred, low disruption), suspend-and-resume, or restart-in-place.
  4. Update cluster state; re-enter the loop. The new placement is just another placement — it's subject to the same continuous-prediction + rebalance-worth-test loop if the picture changes again.

Why it's structurally different from utilisation-triggered rescheduling

Classic schedulers include rescheduling (bin-packing compaction, defragmentation) but trigger it from state ("this host is too empty, coalesce"), not from prediction update ("our estimate of this VM's remaining lifetime just changed"). Lifetime-aware rescheduling is prediction-triggered — the signal is a shift in the predicted distribution, not a directly-observed utilisation metric.

This matters because utilisation-triggered rebalancing is reactive (acts on symptoms) while prediction-triggered rebalancing is proactive (acts on expected future state). In a system with expensive migration, proactive moves that avoid future inefficiency are more valuable than reactive moves that patch current inefficiency.
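The two trigger styles can be contrasted as predicates. The watermark and shift threshold below are made-up illustrative numbers, not values from the LAVA work:

```python
def utilisation_trigger(host_utilisation: float,
                        low_watermark: float = 0.3) -> bool:
    """Reactive: fires on a directly observed symptom (host too empty)."""
    return host_utilisation < low_watermark

def prediction_trigger(prev_expected_lifetime: float,
                       new_expected_lifetime: float,
                       shift_threshold: float = 0.5) -> bool:
    """Proactive: fires when the predicted lifetime shifts materially,
    even if current utilisation looks fine."""
    shift = abs(new_expected_lifetime - prev_expected_lifetime)
    return shift / prev_expected_lifetime > shift_threshold
```

The proactive predicate can fire while every utilisation metric still looks healthy, which is exactly what lets it move workloads before the inefficiency materialises.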

The migration-cost threshold

The load-bearing operational parameter is how confident the prediction update must be, and how large the efficiency delta must be, before a migration fires. Set the threshold too low → migration churn, oscillation, workload disruption. Set it too high → the rescheduler rarely fires, and the framework degenerates to single-shot allocation + occasional defragmentation.

Tuning this threshold is the practical challenge; calibrated learned lifetime distributions make it tractable — you can set the trigger on "P(current placement is optimal) < X%" rather than on a heuristic.
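A distribution-based trigger of that form might look like the following sketch, assuming we can draw samples from the calibrated remaining-lifetime distribution; the 20% threshold and all names are illustrative:

```python
def should_fire(remaining_samples: list[float],
                current_waste: float,
                best_alt_waste: float,
                migration_cost: float,
                p_threshold: float = 0.2) -> bool:
    """Fire when P(staying put is at least as good as moving) < p_threshold.

    Each sample is one draw (in hours) from the predicted remaining-
    lifetime distribution; wastes are per-hour inefficiency costs.
    """
    stay_ok = sum(
        1 for r in remaining_samples
        if current_waste * r <= best_alt_waste * r + migration_cost
    )
    return stay_ok / len(remaining_samples) < p_threshold
```

Because the test integrates over the whole predicted distribution, a wide (uncertain) distribution naturally suppresses migrations, which is the churn-damping behaviour the threshold is meant to provide.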

Contrast with

cheap-approximator-with-expensive-fallback

Both patterns use prediction uncertainty as a control signal. Different concretisations:

| Aspect                    | Cheap-approximator fallback           | Lifetime-aware rescheduling  |
| ------------------------- | ------------------------------------- | ---------------------------- |
| What uncertainty triggers | Run the slow solver                   | Migrate the workload         |
| Trigger frequency         | Per query                             | Per workload, continuously   |
| Reversal cost             | Cheap (next query is fresh)           | Expensive (migrate)          |
| Cost symmetry             | Symmetric (slow path always possible) | Asymmetric (migration isn't free) |

Both share the discipline that calibrated uncertainty / distribution width is the load-bearing signal. They're sibling patterns at different insertion points in the ML-for-systems stack.

When it's the right shape

  • Placement decision is expensive to reverse, but migration is cheap relative to the efficiency loss of stale placements.
  • Prediction evolves materially over the workload's lifetime (continuous reprediction provides real new information).
  • The workload is long-lived enough that rescheduling cost amortises (doesn't pay off for minute-scale workloads).
  • The substrate supports low-disruption migration (live migration, pause-resume, fast checkpoint-restart).

When it's the wrong shape

  • Migration is expensive or disruptive (e.g. stateful database workload, workload with hard affinity to a specific node).
  • Workloads are short enough that creation-time prediction is good enough.
  • Prediction updates are rare or small (reprediction adds no new information).
  • Rescheduling cost exceeds expected efficiency gain.

Seen in
