Borg¶
Borg is Google's large-scale cluster-management system — the platform that packs jobs onto machines across Google's fleet, enforcing isolation, reclaiming resources, restarting failed tasks, and exposing a uniform scheduling surface to Gmail, YouTube, Maps, Search, and every other Google service. The foundational 2015 EuroSys paper ("Large-scale cluster management at Google with Borg") is the canonical public reference; Kubernetes is its open-source descendant.
What shows up on this wiki¶
Borg appears in the wiki so far as the production target of Google Research's Regression Language Model work (2025-07-29), where text-to-text regression is used to predict Borg's own scheduler-efficiency metric without running the expensive combinatorial solver.
- Workload breadth. Google explicitly lists Gmail, YouTube, and Maps among the production workloads Borg schedules in the 2025-07-29 post — the point being that the RLM has to generalise across the full diversity of Google's service portfolio, not just one workload class (Source: sources/2025-07-29-google-simulating-large-systems-with-regression-language-models).
- Heterogeneous hardware. Borg schedules across CPUs and TPUs in the same fleet; the RLM input includes hardware descriptors so the same model can predict MIPS-per-GCU on any machine type (Source: sources/2025-07-29-google-simulating-large-systems-with-regression-language-models).
MIPS per GCU — the efficiency metric¶
The 2025-07-29 post frames MIPS per GCU (Millions of Instructions Per Second per Google Compute Unit) as the "key efficiency metric" Borg uses to judge whether a proposed allocation is a good one. GCU is Google's internal fleet-normalised unit of compute; MIPS-per-GCU is effectively useful-work-produced per unit-of-compute-spent.
Accurate MIPS-per-GCU forecasting matters because:
- Scheduling across thousands of machines involves choosing among many candidate placements per job; a placement-efficiency predictor lets the scheduler short-circuit bad candidates.
- At Google fleet scale, even single-digit efficiency percentage points translate into billions of dollars of hardware (Source: sources/2025-07-29-google-simulating-large-systems-with-regression-language-models).
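The short-circuiting role the predictor plays can be sketched as a ranking filter. This is a toy illustration, not Borg's actual interface — `predict_mips_per_gcu`, `best_placement`, and `top_k` are all hypothetical names; the source only describes the predictor's purpose, not its API.

```python
def best_placement(job, candidates, predict_mips_per_gcu, top_k=3):
    """Rank candidate machines by predicted MIPS-per-GCU and keep only
    the top_k most promising for full evaluation by the expensive solver."""
    ranked = sorted(candidates,
                    key=lambda m: predict_mips_per_gcu(job, m),
                    reverse=True)
    return ranked[:top_k]

# Toy usage: a stand-in predictor that favours machines with more free CPU.
machines = [{"name": "m1", "free_cpu": 4},
            {"name": "m2", "free_cpu": 16},
            {"name": "m3", "free_cpu": 8}]
shortlist = best_placement({"cpu": 2}, machines,
                           lambda job, m: m["free_cpu"], top_k=2)
# shortlist keeps m2 and m3; m1 never reaches the expensive solver
```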
The Borg digital twin¶
Google operates a digital twin of Borg — a backtesting framework that replicates the state of real-world clusters for counterfactual evaluation. The 2025-07-29 post names this digital twin as:
- The training-data source for the RLM: synthesised (x = cluster-state-as-string, y = MIPS-per-GCU-from-bin-packer) pairs come out of running the bin-packing algorithm inside the twin.
- The ground-truth fallback implied by the fast-path/slow-path deployment: when the RLM reports high uncertainty, the slow bin-packing simulation is the authoritative answer.
The digital twin itself is not described architecturally in the 2025-07-29 post — only named and used.
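The fast-path/slow-path shape described above reduces to a threshold on the model's reported uncertainty. A minimal sketch, assuming a hypothetical RLM interface that returns a (mean, stddev) pair and a hypothetical `slow_bin_packer` simulator — neither signature comes from the source:

```python
def predict_efficiency(cluster_state_text, rlm, slow_bin_packer,
                       uncertainty_threshold=0.1):
    """Fast path: trust the RLM's point estimate when its reported
    uncertainty is low; otherwise fall back to the slow, authoritative
    bin-packing simulation."""
    mean, stddev = rlm(cluster_state_text)
    if stddev <= uncertainty_threshold:
        return mean                          # fast path: cheap RLM answer
    return slow_bin_packer(cluster_state_text)  # slow path: ground truth

# Toy stand-ins: a confident RLM answer is used directly...
confident = predict_efficiency("state-A", lambda s: (0.82, 0.01), lambda s: 0.80)
# ...while an uncertain one defers to the simulator.
uncertain = predict_efficiency("state-B", lambda s: (0.82, 0.50), lambda s: 0.80)
```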
Scheduler = bin-packing¶
Borg's core scheduling decision is bin packing: given a job with resource requests (CPU, RAM, disk, GPU/TPU, network, etc.) and a fleet of machines with remaining capacity, pick a machine (or a set of machines) to run the job on. The 2025-07-29 post names the specific target of the RLM as "the numeric result of a specialized bin-packing algorithm used to efficiently allocate tasks to resources" — i.e. the scheduler's objective function, not raw CPU counters or memory utilisation.
This is why Google frames the RLM as a simulator of Borg rather than a monitor: it predicts what Borg's scheduler would have decided, not what the hardware is currently doing.
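To make the decision shape concrete: Borg's actual bin-packing algorithm is specialised and not public, but the problem class it solves looks like the classic first-fit toy below — resource requests against per-machine remaining capacity. All names here are illustrative, and real schedulers score placements rather than taking the first fit.

```python
def first_fit(job, machines):
    """job: dict of resource requests; machines: list of dicts of remaining
    capacity per resource. Returns the index of the first machine that fits
    (decrementing its remaining capacity), or None if nothing fits."""
    for i, cap in enumerate(machines):
        if all(cap.get(res, 0) >= need for res, need in job.items()):
            for res, need in job.items():
                cap[res] -= need
            return i
    return None

# Toy fleet: the first machine is too small, the second fits.
fleet = [{"cpu": 2, "ram": 4}, {"cpu": 8, "ram": 16}]
idx = first_fit({"cpu": 4, "ram": 8}, fleet)
# idx == 1, and the chosen machine's remaining capacity drops accordingly
```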
What the wiki doesn't yet have¶
- Borg's architecture itself (Borgmaster, Borglets, scheduler, Paxos-replicated state) — not introduced in the 2025-07-29 post, pending an ingested source that covers the 2015 paper.
- Google Compute Unit (GCU) definition — referenced by performance prediction sources but not separately documented.
- The digital twin's implementation — the 2025-07-29 post only uses it as a black box.
VM allocation as lifetime-aware bin-packing (2025-10-17)¶
The 2025-10-17 Google Research LAVA post re-opens Borg-adjacent scheduling as a second ML-for-systems angle on the same substrate — at a different layer from the 2025-07-29 RLM work. Where the RLM predicts the bin-packer's output (MIPS per GCU) so the scheduler can short-circuit the slow solver, the LAVA family augments the bin-packer's policy with learned VM lifetime predictions so placement itself becomes lifetime-aware (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).
- Problem framing. VM allocation is online bin-packing with pieces that "appear and disappear" at unknown times. Naive packing produces two named failure modes — resource stranding and empty-host loss — that the LAVA family explicitly targets.
- Load-bearing primitive: continuous reprediction of the remaining-lifetime distribution. Replaces the naive single-prediction-at-creation approach, whose structural hazard is that "a single misprediction can tie up an entire host for an extended period, degrading efficiency".
- Three insertion points: NILAS scoring, LAVA allocation, LARS rescheduling — see systems/lava-vm-scheduler for the full trio.
- Production-deployment status on Borg: not disclosed in the raw capture; the arXiv paper is the authoritative source.
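The lifetime-aware placement idea can be caricatured as a scoring rule that co-locates VMs predicted to exit around the same time, so hosts can drain fully instead of stranding resources. This is a deliberately simplified sketch: the actual NILAS/LAVA scoring uses full remaining-lifetime distributions with continuous reprediction, whereas this toy uses static point estimates, and every name below is hypothetical.

```python
def lifetime_affinity(vm_exit, host_exits):
    """Score a host by how close its resident VMs' predicted exit times
    are to the new VM's predicted exit time (higher is better). Grouping
    VMs that leave together lets the host empty out, countering resource
    stranding and empty-host loss."""
    if not host_exits:
        return 0.0  # an empty host is a neutral choice in this toy model
    return -min(abs(vm_exit - e) for e in host_exits)

def place(vm_exit, hosts):
    """hosts: {name: [predicted exit times of resident VMs]}. In the real
    system these predictions would be continuously repredicted; here they
    are static inputs."""
    return max(hosts, key=lambda h: lifetime_affinity(vm_exit, hosts[h]))

# A VM expected to exit at t=10 lands with similarly short-lived VMs,
# not on the host running a long-lived one.
chosen = place(10.0, {"host-a": [9.5, 11.0], "host-b": [100.0]})
```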
Together the 2025-07-29 + 2025-10-17 pair pins two Google Research ML-for-systems proof points on Borg-adjacent infrastructure at different layers — output-prediction (RLM) and policy-intervention (LAVA / NILAS / LARS).
Online throughput scheduling theory (2026-02-11)¶
Google Research's 2026-02-11 "Scheduling in a changing world: Maximizing throughput with time-varying capacity" post introduces a third Borg-adjacent scheduling proof point — this time from the algorithmic-theory side rather than the ML-for-systems side. The production motivating example is named directly as "all data processing must finish by the nightly batch run", the exact shape of a Borg batch-job schedule (Source: sources/2026-02-11-google-scheduling-in-a-changing-world-time-varying-capacity).
- Problem class. Online throughput-maximising scheduling under a time-varying capacity profile — the number of jobs the scheduler can run concurrently varies over wall-clock time (diurnal load, spot preemption, hardware failures).
- Competitive-ratio landscape across preemption regimes. Non-preemptive online scheduling has competitive ratio approaching zero (one long-job commitment can starve arbitrarily many shorts). Interrupt-and-restart preemption recovers the offline ½-competitive bound via the earliest-finish-job greedy. Interrupt-without-restart is adversarially unwinnable in general but becomes constant-competitive under common deadlines.
- Algorithmic primitive: tentative schedule revised by a fixed four-action rule on each job arrival (unit-capacity common-deadline variant). The full four-action specification is in the paper but not in the raw capture.
- Production-deployment status on Borg: not disclosed in the raw capture; the paper is research-side algorithmic theory with production-shape motivation.
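The earliest-finish-job greedy under interrupt-and-restart reduces to one choice per decision point: among jobs that can still meet the common deadline, run the one that would finish soonest, remembering that a preempted job must rerun from scratch. The four-action revision rule itself is not in the raw capture, so only this greedy choice is sketched, with hypothetical names throughout.

```python
def earliest_finish_choice(now, available, deadline):
    """available: {job_name: full length}. Under interrupt-and-restart a
    preempted job reruns from scratch, so starting job j at time `now`
    means it finishes at now + length[j]. Return the feasible job that
    finishes earliest, or None if nothing can meet the common deadline."""
    finish = {j: now + length for j, length in available.items()
              if now + length <= deadline}
    if not finish:
        return None
    return min(finish, key=finish.get)

# With a common deadline of 5, the greedy drops the long job rather than
# commit to a run that cannot finish in time.
pick = earliest_finish_choice(0, {"long": 10, "short": 2}, deadline=5)
# pick == "short"
```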
With the 2026-02-11 entry the wiki now has three Google Research proof points on Borg-adjacent scheduling at three different intervention layers:
| Insertion point | Approach | Canonical post |
|---|---|---|
| Bin-packer output prediction | ML approximator for MIPS-per-GCU | 2025-07-29 systems/regression-language-model (RLM) |
| VM-allocation policy | Learned-lifetime distribution + continuous reprediction | 2025-10-17 systems/lava-vm-scheduler (LAVA / NILAS / LARS) |
| Online-throughput scheduling theory | Competitive-ratio analysis + tentative-schedule revision | 2026-02-11 sources/2026-02-11-google-scheduling-in-a-changing-world-time-varying-capacity |
Each post targets a different layer of Borg's scheduling stack with a different primary primitive — the recurring shape is "Borg scheduling is rich enough to support orthogonal interventions at the prediction, policy, and theory layers simultaneously".
Seen in¶
- sources/2025-07-29-google-simulating-large-systems-with-regression-language-models — Borg as the production target of Google's RLM work; MIPS-per-GCU as the efficiency metric; digital-twin backtesting as the training-data source.
- sources/2025-10-17-google-solving-virtual-machine-puzzles-lava — Borg-adjacent VM allocation as online bin-packing with learned lifetime distributions; LAVA / NILAS / LARS trio.
- sources/2026-02-11-google-scheduling-in-a-changing-world-time-varying-capacity — online throughput-maximising scheduling theory for time-varying capacity profiles; competitive-ratio analysis across three preemption regimes; common-deadline variant motivated by "all data processing must finish by the nightly batch run".
Related¶
- companies/google
- concepts/bin-packing
- concepts/performance-prediction
- concepts/digital-twin-backtesting
- concepts/uncertainty-quantification
- concepts/vm-lifetime-prediction
- concepts/continuous-reprediction
- concepts/learned-lifetime-distribution
- concepts/resource-stranding
- concepts/empty-host
- systems/regression-language-model
- systems/regress-lm
- systems/lava-vm-scheduler
- patterns/cheap-approximator-with-expensive-fallback
- patterns/lifetime-aware-rescheduling
- patterns/learned-distribution-over-point-prediction
- patterns/token-limit-aware-feature-prioritization
- concepts/online-scheduling
- concepts/competitive-ratio
- concepts/common-deadline-scheduling
- concepts/tentative-schedule
- patterns/interrupt-and-restart
- patterns/earliest-finish-job-greedy