

Borg

Borg is Google's large-scale cluster-management system — the platform that packs jobs onto machines across Google's fleet, enforcing isolation, reclaiming resources, restarting failed tasks, and exposing a uniform scheduling surface to Gmail, YouTube, Maps, Search, and every other Google service. The foundational 2015 EuroSys paper ("Large-scale cluster management at Google with Borg") is the canonical public reference; Kubernetes is its open-source descendant.

What shows up on this wiki

Borg appears in the wiki so far as the production target of Google Research's Regression Language Model work (2025-07-29), where text-to-text regression is used to predict Borg's own scheduler-efficiency metric without running the expensive combinatorial solver.

MIPS per GCU — the efficiency metric

The 2025-07-29 post frames MIPS per GCU (Millions of Instructions Per Second per Google Compute Unit) as the "key efficiency metric" Borg uses to judge whether a proposed allocation is a good one. GCU is Google's internal fleet-normalised unit of compute; MIPS-per-GCU is effectively useful-work-produced per unit-of-compute-spent.
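As a ratio, the metric is simple to state. A minimal sketch, with hypothetical numbers (GCU's actual normalisation is internal to Google):

```python
# Illustrative only: MIPS-per-GCU as a work-per-compute ratio.
# The inputs here are made-up numbers, not real fleet figures.

def mips_per_gcu(total_mips: float, total_gcus: float) -> float:
    """Useful work produced per unit of fleet-normalised compute spent."""
    if total_gcus <= 0:
        raise ValueError("GCU total must be positive")
    return total_mips / total_gcus

# A proposed allocation that produces more instructions on the same
# compute footprint scores higher:
baseline = mips_per_gcu(total_mips=9_000.0, total_gcus=100.0)   # 90.0
candidate = mips_per_gcu(total_mips=9_600.0, total_gcus=100.0)  # 96.0
assert candidate > baseline
```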

Accurate MIPS-per-GCU forecasting matters because it lets Borg evaluate candidate allocations without running the expensive combinatorial bin-packing solver for each one.

The Borg digital twin

Google operates a digital twin of Borg — a backtesting framework that replicates the state of real-world clusters for counterfactual evaluation. The 2025-07-29 post names this digital twin as:

  • The training-data source for the RLM: synthesised (x = cluster-state-as-string, y = MIPS-per-GCU-from-bin-packer) pairs come out of running the bin-packing algorithm inside the twin.
  • The ground-truth fallback implied by the fast-path/slow-path deployment: when the RLM reports high uncertainty, the slow bin-packing simulation is the authoritative answer.

The digital twin itself is not described architecturally in the 2025-07-29 post — only named and used.
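The fast-path/slow-path deployment the post implies can be sketched as a dispatch on predictor uncertainty. This is a hedged sketch, not Google's implementation: `rlm_predict` and `run_bin_packing_simulation` are hypothetical stand-ins, and the threshold is invented.

```python
# Fast-path / slow-path pattern: query the cheap RLM first, and fall back
# to the authoritative bin-packing simulation when the model reports high
# uncertainty. Both callables are hypothetical stand-ins.
from typing import Callable, Tuple

def predict_mips_per_gcu(
    cluster_state: str,
    rlm_predict: Callable[[str], Tuple[float, float]],   # -> (estimate, uncertainty)
    run_bin_packing_simulation: Callable[[str], float],  # slow, authoritative
    uncertainty_threshold: float = 0.1,                  # assumed value
) -> float:
    estimate, uncertainty = rlm_predict(cluster_state)   # fast path
    if uncertainty > uncertainty_threshold:
        return run_bin_packing_simulation(cluster_state) # slow path: ground truth
    return estimate

# Usage with toy stand-ins:
confident = lambda s: (42.0, 0.02)
unsure = lambda s: (42.0, 0.5)
simulator = lambda s: 40.0
assert predict_mips_per_gcu("state", confident, simulator) == 42.0
assert predict_mips_per_gcu("state", unsure, simulator) == 40.0
```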

Scheduler = bin-packing

Borg's core scheduling decision is bin packing: given a job with resource requests (CPU, RAM, disk, GPU/TPU, network, etc.) and a fleet of machines with remaining capacity, pick a machine (or a set of machines) to run the job on. The 2025-07-29 post names the specific target of the RLM as "the numeric result of a specialized bin-packing algorithm used to efficiently allocate tasks to resources" — i.e. the scheduler's objective function, not raw CPU counters or memory utilisation.
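The shape of that decision can be sketched as best-fit placement over remaining capacity. Borg's real scorer is far richer (many resource dimensions, priorities, preemption); this CPU/RAM toy only shows the structure, and all names are illustrative.

```python
# Toy placement decision as bin packing: for one incoming job, pick the
# feasible machine whose remaining CPU capacity fits the request tightest.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Machine:
    name: str
    free_cpu: float
    free_ram_gb: float

def place(job_cpu: float, job_ram_gb: float,
          machines: List[Machine]) -> Optional[Machine]:
    feasible = [m for m in machines
                if m.free_cpu >= job_cpu and m.free_ram_gb >= job_ram_gb]
    if not feasible:
        return None  # no machine can host the job
    # Best fit: leave the smallest CPU slack, reducing stranded capacity.
    best = min(feasible, key=lambda m: m.free_cpu - job_cpu)
    best.free_cpu -= job_cpu
    best.free_ram_gb -= job_ram_gb
    return best

fleet = [Machine("m1", 8.0, 32.0), Machine("m2", 4.0, 16.0)]
assert place(3.0, 8.0, fleet).name == "m2"  # the tighter fit wins
```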

This is why Google frames the RLM as a simulator of Borg rather than a monitor: it predicts what Borg's scheduler would have decided, not what the hardware is currently doing.

What the wiki doesn't yet have

  • Borg's architecture itself (BorgMaster, Borglets, scheduler, Paxos-replicated state) — not introduced in the 2025-07-29 post, pending an ingested source that covers the 2015 paper.
  • Google Compute Unit (GCU) definition — referenced by performance prediction sources but not separately documented.
  • The digital twin's implementation — the 2025-07-29 post only uses it as a black box.

VM allocation as lifetime-aware bin-packing (2025-10-17)

The 2025-10-17 Google Research LAVA post re-opens Borg-adjacent scheduling as a second ML-for-systems angle on the same substrate — at a different layer from the 2025-07-29 RLM work. Where the RLM predicts the bin-packer's output (MIPS per GCU) so the scheduler can short-circuit the slow solver, the LAVA family augments the bin-packer's policy with learned VM lifetime predictions so placement itself becomes lifetime-aware (Source: sources/2025-10-17-google-solving-virtual-machine-puzzles-lava).

  • Problem framing. VM allocation is online bin-packing with pieces that "appear and disappear" at unknown times. Naive packing produces two named failure modes — resource stranding and empty-host loss — that the LAVA family explicitly targets.
  • Load-bearing primitive: continuous reprediction of the remaining-lifetime distribution. Replaces the naive single-prediction-at-creation approach, whose structural hazard is that "a single misprediction can tie up an entire host for an extended period, degrading efficiency".
  • Three insertion points: NILAS scoring, LAVA allocation, LARS rescheduling — see systems/lava-vm-scheduler for the full trio.
  • Production-deployment status on Borg: not disclosed in the raw capture; the arXiv paper is the authoritative source.
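A lifetime-aware score in the spirit the post describes might prefer hosts whose resident VMs are predicted to exit around the same time as the incoming VM, so hosts can drain fully instead of stranding capacity. This is a hedged sketch under assumed semantics: the alignment function and host-choice rule here are invented, and the real NILAS/LAVA scoring is specified in the paper.

```python
# Hypothetical lifetime-aware host choice: minimise mismatch between the
# incoming VM's predicted remaining lifetime and those of resident VMs.
# Lifetimes would come from a learned, continuously repredicted model;
# here they are plain numbers (e.g. hours).
from typing import Dict, List

def exit_alignment_score(incoming_lifetime: float,
                         resident_lifetimes: List[float]) -> float:
    """Lower is better: total predicted-exit-time mismatch on a host."""
    if not resident_lifetimes:
        return 0.0  # an empty host can always drain cleanly
    return sum(abs(incoming_lifetime - t) for t in resident_lifetimes)

def choose_host(incoming_lifetime: float,
                hosts: Dict[str, List[float]]) -> str:
    return min(hosts,
               key=lambda h: exit_alignment_score(incoming_lifetime, hosts[h]))

hosts = {
    "short-lived": [1.0, 2.0],     # predicted remaining lifetimes
    "long-lived": [100.0, 90.0],
}
assert choose_host(1.5, hosts) == "short-lived"
```

Continuous reprediction would update the `resident_lifetimes` values as VMs age, which is what distinguishes this regime from the single-prediction-at-creation approach the post criticises.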

Together the 2025-07-29 + 2025-10-17 pair pins two Google Research ML-for-systems proof points on Borg-adjacent infrastructure at different layers — output-prediction (RLM) and policy-intervention (LAVA / NILAS / LARS).

Online throughput scheduling theory (2026-02-11)

Google Research's 2026-02-11 "Scheduling in a changing world: Maximizing throughput with time-varying capacity" post introduces a third Borg-adjacent scheduling proof point — this time from the algorithmic-theory side rather than the ML-for-systems side. The production motivating example is named directly as "all data processing must finish by the nightly batch run", the exact shape of a Borg batch-job schedule (Source: sources/2026-02-11-google-scheduling-in-a-changing-world-time-varying-capacity).

  • Problem class. Online throughput-maximising scheduling under a time-varying capacity profile — the number of jobs the scheduler can run concurrently varies over wall-clock time (diurnal load, spot preemption, hardware failures).
  • Competitive-ratio landscape across preemption regimes. Non-preemptive online scheduling has competitive ratio approaching zero (one long-job commitment can starve arbitrarily many short jobs). Interrupt-and-restart preemption recovers the offline ½-competitive bound via the earliest-finish-job greedy. Interrupt-without-restart is adversarially unwinnable in general but becomes constant-competitive under common deadlines.
  • Algorithmic primitive: tentative schedule revised by a fixed four-action rule on each job arrival (unit-capacity common-deadline variant). The full four-action specification is in the paper but not in the raw capture.
  • Production-deployment status on Borg: not disclosed in the raw capture; the paper is research-side algorithmic theory with production-shape motivation.
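The earliest-finish-job greedy under interrupt-and-restart can be illustrated with a single-machine toy. This sketches only the per-arrival decision rule, not the competitive analysis; the data shapes are invented.

```python
# Toy earliest-finish greedy with interrupt-and-restart preemption:
# on each arrival, keep whichever job would finish earliest. A preempted
# job loses all progress and must restart from scratch if rerun.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Job:
    name: str
    length: float  # processing time; a restart pays this again in full

def on_arrival(now: float,
               running: Optional[Tuple[Job, float]],  # (job, finish time)
               new: Job) -> Tuple[Job, float]:
    """Decide which job occupies the machine after an arrival at `now`."""
    new_finish = now + new.length
    if running is None or new_finish < running[1]:
        return (new, new_finish)  # preempt: the new job finishes earlier
    return running                # keep the current job

# A long job is preempted by a short one that would finish first:
state = on_arrival(0.0, None, Job("long", 10.0))
state = on_arrival(1.0, state, Job("short", 2.0))
assert state[0].name == "short" and state[1] == 3.0
```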

With the 2026-02-11 entry the wiki now has three Google Research proof points on Borg-adjacent scheduling at three different intervention layers:

  • Bin-packer output prediction: ML approximator for MIPS-per-GCU (2025-07-29, systems/regression-language-model, RLM)
  • VM-allocation policy: learned-lifetime distribution + continuous reprediction (2025-10-17, systems/lava-vm-scheduler, LAVA / NILAS / LARS)
  • Online-throughput scheduling theory: competitive-ratio analysis + tentative-schedule revision (2026-02-11, sources/2026-02-11-google-scheduling-in-a-changing-world-time-varying-capacity)

Each post targets a different layer of Borg's scheduling stack with a different primary primitive — the recurring shape is "Borg scheduling is rich enough to support orthogonal interventions at the prediction, policy, and theory layers simultaneously".
