Google Research¶
Google Research (research.google) is a Tier-1 source on the sysdesign-wiki. Google's research org publishes both foundational systems work (distributed databases, query engines, large-scale ML infrastructure) and ML / applied-AI research with concrete serving-infra implications (foundation models, multimodal encoders, agent systems). The wiki filters for posts with distributed-systems internals, scaling trade-offs, query-engine / language-design, or production serving-infrastructure content; pure model-training / benchmark-only research without serving-infra is skipped per AGENTS.md.
Recurring shapes on the blog, based on the posts currently ingested:
- Language / dialect design for planet-scale data systems. The 2024-08-24 pipe-syntax-in-SQL paper is the canonical wiki instance — extending GoogleSQL (shared across BigQuery, F1, Spanner) with `|>` data-flow syntax as an additive extension rather than proposing yet another SQL replacement.
- Foundation models with the serving-infra story made explicit. The 2024-05-09 VideoPrism post is the canonical wiki instance — two-stage contrastive-then-masked pretraining, frozen-encoder multi-task adaptation, explicit training/serving boundary.
- Program-correctness retrospectives on real production bugs. The 2025-01-11 republication of Joshua Bloch's 2006 binary-search / mergesort integer-overflow post is the canonical wiki instance — a measured bug in JDK `Arrays.binarySearch` (authored by Bloch) and in Bentley's Programming Pearls used as a case study for "proof + test + review is insufficient; consider invariants against substrate semantics."
- ML-for-systems with production proof points. The 2025-07-29 Regression Language Model post is the canonical wiki instance — 60M-parameter encoder-decoder reading Borg cluster state as YAML/JSON and emitting MIPS-per-GCU as a decoded numeric string, with multi-sample decoding recovering uncertainty so the fast ML path can fall back to the slow bin-packing simulator on high-uncertainty inputs. The 2025-10-17 LAVA / NILAS / LARS post is the wiki's second Google Research proof point on Borg-adjacent scheduling at a different layer — where the RLM predicts the bin-packer's output so the inner loop can skip the slow solver, the LAVA family augments the bin-packer's policy with learned VM-lifetime distributions and continuous reprediction to target the named operational failure modes resource stranding and empty-host loss that differentiate online-bin-packing-with-unknown-disappearance-times from static bin-packing. Together the two posts pin a recurring shape: ML-for-systems interventions on Borg-adjacent infrastructure at multiple insertion points in the scheduler stack. Extended by the 2026-02-11 time-varying-capacity scheduling post — a third Borg-adjacent proof point from the algorithmic-theory layer rather than the ML-for-systems layer: [[concepts/competitive-ratio|competitive-ratio]] analysis of online throughput-maximising scheduling across three preemption regimes, with common-deadline restriction restoring constant-competitive bounds via a tentative-schedule revision algorithm, motivated by "all data processing must finish by the nightly batch run".
- On-device ML serving at planet scale, bridged by distillation. The 2025-08-21 YouTube real-time generative AI effects post is the canonical wiki instance — large offline generative-model teacher (StyleGAN2 then Imagen) distilled into a small on-device student (UNet with a MobileNet encoder + MobileNet-block decoder) so real-time generative AI effects can run at camera-frame rate on the user's phone. Representative of Google's product-side deployment of increasingly powerful generative models through mobile substrates where server-side inference is structurally impossible.
- LLM serving-infra latency / factuality primitives. The 2025-09-11 speculative-cascades post is the canonical latency instance — pedagogical unification of (cascades + speculative decoding) on a shared drafter-expert substrate, with a probabilistic acceptance rule that keeps semantically-equivalent drafts that token-exact speculative decoding would throw away. The 2025-09-17 SLED post (NeurIPS 2024) is the canonical factuality instance at the same architectural insertion point — the LLM decoding step — reusing the final projection matrix on every transformer layer's hidden state to produce early-exit logits per layer, then weighted-averaging across all layers to dodge the "popular but wrong" final-layer frequency bias that drives a class of LLM hallucinations. Same insertion point, sibling objectives (latency vs factuality); both training-free, both serving-time, both composable with each other in principle. Representative of Google Research framing LLM-serving primitives themselves (not just the models that run on them) as first-class research output.
- Cascade pipeline redesign via intermediate-representation removal. The 2025-10-07 Speech-to-Retrieval (S2R) post is the canonical wiki instance — voice search's production-standard ASR → text → retriever cascade is re-architected to go directly from audio to retrieval results, removing the text transcript as intermediate representation. The architectural argument is quantified by a groundtruth-upper-bound benchmark (real Cascade ASR vs. Cascade groundtruth where the ASR step is replaced with human-transcribed perfect ASR), with WER on the ASR axis and MRR on the retrieval axis over Google's SVQ multilingual dataset, cross-validated with human raters. Named structural failure modes: information loss at the cascade boundary + error propagation from ASR into retrieval. Representative of Google Research's willingness to restructure production pipelines when the intermediate stage is the structural ceiling, not just the failing component.
- Reference silicon architecture for edge-ML at ambient-sensing power. The 2025-10-15 Coral NPU post is the canonical wiki instance — Google publishes a RISC-V-compliant reference NPU architecture that "reverses traditional chip design" by prioritising the ML matrix engine over scalar compute (ML-first architecture), targeting ~512 GOPS at a few milliwatts for the always-on ambient-sensing envelope (hearables, AR glasses, smartwatches). Framed explicitly as a full-stack platform — not just silicon — to heal the fragmented edge-ML ecosystem where each proprietary accelerator ships its own ISA / command buffer / compiler. Delivery shape is reference hardware for software ecosystem at the silicon-IP layer. Representative of Google's push to extend what the on-device substrate can run — the other side of the YouTube-distillation story (shrink the model to fit the substrate) applied to the same problem at ambient-sensing power.
- Moonshot infrastructure research — work backwards from the physics constraint. The 2025-11-04 Project Suncatcher post is the canonical wiki instance — Google Research announces a space-based AI-infrastructure moonshot where compact constellations of solar-powered satellites carry Google TPUs interconnected by free-space optical links. The framing inverts the usual scaling target: instead of pushing datacenters onto a terrestrial grid, the programme starts from "the Sun is the ultimate energy source in our solar system" (~100 trillion times human electricity production; solar panels ~8× more productive in orbit with near-continuous generation) and works backwards to the architecture. Three named foundational-research challenges: high-bandwidth inter-satellite communication, orbital dynamics, and radiation effects on computing. Load-bearing architectural stance is commodity TPUs, not radiation-hardened silicon — pushing radiation tolerance into architectural / software mitigation rather than silicon mitigation, to preserve access to the commercial process-node curve. Architectural shape is modular disaggregated constellation — many small interconnected satellites, not one monolithic orbital platform; analogous at the space layer to Borg's commodity-cluster-vs-supercomputer shape shift. Anchored in Google's moonshot lineage alongside quantum computing (~2015) and autonomous vehicles (~2010 → Waymo). Sibling to the dual economics/sustainability framing of the 2025-10-17 LAVA post — same motivator (efficient compute scaling under resource constraints) at two very different layers: squeeze the terrestrial cluster harder vs. move the cluster off-planet. Raw captures announcement + physics framing + three-challenge enumeration only; architectural depth lives in the preprint paper, not in the raw.
- Multi-agent LLM architectures with ablation-isolated primitives. The 2025-11-06 DS-STAR post is the canonical wiki instance — four specialised LLM sub-agents (Data File Analyzer, Planner, Coder, Verifier) plus a Router in an iterative plan-refinement loop bounded to 10 rounds. Ablations isolate which primitives are load-bearing: removing the Data File Analyzer collapses DABStep hard-task accuracy 45.2 % → 26.98 %; removing the Router (forcing extend-only refinement) degrades both easy and hard tasks. Load-bearing framing: "it is more effective to correct mistakes in a plan than to keep adding potentially flawed steps." Representative of Google Research publishing agent-architecture research where the ablations name the distinguishing primitives rather than only reporting top-line benchmark gains — a discipline shape more common on the systems-research side of Google than on the model-research side.
- Security-policy / responsible-disclosure posts for trust-sensitive substrates. The 2026-03-31 blockchain-quantum-disclosure post is the canonical wiki instance of this shape from Google Research. Announces a major quantum-algorithmic speed-up against ECDLP-256 (the primitive under Bitcoin's secp256k1 and TLS's P-256) and withholds the algorithm itself, substantiating only via zero-knowledge proof — canonical producer-side instance of patterns/zkp-capability-disclosure. Names the FUD attack surface as a first-class concern on substrates whose value depends on public confidence (cryptocurrency), extending the classical coordinated-disclosure norm with scope-clarification + defensive-progress-highlighting as in-disclosure FUD-reduction moves. Producer-side pair of Cloudflare's 2026-04-07 consumer-side post — Google publishes the disclosure primitive and philosophy; Cloudflare demonstrates the operational effect on industry-wide PQ migration timelines. Distinct shape from the ML-for-systems / agent-architecture / Borg-adjacent-scheduling / space-infra arcs — a policy-positioning post rooted in a cryptographic result.
Key systems¶
- systems/ds-star — Google Research's 2025-11-06 versatile data-science agent: four LLM sub-agents (Data File Analyzer, Planner, Coder, Verifier) plus a Router in an iterative plan-refinement loop (bounded to 10 rounds). Canonical instance of patterns/planner-coder-verifier-router-loop. State-of-the-art on DABStep, KramaBench, DA-Code; #1 on the DABStep public leaderboard as of 2025-09-18. Ablation-load-bearing primitives: the Data File Analyzer (removing it collapses DABStep hard-task accuracy 45.2 % → 26.98 %) and the Router's add-or-fix branch over a naive extend-only plan loop.
- systems/dabstep — primary benchmark surface for DS-STAR; heterogeneous-multi-file data-science tasks (easy = single-file, hard = multi-file). DS-STAR's #1-ranked leaderboard position (2025-09-18) on DABStep is the headline external-validation datapoint.
- systems/autogen — Microsoft Research's multi-agent framework; appears on this wiki as a baseline comparator in the DS-STAR evaluation (DS-STAR beats AutoGen + DA-Agent on all three named benchmarks).
- systems/project-suncatcher — Google Research's 2025-11-04 moonshot programme for space-based, scalable AI infrastructure: compact constellations of solar-powered satellites carrying Google TPUs, interconnected by free-space optical links. Architectural shape is modular disaggregated constellation. Three named foundational-research challenges: high-bandwidth inter-satellite communication; orbital dynamics; radiation effects on computing. Backing paper Towards a future space-based, highly scalable AI infrastructure system design. Announcement post only in the raw; architectural depth in the preprint.
- systems/google-tpu — Google's Tensor Processing Unit, the commercial AI-accelerator line (cloud.google.com/tpu); the compute element chosen to carry inside Suncatcher satellites. Minimal wiki page so far — future Google / Google Cloud posts will populate the architectural detail.
- systems/googlesql — Google's SQL dialect shared across BigQuery, F1, Spanner, and other internal query engines. A single dialect spec propagates syntax / semantic changes across the whole Google query-engine fleet; the 2024-08-24 pipe-syntax paper uses this as the lever to ship `|>` across multiple engines with one grammar change.
- systems/videoprism — Google's foundational visual encoder for video understanding (2024). Two-stage pretraining (contrastive text-video + masked modeling on unlabeled corpora); frozen encoder adapted to multiple downstream tasks without fine-tuning.
- systems/clip-embedding-model — OpenAI CLIP, referenced from Google research as a prior-art contrastive multimodal encoder.
- systems/openjdk-arrays-binarysearch — JDK's standard `java.util.Arrays.binarySearch` (authored by Bloch), which shipped the `(low + high) / 2` integer-overflow bug for ~9 years; the concrete production-affected system in the 2025-01-11 Bloch republication.
- systems/borg — Google's planet-scale cluster manager; the production target of the 2025-07-29 Regression Language Model work. MIPS-per-GCU (Millions of Instructions Per Second per Google Compute Unit) is its efficiency metric; a specialised bin-packing algorithm is what the RLM predicts; a Borg digital twin is the backtesting framework that generates training data.
- systems/regression-language-model — 60M-parameter two-layer encoder-decoder that reads Borg cluster state as YAML/JSON and emits the bin-packer's MIPS-per-GCU output as a decoded numeric string; multi-sample decoding recovers the full output distribution → point prediction + density + calibrated uncertainty in one model. Reported near-perfect Spearman rank correlation across diverse Borg regression tasks.
- systems/regress-lm — Google DeepMind's open-source companion library (https://github.com/google-deepmind/regress-lm) for training + serving Regression Language Models; positioned as reusable scaffolding for text-to-text regression beyond the Borg case study.
- systems/lava-vm-scheduler — Google Research's 2025-10-17 trio of lifetime-aware VM-allocation algorithms for Borg-adjacent cloud scheduling (backing paper arXiv:2412.09840v1). NILAS (Non-Invasive Lifetime Aware Scoring) — placement-scoring layer over an existing allocator; LAVA (Lifetime-Aware VM Allocation) — full allocation algorithm using learned lifetime distributions with misprediction adaptation; LARS (Lifetime-Aware Rescheduling) — post-placement rescheduling on updated predictions. Load-bearing primitive is continuous reprediction — the model updates its estimate of a VM's expected remaining lifetime as the VM continues to run, replacing the naive one-shot prediction whose structural hazard is that "a single misprediction can tie up an entire host for an extended period." Targets the two named operational failure modes resource stranding and empty-host loss.
- systems/programming-pearls — Jon Bentley's 1986/2000 algorithms book; the upstream source for the broken-binary-search idiom propagated across the industry.
- systems/google-maps — Google's planet-scale mapping / routing / ETA product. 2025-06-30 HOV-specific ETA launch is the canonical wiki instance so far: dedicated ETA surface for routes including carpool lanes, selected at serving time by a trip classifier.
- systems/android-earthquake-alerts — planet-scale earthquake early-warning (EEW) system using the Android device fleet's accelerometers as a distributed seismometer network. 2025-07-17 post reports three-year MAE halving (0.50 → 0.25 on moment-magnitude) in first-magnitude-estimate accuracy, claimed "similar to and sometimes better than" traditional dedicated seismic networks.
- systems/youtube-real-time-generative-ai-effects — YouTube's on-device real-time generative AI effects surface; camera-frame-rate stylised effects delivered via a distilled student model (UNet + MobileNet) on user phones. Teacher stack evolved StyleGAN2 + StyleCLIP → Imagen. Canonical-wiki instance of on-device ML inference at YouTube scale.
- systems/stylegan2 — NVIDIA's 2019 high-fidelity GAN architecture; used as the first-generation teacher for YouTube's generative-effects distillation pipeline.
- systems/styleclip — 2021 technique adding text-driven facial-feature manipulation over StyleGAN-class models; the teacher-side controllability layer in YouTube's first-gen effects pipeline.
- systems/imagen-google-deepmind — Google DeepMind's text-to-image diffusion model; the second-generation teacher in YouTube's on-device effects pipeline, delivering higher fidelity and broader style range without requiring student re-architecture.
- systems/mobilenet — Google's efficient mobile-first CNN architecture family; depthwise-separable convolutions; the student-backbone + decoder-block primitive in YouTube's on-device effects pipeline; canonical wiki instance of "architecture selected by serving substrate, not by training task."
- systems/unet-architecture — Ronneberger et al. 2015 encoder-decoder architecture for image-to-image tasks; the student's top-level topology for YouTube's on-device generative AI effects; also the denoising-network backbone inside modern diffusion teachers like Imagen.
- systems/speculative-cascades — Google Research's 2025-09-11 hybrid LLM-inference technique composing cascades and speculative decoding on the same drafter-expert substrate; keeps speculative decoding's parallel-verify primitive but uses a probabilistic acceptance rule so semantically-equivalent drafts aren't thrown away on token-exact mismatch.
- systems/sled — Google Research's 2025-09-17 factuality-decoding method (NeurIPS 2024, arXiv:2411.02433, open-source github.com/JayZhang42/SLED). Reuses the transformer's final projection matrix on every layer's hidden state to obtain early-exit logits, then weighted-averages across all layers to pick the next token. Up to +16 percentage points accuracy vs base / DoLa on "two challenging datasets"; ~4% decoding-time overhead vs DoLa; cross-family-validated on Gemma 3 / GPT-OSS / Mistral (instruction-tuned + base). Canonical-wiki instance of all-layer ensemble decoding and of decoding-time remediation of LLM hallucination without retraining / retrieval.
- systems/dola — the prior-SOTA factuality-decoding baseline (arXiv:2309.03883, voidism/DoLa); pairwise contrast between a mature layer and a premature layer. The comparator against which SLED's +16pp / +4% overhead numbers are anchored.
- systems/coral-npu — Google Research's 2025-10-15 reference neural-processing-unit architecture for low-power on-device ML. RISC-V-ISA-compliant architectural IP blocks (not a chip); ~512 GOPS at a few milliwatts; target device classes: edge devices, hearables, AR glasses, smartwatches. ML-first architecture: matrix engine prioritised over scalar compute. Delivered as a full-stack platform to heal the fragmented edge-ML ecosystem; distinct from Google's existing Edge-TPU-based Coral product boards (relationship not explicitly decomposed in the announcement post).
Key patterns / concepts¶
- patterns/planner-coder-verifier-router-loop — DS-STAR's canonical architectural shape: four specialised LLM sub-agents (Planner, Coder, Verifier, Router) in a verification-gated inner loop, with the Router's add-or-fix branch as the distinguishing primitive over extend-only refinement loops. Bounded by a refinement-round budget (10 rounds in DS-STAR; 3.0 avg on easy DABStep tasks, 5.6 on hard).
- concepts/iterative-plan-refinement — the loop-level discipline the DS-STAR pattern implements: plan → implement → judge → add-or-fix → repeat. Named against two failure modes: one-shot plan-and-generate (no verification) and extend-only refinement (accumulates mistakes rather than repairing them).
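A minimal sketch of the loop-level discipline above, with the four sub-agents stubbed out as plain callables (names, signatures, and the fix-index convention are illustrative, not DS-STAR's actual interfaces):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RefinementLoop:
    """Plan -> implement -> judge -> add-or-fix loop with a round budget."""
    plan_next_step: Callable[[List[str]], str]       # Planner: propose a step given the plan so far
    implement: Callable[[List[str]], str]            # Coder: execute the current plan
    is_sufficient: Callable[[str], bool]             # Verifier: judge the result
    pick_fix_index: Callable[[List[str], str], int]  # Router: step index to repair, or -1 to extend
    max_rounds: int = 10                             # refinement-round budget

    def run(self) -> str:
        plan: List[str] = [self.plan_next_step([])]
        result = self.implement(plan)
        for _ in range(self.max_rounds):
            if self.is_sufficient(result):
                break
            # Router's add-or-fix branch: repair a flawed step in place
            # instead of always appending (extend-only refinement).
            i = self.pick_fix_index(plan, result)
            if i < 0:
                plan.append(self.plan_next_step(plan))
            else:
                plan[i] = self.plan_next_step(plan[:i])
            result = self.implement(plan)
        return result
```

The add-or-fix branch is the distinguishing move: index -1 extends the plan, any other index replaces an existing step, so early mistakes are repaired rather than accumulated.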
- concepts/data-file-analysis — DS-STAR's ablation-load-bearing pre-loop primitive: agent writes + runs a file-summarisation script to give the Planner rich textual context on every file (CSV, JSON, markdown, unstructured text) in the working directory. Remove it, DABStep hard-task accuracy collapses 45.2 % → 26.98 %.
- concepts/heterogeneous-data-formats — the real-world mixed-format data class DS-STAR's Data File Analyzer targets, explicitly named against "heavy reliance on well-structured data, like CSV files in relational databases."
- concepts/refinement-round-budget — the bounded-iteration discipline the DS-STAR loop terminates against. Canonical numeric anchor: 10-round ceiling, 3.0 / 5.6 avg rounds on easy / hard DABStep tasks, >50 % of easy tasks finishing in 1 round. Extends the wiki's concepts/llm-as-judge page with the plan-sufficiency-in-the-generation-loop axis (vs. the earlier wiki instances which score trajectories or outputs post-hoc in eval harnesses).
- concepts/space-based-compute — the architectural-class concept Project Suncatcher instantiates: deploy the compute substrate in orbit rather than terrestrially, motivated by continuous solar power (~8× panel productivity, near-continuous generation), unbounded energy-supply ceiling (~100T× humanity's electricity), and terrestrial-resource minimisation.
- concepts/free-space-optical-communication — the inter-satellite network-fabric choice inside Suncatcher; optical vs. RF at AI-workload bandwidths.
- concepts/radiation-effects-on-computing — failure-mode class introduced by commodity silicon in space; SEU / SEL / TID / displacement damage. One of three foundational Suncatcher challenges; mitigation pushed into architectural / software layer rather than silicon.
- patterns/modular-disaggregated-constellation — the architectural-shape pattern Suncatcher adopts: many small interconnected satellites, not one monolithic orbital platform; analogous at the space layer to Borg's commodity-cluster shape.
- concepts/ml-first-architecture — Google's Coral NPU is the canonical wiki instance of reversing the traditional scalar-CPU-first chip-design precedence; the ML matrix engine is the primary compute element, scalar secondary.
- concepts/always-on-ambient-sensing — the milliwatt-class-sustained serving envelope that Coral NPU targets (hearables, AR glasses, smartwatches, always-on IoT).
- concepts/fragmented-hardware-software-ecosystem — the edge-ML ecosystem trap (proprietary ISAs / compilers / command buffers per accelerator vendor) that Coral NPU's RISC-V-based reference-architecture shape is pitched against.
- patterns/reference-hardware-for-software-ecosystem — the delivery-shape pattern Coral NPU instantiates at the silicon-IP layer; sibling of Home Assistant Green / Voice Assistant Preview Edition one layer of the stack down.
- concepts/language-extension-vs-replacement — Google Research's pipe-syntax paper is the canonical wiki production instance of this design stance: extend SQL from within rather than propose a replacement language that can't achieve ecosystem adoption.
- patterns/pipe-syntax-query-language — the query-language surface shape itself; pipe stages as named existing operators in data-flow order.
- patterns/two-stage-pretraining-contrastive-then-masked — the VideoPrism pretraining recipe.
- patterns/multimodal-content-understanding — pre-trained visual encoders producing embeddings usable by downstream retrieval / classification / QA.
- concepts/vector-embedding — the shared-vector-space property both VideoPrism and CLIP produce.
- concepts/training-serving-boundary — the explicit separation Google research posts tend to surface when they describe foundation models intended for multi-task serving.
- concepts/integer-overflow — the bug substrate in the 2025-01-11 Bloch republication; binary-search / mergesort midpoint computation overflowing `int` for arrays ≥ 2^30 elements.
- concepts/program-correctness — the framing; proof + test + review insufficient; "correct under inputs we tested ≠ correct under all inputs"; the scale-makes-latent-bugs-manifest pattern Bloch's post is the canonical example of.
- concepts/unsigned-right-shift — Java's `>>>` operator; the language primitive that makes `(low + high) >>> 1` recover the correct midpoint even when the sum overflowed.
- patterns/safe-midpoint-computation — the canonical fix shape; `low + ((high - low) / 2)` or `(low + high) >>> 1`; covers binary search, mergesort, quicksort partitioning, every `while (low <= high)` divide-and-conquer loop body.
- patterns/invariant-driven-programming — the discipline Bentley taught and Bentley's own binary search violated: name the invariants your code relies on, write them down, check them against substrate semantics (fixed-width `int`, IEEE 754, TCP, eventual consistency, …) not intended semantics.
- concepts/estimated-time-of-arrival — the prediction surface Google Maps specialises by trip category; HOV ETAs are the canonical-wiki example of splitting the surface rather than folding categories into one model.
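The safe-midpoint shapes above can be illustrated in Python by simulating Java's fixed-width int (a didactic model of the overflow, not the JDK code; Python's `//` floors where Java truncates, but both land negative here):

```python
def to_i32(x: int) -> int:
    """Interpret x as a Java-style signed 32-bit int (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

low, high = 0x40000000, 0x7FFFFFFF  # plausible indices once the array holds >= 2^30 elements

# Buggy shape: (low + high) / 2 -- the sum wraps negative in 32-bit arithmetic.
buggy_mid = to_i32(low + high) // 2                      # negative: out-of-bounds index

# Fix 1: low + ((high - low) / 2) -- the difference cannot overflow.
safe_mid = low + (high - low) // 2

# Fix 2: (low + high) >>> 1 -- Java's unsigned right shift treats the
# wrapped sum as unsigned and lands on the same correct midpoint.
unsigned_shift_mid = (to_i32(low + high) & 0xFFFFFFFF) >> 1

assert buggy_mid < 0
assert safe_mid == unsigned_shift_mid == 0x5FFFFFFF
```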
- concepts/hov-lane — high-occupancy-vehicle (carpool) lanes; the traffic-engineering primitive whose speed advantage (Utah: ~16% faster than general lanes at peak) motivated the HOV-ETA feature.
- patterns/trip-classification — classify trips into discrete categories, then serve a category-specific model / ETA / route policy. Google Maps HOV-ETA launch is the canonical instance.
- concepts/earthquake-early-warning — the real-time seismic-detection + geo-scoped alerting problem class that AEA is a canonical production instance of.
- concepts/speed-accuracy-tradeoff — AEA's framing of magnitude estimation ("the first few seconds provide limited data, but every second you wait is a second less of warning") is the canonical wiki articulation of the real-time-decision speed-vs-accuracy trade-off.
- concepts/text-to-text-regression — the RLM post's core technique; numeric prediction done by a language model reading (x) as a string and emitting (y) as a decoded string, trained with next-token-prediction. Sidesteps feature engineering on complex unstructured state. Generalises Google's earlier OmniPred (2024).
- concepts/performance-prediction — the problem class the RLM addresses: estimate a system's performance metric from its state without running the authoritative solver. Borg MIPS-per-GCU is the canonical wiki production instance.
- concepts/digital-twin-backtesting — Google's Borg digital twin is the wiki's canonical instance; a backtesting framework that replicates real cluster state, used both as training-data source for the RLM and as the authoritative fallback for high-uncertainty predictions.
- concepts/uncertainty-quantification — the RLM's sampled-decoder mechanism produces calibrated uncertainty (predicted-distribution width correlates with residual squared error), with the two failure-mode flavours named explicitly: aleatoric (stochastic load demand) + epistemic (limited observation).
- concepts/bin-packing — the combinatorial resource-allocation primitive at the heart of Borg's scheduler; the authoritative solver whose output the RLM predicts. The 2025-10-17 LAVA post extends the Google Research wiki footprint on bin-packing to the online-with-unknown-disappearance-times variant — Tetris with vanishing pieces — where lifetime uncertainty is the structural feature the LAVA / NILAS / LARS trio targets.
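A toy first-fit sketch of the online variant — arrivals interleaved with departures — illustrating the "vanishing pieces" twist that static bin-packing analysis does not model (illustrative only; Borg's actual bin-packer is a far more specialised algorithm):

```python
from typing import Dict, List

def first_fit_place(hosts: List[float], demand: float, capacity: float) -> int:
    """Online first-fit: place the VM on the first host with room, else open a new host."""
    for i, used in enumerate(hosts):
        if used + demand <= capacity:
            hosts[i] = used + demand
            return i
    hosts.append(demand)
    return len(hosts) - 1

capacity = 1.0
hosts: List[float] = []        # per-host used fraction
placement: Dict[str, int] = {}

# VM "a" departs mid-stream, freeing room on host 0 for the large VM "c".
for event, vm, size in [("add", "a", 0.6), ("add", "b", 0.6),
                        ("del", "a", 0.6), ("add", "c", 0.9)]:
    if event == "add":
        placement[vm] = first_fit_place(hosts, size, capacity)
    else:
        hosts[placement.pop(vm)] -= size
```

Whether "c" fits without opening a third host depends entirely on when "a" disappears — exactly the information the lifetime-aware LAVA family feeds into placement scoring.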
- concepts/vm-lifetime-prediction — the problem class the LAVA family operationalises: predict how long a VM will run as an input to placement scoring. Subclass of performance prediction distinguished by the asymmetric cost of misprediction — "a single misprediction can tie up an entire host for an extended period."
- concepts/continuous-reprediction — the load-bearing primitive of the LAVA family: update the lifetime prediction as the VM runs rather than committing to a single estimate at creation. Makes early-stage misprediction recoverable at later-stage prediction windows; generalises to any online-ML-for-systems setting where the decision is expensive to reverse and observational evidence accumulates.
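A toy illustration of the reprediction step: condition a (hypothetical) learned lifetime distribution on the VM's observed age, and note how the estimate jumps once the short-lived probability mass is ruled out — the recovery a one-shot prediction cannot make:

```python
from typing import Dict

def remaining_lifetime(dist: Dict[float, float], age: float) -> float:
    """E[L - age | L > age] for a discrete lifetime distribution
    {lifetime_hours: probability}. Re-evaluated every time the VM is
    observed still running -- the continuous-reprediction step."""
    survivors = {l: p for l, p in dist.items() if l > age}
    z = sum(survivors.values())
    return sum((l - age) * p for l, p in survivors.items()) / z

# Hypothetical learned distribution: most VMs are short-lived, a few run for days.
dist = {1.0: 0.6, 24.0: 0.3, 168.0: 0.1}

remaining_lifetime(dist, age=0.0)  # creation-time estimate, dominated by short-lived mass
remaining_lifetime(dist, age=2.0)  # short-lived mass ruled out -> estimate jumps sharply
```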
- concepts/learned-lifetime-distribution — emit a distribution (not a point) over remaining lifetime so downstream consumers can reason about tail risk and confidence. The "Learned Distributions" in the LAVA paper's title.
- concepts/resource-stranding — cluster-level failure mode named by the LAVA post: server's remaining resources too small or unbalanced to host any candidate VM.
- concepts/empty-host — the operational primitive the cluster scheduler must preserve for maintenance + large-VM provisioning; eroded by naive density-maximising packing.
- patterns/token-limit-aware-feature-prioritization — the RLM's pre-processing step: reorder features by importance so the model's 8k-token truncation drops only the least-important tail of up to 1M candidate tokens. Generalises to any LLM-over-large-context application.
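A minimal sketch of the pre-processing step (whitespace token counts stand in for a real tokenizer; names and the serialisation format are illustrative):

```python
from typing import Dict

def prioritize_features(features: Dict[str, str],
                        importance: Dict[str, float],
                        token_budget: int) -> Dict[str, str]:
    """Emit features in descending importance and stop at the token budget,
    so truncation drops only the least-important tail."""
    kept: Dict[str, str] = {}
    used = 0
    for name in sorted(features, key=lambda n: importance.get(n, 0.0), reverse=True):
        cost = len(f"{name}: {features[name]}".split())  # crude token count
        if used + cost > token_budget:
            break
        kept[name] = features[name]
        used += cost
    return kept
```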
- patterns/cheap-approximator-with-expensive-fallback — the RLM's deployment shape: trust the fast ML approximator when confident, fall back to the slow bin-packing simulator when uncertain. Calibrated uncertainty is load-bearing for the fallback threshold to be meaningful.
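A minimal sketch of the deployment shape, using multi-sample spread as the uncertainty signal (the threshold, sample count, and signatures are illustrative, not the RLM's actual serving logic):

```python
import statistics
from typing import Callable, List

def predict_with_fallback(
    sample_model: Callable[[str], float],  # one stochastic decode of the fast approximator
    slow_solver: Callable[[str], float],   # authoritative (and expensive) path
    state: str,
    n_samples: int = 8,
    max_stdev: float = 0.05,
) -> float:
    """Multi-sample the fast approximator; if the samples disagree too much
    (high uncertainty), fall back to the authoritative slow solver."""
    samples: List[float] = [sample_model(state) for _ in range(n_samples)]
    if statistics.stdev(samples) > max_stdev:
        return slow_solver(state)          # uncertain -> pay for the real answer
    return statistics.mean(samples)        # confident -> trust the cheap path
```

Calibration is what makes the threshold meaningful: the spread must actually track error for the fallback to fire on the right inputs.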
- patterns/lifetime-aware-rescheduling — the LARS component generalised: continue tracking the lifetime distribution after placement and migrate when the updated picture makes the existing placement inefficient. Sibling of cheap-approximator-with-fallback at the cluster-scheduler layer (different "expensive action" — migrate, not call slow solver).
- patterns/learned-distribution-over-point-prediction — the representation-side pattern both the RLM (sampled decodes) and LAVA (learned parametric / quantile lifetime distributions) instantiate; calibration is load-bearing.
- concepts/knowledge-distillation — teacher / student transfer of knowledge from a large offline model into a small on-device one; canonical wiki production instance is YouTube's real-time generative AI effects pipeline (StyleGAN2 / Imagen → UNet+MobileNet student).
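A toy scalar sketch of the soft-target idea behind teacher/student transfer — cross-entropy against the teacher's temperature-softened distribution. This is a classification-style illustration of the concept only; the YouTube pipeline distils generative image models and does not use this loss:

```python
import math
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy of the student against the teacher's softened targets.
    Softening (temperature > 1) exposes the teacher's 'dark knowledge' --
    the relative scores of non-argmax classes."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))
```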
- concepts/on-device-ml-inference — running ML inference on end-user hardware rather than on cloud servers; the deployment target that makes distillation economically attractive at YouTube scale. Student-architecture choice (MobileNet, depthwise-separable convs) is dictated by the substrate, not by the task.
- patterns/teacher-student-model-compression — the engineering pattern that wraps distillation into a production deployment shape: teacher offline, student online, teacher-side upgrades absorbed without re-architecting the serving stack. Canonical wiki instance: YouTube real-time generative AI effects. Contrasts with cheap-approximator-with-expensive-fallback on one load-bearing axis — whether the expensive reference is reachable at serving time (datacentre: yes; phone: no).
- concepts/speculative-decoding — small-model drafter proposes N tokens, large-model expert verifies them in a single parallel forward pass; token-exact rejection in the canonical form, probabilistic-match in the generalisation.
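The canonical token-exact acceptance rule can be sketched as follows (distributions as plain dicts over a toy vocabulary; an illustration of the accept/reject step, not a serving implementation):

```python
import random
from typing import Dict, List

def speculative_step(
    draft_probs: List[Dict[str, float]],   # drafter's distribution at each drafted position
    draft_tokens: List[str],               # the N tokens the drafter proposed
    expert_probs: List[Dict[str, float]],  # expert's distributions, from one parallel verify pass
    rng: random.Random,
) -> List[str]:
    """Accept draft token x with probability min(1, p_expert(x) / p_draft(x));
    stop at the first rejection (the expert then resamples that position)."""
    accepted: List[str] = []
    for q, x, p in zip(draft_probs, draft_tokens, expert_probs):
        if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
            accepted.append(x)
        else:
            break
    return accepted
```

Speculative cascades' generalisation relaxes exactly this rule: a probabilistic match criterion accepts semantically-equivalent drafts that the token-exact `p.get(x, 0.0)` lookup above would reject outright.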
- concepts/cascades-llm-inference — small-model drafter answers first, expert takes over from scratch on low-confidence; sequential by construction, the limitation speculative cascades' parallel-verifier hybrid resolves.
- concepts/drafter-expert-split — two-model LLM-serving architectural substrate under both cascades and speculative decoding.
- concepts/token-verification — the per-position accept/reject primitive made cheap by parallel-populate of the KV cache.
- patterns/draft-verify-inference — generalised cheap-generator / expensive-verifier pattern at the LLM-token granularity; per-token cousin of patterns/cheap-approximator-with-expensive-fallback at the per-query granularity.
- concepts/llm-decoding-step — the final phase of LLM text generation where internal representations become tokens; the architectural insertion point both Google's latency-decoding (speculative cascades) and factuality-decoding (SLED) work modifies. Canonical wiki articulation that "latency decoding" and "factuality decoding" are sibling categories at the same insertion point.
- concepts/factuality-decoding — decode-time interventions that improve LLM factual accuracy without retraining or retrieval; training-free, retrieval-free, serving-time. Google Research SLED + DoLa are the two canonical instances.
- concepts/llm-hallucination — the failure mode factuality decoding targets; "popular but wrong" completions driven by training-data-frequency bias at the final layer while the correct signal sits in intermediate layers. Canonical wiki worked examples: "capital of British Columbia" → Vancouver (wrong) vs Victoria (correct); discount-arithmetic word problem → "6 × 10 =" (wrong) vs "6 × 10 ×" (correct).
- concepts/logits — pre-softmax prediction scores over the vocabulary, emitted per-layer by the transformer; the primitive factuality decoders (SLED, DoLa) operate on.
- concepts/early-exit-logits — logits derived by applying the transformer's final projection matrix to an intermediate layer's hidden state; SLED's lever. No per-layer heads trained, no new parameters — the same LM-head matrix is reused across layers, so extracting intermediate-layer distributions is training-free.
- patterns/all-layer-ensemble-decoding — the pattern SLED instantiates: reuse the final projection matrix on every layer's hidden state, weight-average the resulting per-layer distributions, decode from the mixture. Generalises DoLa's pairwise-layer contrast to a full-layer-ensemble. Orthogonal to other decoding-time interventions (composable); training-free.
Recent articles¶
- 2026-03-31 — sources/2026-03-31-google-safeguarding-cryptocurrency-by-disclosing-quantum-vulnerabilities-responsibly (Safeguarding cryptocurrency by disclosing quantum vulnerabilities responsibly — Google Research's disclosure-methodology landmark for the 2026 quantum-algorithmic speed-up against ECDLP-256 (the elliptic-curve discrete-log primitive under Bitcoin's secp256k1 and TLS's P-256). Extends classical responsible disclosure to trust-sensitive substrates (cryptocurrencies) by naming the FUD attack surface — "unscientific and unsubstantiated resource estimates for quantum algorithms breaking ECDLP-256 can themselves represent an attack on the system" — as a first-class disclosure concern. Three-component composed disclosure (patterns/zkp-capability-disclosure): (1) withhold the quantum circuits, substantiate via ZKP ("state-of-the-art cryptographic construction… allows third parties to verify our claims without us leaking sensitive attack details"); (2) scope clarification — "clarify the areas where blockchains are immune to quantum attacks"; (3) defensive-progress highlighting — "highlight the progress that has already been achieved towards post-quantum blockchain security." Cites CERT/CC, Project Zero, ISO/IEC 29147:2018 as classical-disclosure-norm precedents and explicitly declares blockchain disclosure new policy territory ("we welcome further discussions with the quantum, security, cryptocurrency, and policy communities to align on responsible disclosure norms going forward"). Producer-side companion to Cloudflare's 2026-04-07 consumer-side post: Google publishes the disclosure primitive + philosophy; Cloudflare demonstrates the operational effect (community-wide Q-Day timeline compression, migration-target pull-forward to 2029). The pair brackets the full production shape: ZKP capability disclosure → trusted signal → industry-wide timeline reassessment → accelerated PQC deployment.
Introduces concepts/fud-attack-surface; patterns/zkp-capability-disclosure; extends concepts/zero-knowledge-proof (producer-side framing of ZKP as responsible-disclosure primitive), concepts/coordinated-disclosure (adds the trust-sensitive-substrate variant + Google 2026 Seen-in), concepts/post-quantum-cryptography (adds producer-side disclosure-methodology source for 2026 Q-Day pull-forward), concepts/q-day (adds producer-side source for the 2026 pull-forward), concepts/cryptographically-relevant-quantum-computer (adds producer-side disclosure for the Google algorithmic speed-up). Caveats: raw capture is the "Our approach to vulnerability disclosure" section only — updated ECDLP-256 resource estimates and specific PQ-progress examples are not in the ingested portion; ZKP construction not specified (SNARK/STARK/etc.); no specific cryptocurrency named as most-affected; no governance body / venue / embargo discipline proposed for future ZKP-capability disclosures. (Tier 1)
- 2026-02-11 — sources/2026-02-11-google-scheduling-in-a-changing-world-time-varying-capacity (Scheduling in a changing world: Maximizing throughput with time-varying capacity — Google Research paper on online throughput-maximising scheduling under a time-varying machine-capacity profile, with competitive-ratio analysis across three preemption regimes. The non-preemptive competitive ratio approaches zero — a single long-job commitment can starve arbitrarily many future short jobs. Interrupt-and-restart preemption recovers the offline ½-competitive bound via an earliest-finish-job greedy (matches the offline optimum up to a factor of 2). Interrupt-without-restart is adversarially unwinnable in general but becomes constant-competitive under common deadlines (motivated by "all data processing must finish by the nightly batch run"). The common-deadline constant-competitive algorithm maintains a tentative schedule over already-arrived jobs and revises it on each new arrival via a fixed four-action rule (unit-capacity variant; the full four-action specification is in the paper but not the raw capture). Third Google Research proof point on Borg-adjacent scheduling at a distinct intervention layer from the 2025-07-29 RLM (bin-packer output prediction) and 2025-10-17 LAVA / NILAS / LARS (VM-allocation policy) — now the online-throughput scheduling-theory layer. Caveat: research-side theoretical paper with production-shape motivation; no deployment retrospective, no production numbers, and the raw capture covers only the online-setting section. (Tier 1)
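A minimal sketch of the interrupt-and-restart earliest-finish greedy, under strong simplifying assumptions (discrete time, unit capacity modelled as a set of up-timesteps, no deadlines; the paper's capacity model and analysis are more general, and the `jobs`/`up` inputs here are hypothetical):

```python
def earliest_finish_greedy(jobs, horizon, up):
    """Toy interrupt-and-restart earliest-finish greedy on a unit-capacity
    machine whose availability varies over time. jobs: name -> (arrival,
    length). up: set of timesteps at which the machine has capacity 1.
    A preempted or interrupted job loses its progress and restarts."""
    completed, running, progress = [], None, 0
    for t in range(horizon):
        if t not in up:
            running, progress = None, 0  # capacity drop: current run is lost
            continue
        # Among arrived, uncompleted jobs, pick the one that can finish
        # earliest; only the currently running job keeps its progress.
        best = None
        for name, (arrival, length) in jobs.items():
            if arrival > t or name in completed:
                continue
            remaining = length - progress if name == running else length
            if best is None or remaining < best[1]:
                best = (name, remaining)
        if best is None:
            continue
        if best[0] != running:
            running, progress = best[0], 0  # preemption discards old progress
        progress += 1
        if progress == jobs[running][1]:
            completed.append(running)
            running, progress = None, 0
    return completed
```

With jobs `{"long": (0, 5), "short": (1, 1)}` the greedy preempts the long job to complete the short one first, exactly the flexibility whose absence drives the non-preemptive competitive ratio to zero.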
- 2025-11-06 — sources/2025-11-06-google-ds-star-versatile-data-science-agent (DS-STAR: A state-of-the-art versatile data science agent — Google Research introduces DS-STAR, a data-science agent built from four specialised LLM sub-agents (Data File Analyzer, Planner, Coder, Verifier) plus a Router in an iterative plan-refinement loop, canonical instance of patterns/planner-coder-verifier-router-loop. Architecturally distinguishing primitives: (1) up-front Data File Analyzer agent writes + runs a Python file-summarisation script so the Planner has rich context on heterogeneous data formats (CSV, JSON, markdown, unstructured text) — ablation-load-bearing: removing it collapses DABStep hard-task accuracy from 45.2% to 26.98%; (2) Verifier as an LLM judge on plan sufficiency inside the generation loop (not post-hoc trajectory scoring); (3) Router's add-or-fix decision over a naive extend-only loop — ablation shows forcing extend-only degrades both easy and hard tasks ("it is more effective to correct mistakes in a plan than to keep adding potentially flawed steps"). Mimics the expert-analyst-in-a-notebook workflow: "builds a plan sequentially, reviewing intermediate results before proceeding." Round-count budget ceiling 10; DABStep hard tasks avg 5.6 rounds, easy tasks 3.0, >50% of easy tasks in 1 round. State-of-the-art on three benchmarks — DABStep 41.0% → 45.2% (+4.2), KramaBench 39.8% → 44.7% (+4.9), DA-Code 37.0% → 38.5% (+1.5) — beating AutoGen and DA-Agent baselines. #1 on the DABStep public leaderboard as of 2025-09-18. Framework is LLM-swappable: tested with Gemini-2.5-Pro (default) and GPT-5; both work (GPT-5 slightly better on easy, Gemini-2.5-Pro better on hard). Backing paper arXiv 2509.21825.
Extends concepts/llm-as-judge along the plan-sufficiency-in-the-generation-loop axis, and [[patterns/specialized-agent-decomposition]] with a third decomposition framing — role-in-the-refinement-loop decomposition alongside the Storex (domain-based) and Dash (sub-tool) framings. Introduces systems/ds-star, systems/dabstep, systems/autogen; concepts/iterative-plan-refinement, concepts/data-file-analysis, concepts/heterogeneous-data-formats, concepts/refinement-round-budget; patterns/planner-coder-verifier-router-loop. Research, not production; no Google Cloud productisation announced, no latency / throughput / cost numbers disclosed.)
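The loop's control flow can be sketched as follows (the five sub-agent callables are hypothetical stand-ins; DS-STAR's actual prompts, plan representation, and verifier criteria are in arXiv 2509.21825):

```python
def ds_star_loop(task, analyzer, planner, coder, verifier, router, max_rounds=10):
    """Planner-coder-verifier-router refinement loop, DS-STAR-shaped.
    analyzer: up-front file summarisation giving the planner rich context;
    verifier: LLM judge on plan sufficiency *inside* the generation loop;
    router: decides whether to add a new step or fix an existing one."""
    context = analyzer(task)          # summarise heterogeneous data files
    plan = planner(task, context)
    result = None
    for _ in range(max_rounds):       # round-count budget ceiling (10 in the post)
        result = coder(plan, context)
        if verifier(task, plan, result):
            return result             # plan judged sufficient: stop refining
        action, target = router(task, plan, result)  # "add" or "fix"
        plan = planner(task, context, plan=plan, action=action, target=target)
    return result                     # budget exhausted: best effort
```

The router's add-or-fix branch is what distinguishes this from a naive extend-only loop, per the ablation quoted above.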
- 2025-11-04 — sources/2025-11-04-google-exploring-space-based-scalable-ai-infrastructure (Exploring a space-based, scalable AI infrastructure system design — Google Research announces Project Suncatcher, a moonshot research programme for a space-based, scalable AI infrastructure: compact constellations of solar-powered satellites carrying Google TPUs, interconnected by free-space optical links. Framing: "the Sun is the ultimate energy source… more than 100 trillion times humanity's total electricity production; in the right orbit, a solar panel can be up to 8 times more productive than on earth, and produce power nearly continuously, reducing the need for batteries." Architectural shape is modular disaggregated constellation — "compact constellations of… smaller, interconnected satellites" rather than one monolithic orbital platform — pitched as "highly scalable". Three named foundational-research challenges: (1) high-bandwidth communication between satellites (FSO link design at AI-workload bandwidths); (2) orbital dynamics (tight-formation constellation geometry); (3) radiation effects on computing (commodity TPU operation in a space-radiation environment; mitigation pushed into the architectural / software layer rather than silicon). Load-bearing stance: commodity TPUs, not radiation-hardened silicon — keeps the constellation on the commercial-compute density curve. Also minimises terrestrial resources — sibling of the dual economics/sustainability framing of the 2025-10-17 LAVA post. Anchored in Google's moonshot lineage: quantum computing (~2015) and autonomous vehicles (~2010 → Waymo). Backing preprint: Towards a future space-based, highly scalable AI infrastructure system design. Raw captures announcement + physics motivation + three-challenge enumeration + moonshot lineage only — no satellite count, no per-satellite TPU count, no FSO link-budget numbers, no constellation orbit, no deployment timeline; all live in the preprint, not the raw.
Research moonshot, not production)
- 2025-10-17 — sources/2025-10-17-google-solving-virtual-machine-puzzles-lava (Solving virtual machine puzzles: How AI is optimizing cloud computing — Google Research introduces a trio of lifetime-aware VM-allocation algorithms for the online bin-packing problem at the heart of cloud scheduling (backing paper arXiv:2412.09840v1): NILAS / LAVA / LARS — scoring, allocation, and rescheduling layers respectively. Problem framing: VM allocation is Tetris-with-disappearing-pieces whose disappearance times are unknown at placement. Named structural failure mode of naive single-shot ML lifetime prediction: "a single misprediction can tie up an entire host for an extended period, degrading efficiency." Load-bearing primitive is continuous reprediction — the model "constantly and automatically updates its prediction for a VM's expected remaining lifetime as the VM continues to run", so early-stage mispredictions are recoverable at later-stage prediction windows. Two named cluster-level operational failure modes are pitched as the family's targets: resource stranding (remaining per-host resources too small or unbalanced to host new VMs) and empty-host loss (reduces the cluster's reserve of fully-empty hosts needed for maintenance + large-VM provisioning). Second Google Research ML-for-systems proof point on Borg-adjacent scheduling after the 2025-07-29 RLM post, at a different layer — RLM predicts the bin-packer's output so the inner loop can skip the slow solver; LAVA augments the bin-packer's policy with learned lifetime distributions. Raw captures intro-and-algorithm-naming only; internal NILAS / LAVA / LARS mechanisms, measured production savings, and Borg integration specifics live in the arXiv paper, not the raw)
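The continuous-reprediction primitive can be illustrated with a toy conditional-expectation update (the bimodal pmf below is a hypothetical learned lifetime distribution; LAVA's actual parametric/quantile models and prediction windows are in the arXiv paper):

```python
import numpy as np

def expected_remaining_lifetime(lifetime_pmf, age):
    """Repredict a VM's expected remaining lifetime given that it has
    already survived `age` time units: E[L - age | L > age]. Survival
    evidence shifts mass away from short lifetimes, so an early
    misprediction that would otherwise strand a host is corrected at
    later prediction windows."""
    lifetimes = np.arange(len(lifetime_pmf))
    alive = lifetimes > age
    p = np.asarray(lifetime_pmf, dtype=float)[alive]
    if p.sum() == 0.0:
        return 0.0
    p /= p.sum()
    return float(np.dot(lifetimes[alive] - age, p))

# Hypothetical bimodal workload: 90% of VMs live ~1 h, 10% live ~100 h.
pmf = np.zeros(101)
pmf[1], pmf[100] = 0.9, 0.1
```

At age 0 the expectation is dominated by the short mode (10.9 h here); once the VM survives past hour 2, the reprediction jumps to 98 h. That age-conditioned update is the behaviour that makes a single early misprediction recoverable, in contrast to the single-shot failure mode quoted above.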
- 2025-10-15 — sources/2025-10-15-google-coral-npu-a-full-stack-platform-for-edge-ai (Coral NPU: A full-stack platform for Edge AI — Google Research introduces Coral NPU, a reference neural-processing-unit architecture built from RISC-V-ISA-compliant architectural IP blocks (not a chip); targets ~512 GOPS at a few milliwatts for always-on ambient sensing — edge devices, hearables, AR glasses, smartwatches. Architectural claim: "reverses traditional chip design… it prioritizes the ML matrix engine over scalar compute" — ML-first architecture. Pitched as a full-stack platform to heal the fragmented edge-ML ecosystem where each proprietary accelerator ships its own ISA / compiler / command buffer. Delivery shape is reference hardware for software ecosystem at the silicon-IP layer. Raw capture ends mid-paragraph after the opening claims — IP block decomposition, compiler toolchain, ML-framework support matrix, per-model benchmarks, and production partners are not in scope of this source page. Relationship to the existing Edge-TPU-based Coral product line not explicitly decomposed)
- 2025-10-07 — sources/2025-10-07-google-speech-to-retrieval-s2r-voice-search (Speech-to-Retrieval (S2R): A new approach to voice search — Google Research re-architects voice search from the production-standard ASR → text → retriever cascade to a direct audio→retrieval path, removing the text transcript as intermediate representation; named structural failure modes are information loss and error propagation; quantified by a groundtruth-upper-bound benchmark (real Cascade ASR vs. human-transcribed "perfect ASR" cascade, same retriever), measured with WER ↔ MRR on Google's SVQ multilingual dataset, cross-validated by human raters. Raw captures framing + benchmark design only; S2R model architecture, per-language result numbers, and production rollout details not in raw)
- 2025-09-17 — sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers (Making LLMs more accurate by using all of their layers — Google Research introduces SLED (Self Logits Evolution Decoding, NeurIPS 2024, arXiv:2411.02433, github.com/JayZhang42/SLED): a factuality-decoding method that reuses the transformer's final projection matrix on every layer's hidden state to obtain early-exit logits, then weight-averages across all layers to pick the next token; framed as a "factuality decoding" sibling of speculative decoding at the same decoding-step insertion point; up to +16 percentage points accuracy over base / DoLa on "two challenging datasets", ~4% decoding-time overhead vs DoLa, validated across Gemma 3 / GPT-OSS / Mistral (instruction-tuned + base); composable with other factuality decoders; training-free, retrieval-free, serving-time; canonical worked examples — "capital of British Columbia" (final layer prefers Vancouver, intermediate layers prefer Victoria) and discount-arithmetic word problem (final layer follows "A × B = C" frequency pattern, intermediate layers preserve discount context). One week after the 2025-09-11 speculative-cascades post; together they establish the "LLM serving-infra latency / factuality primitives" recurring shape)
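A minimal sketch of the all-layer-ensemble idea (uniform layer weights are an assumption for illustration; SLED's actual per-layer weighting and logits-evolution step are in arXiv:2411.02433):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ensemble_next_token(hidden_states, lm_head, layer_weights=None):
    """Reuse the final projection matrix on every layer's hidden state
    (training-free early-exit logits), weight-average the per-layer
    distributions, and decode the next token from the mixture."""
    n = len(hidden_states)
    if layer_weights is None:
        layer_weights = np.full(n, 1.0 / n)   # assumed uniform weighting
    dists = [softmax(h @ lm_head) for h in hidden_states]
    mixture = sum(w * d for w, d in zip(layer_weights, dists))
    return int(np.argmax(mixture))
```

With the final layer alone preferring one token while most intermediate layers prefer another (the Vancouver/Victoria shape), the ensemble flips the decode; DoLa instead contrasts a single intermediate layer against the final one.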
- 2025-09-11 — sources/2025-09-11-google-speculative-cascades-hybrid-approach-llm-inference (Speculative cascades: A hybrid approach for smarter, faster LLM inference — pedagogical walkthrough of cascades + speculative decoding as the two baseline LLM-inference-acceleration primitives, plus the speculative cascades hybrid that keeps speculative decoding's parallel-verify pass while generalising token-exact rejection to a probabilistic-match rule so semantically-equivalent small-model drafts aren't thrown away; "Who is Buzz Aldrin?" worked example motivates both failure modes; raw captures intro only — speed-up numbers, full probabilistic-match specification, and any production deployment detail live in the unscraped body and the linked paper)
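The token-exact baseline's accept/reject primitive can be sketched with the standard speculative-sampling rule (shown for illustration; the probabilistic-match generalisation that speculative cascades add lives in the unscraped body and the paper):

```python
import random

def verify_drafts(draft_tokens, draft_probs, expert_probs, rng=None):
    """One speculative-decoding step. The expert has already scored every
    drafted position in a single parallel forward pass; the token t at each
    position is accepted with probability min(1, p_expert(t) / p_draft(t)),
    and the first rejection ends the accepted prefix (the expert resamples
    from there)."""
    rng = rng or random.Random(0)
    accepted = []
    for t, q, p in zip(draft_tokens, draft_probs, expert_probs):
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t)
        else:
            break
    return accepted
```

When the expert agrees with the drafter everywhere, the whole draft is accepted and the large model's cost is amortised over N tokens; a plain cascade, by contrast, answers sequentially with no parallel verify, which is the limitation the hybrid resolves.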
- 2025-08-21 — sources/2025-08-21-google-from-massive-models-to-mobile-magic-tech-behind-youtube-real-time-generative-ai (From massive models to mobile magic: The tech behind YouTube real-time generative AI effects — production teacher-student distillation pipeline for YouTube's on-device generative AI camera effects; teacher upgraded StyleGAN2 + StyleCLIP → Imagen without re-architecting the on-device student (UNet with MobileNet encoder + MobileNet-block decoder); canonical wiki instance of on-device ML inference at consumer-app scale and teacher-student model compression as a deployment pattern; raw captures intro only, runtime / quantisation / device-matrix details not in raw)
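The teacher-student pattern's training objective can be sketched as a generic temperature-softened distillation loss (a standard Hinton-style KL term shown for illustration; the raw capture does not specify YouTube's actual loss, which for generative image effects would include pixel/perceptual terms):

```python
import numpy as np

def softened_softmax(logits, temperature):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions: the
    offline teacher supplies soft targets; the on-device student is
    trained to match them."""
    p = softened_softmax(teacher_logits, temperature)
    q = softened_softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

Because only the student ships to the phone, a teacher upgrade (StyleGAN2 → Imagen) changes the soft targets, not the serving stack — the axis on which this pattern differs from cheap-approximator-with-expensive-fallback.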
- 2025-07-29 — sources/2025-07-29-google-simulating-large-systems-with-regression-language-models (Simulating large systems with Regression Language Models — text-to-text regression with a 60M-parameter two-layer encoder-decoder RLM applied to Borg performance prediction; reads YAML/JSON cluster state (up to 1M candidate tokens truncated to the 8k window by importance-ordering) and emits MIPS-per-GCU as a decoded numeric string; multi-sample decoding recovers the full output distribution, with predicted-distribution width correlated with residual squared error so the fast ML path can fall back to the slow bin-packing simulator inside the Borg digital twin on high-uncertainty inputs; near-perfect Spearman rank correlation across diverse tasks; open-source regress-lm; raw file captured only acknowledgements, body pulled from URL)
- 2025-07-17 — sources/2025-07-17-google-android-earthquake-alerts (Android Earthquake Alerts: A global system for early warning — Android fleet as planet-scale distributed seismometer network for EEW; median absolute error of first magnitude estimate improved 0.50 → 0.25 over three years; accuracy "similar to and sometimes better than" traditional seismic networks; framed as a canonical speed-vs-accuracy real-time-decision trade-off; raw captures intro only, deeper architecture not in raw)
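The 2025-07-29 RLM entry's uncertainty gate is the cheap-approximator-with-expensive-fallback shape; a sketch (the `rlm_sample` and `simulator` callables, sample count, and spread threshold are all hypothetical stand-ins):

```python
import statistics

def predict_mips_per_gcu(cluster_state, rlm_sample, simulator,
                         n_samples=8, max_rel_spread=0.05):
    """Fast path: draw several decoded numeric predictions from the RLM;
    if the sample spread is small relative to the mean, trust the mean.
    Slow path: on high-uncertainty inputs, fall back to the bin-packing
    simulator inside the digital twin."""
    samples = [rlm_sample(cluster_state) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    rel_spread = statistics.pstdev(samples) / max(abs(mean), 1e-9)
    if rel_spread <= max_rel_spread:
        return mean                     # confident: fast ML path
    return simulator(cluster_state)     # uncertain: exact slow path
```

The load-bearing property is that the predicted-distribution width is correlated with residual squared error, so the spread is a usable routing signal rather than noise.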
- 2025-06-30 — sources/2025-06-30-google-hov-specific-etas-google-maps (How we created HOV-specific ETAs in Google Maps — launch of HOV-specific ETA surface in Maps; classification system determines HOV vs. non-HOV trips, per-category ETA then served; Utah Salt Lake Valley cited at 68.18 vs. 58.60 mph (~16%) HOV vs. general-lane peak speed; raw captures intro only, deeper architecture not in raw)
- 2025-01-11 — sources/2025-01-11-google-nearly-all-binary-searches-and-mergesorts-are-broken-2006 (Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken — Google Research republication of Joshua Bloch's 2006 post; (low + high) / 2 integer-overflow bug in JDK Arrays.binarySearch and Bentley's Programming Pearls; ~20 years dormant until arrays crossed 2^30 elements; meta-lesson: proof + test + review + static analysis each insufficient, "program carefully, defensively, and remain ever vigilant"; HN 164)
- 2024-08-24 — sources/2024-08-24-google-pipe-syntax-in-sql (SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL — GoogleSQL gains native |> pipe syntax as an additive extension; preserves the full SQL ecosystem; canonical instance of language-extension-over-replacement; HN 308)
- 2024-05-09 — sources/2024-05-09-google-videoprism-foundational-visual-encoder (VideoPrism: A foundational visual encoder for video understanding — two-stage contrastive-then-masked pretraining on 36M captioned + 582M noisy-text video clips; frozen encoder adapted to 30+ downstream tasks; HN 106)
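The 2025-01-11 Bloch bug reproduces in a few lines; a sketch in Python with Java's 32-bit int wraparound simulated explicitly (Python integers don't overflow, so the wrap is applied by hand):

```python
def as_int32(x):
    """Wrap to a signed 32-bit integer, mirroring Java int arithmetic."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

# Searching near the top of an array with more than 2^30 elements:
low, high = 1 << 30, (1 << 30) + 10

mid_buggy = as_int32(low + high) // 2   # (low + high) / 2: the sum exceeds
                                        # Integer.MAX_VALUE and goes negative
mid_fixed = low + (high - low) // 2     # Bloch's fix; (low + high) >>> 1
                                        # also works in Java
```

The buggy midpoint is negative, which in Arrays.binarySearch surfaces as an ArrayIndexOutOfBoundsException; it stayed latent for roughly two decades because arrays that large were impractical, which is exactly the "check invariants against substrate semantics" lesson the wiki files this post under.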
Skipped (logged)¶
One Google Research article has been filtered out so far:
- 2025-09-16 Learn Your Way: Reimagining textbooks with generative AI — product/UX AI education announcement without distributed-systems / serving-infra content; raw capture was also near-empty (frontmatter + Acknowledgements only). Skipped per AGENTS.md scope filter (product/UX post + pure-ML-without-serving-infra).
The filter rules per AGENTS.md: pure ML research without a serving-infra or production-system angle is skipped; product-PR / marketing roundups are skipped; distributed-systems internals / scaling trade-offs / infra architecture / query-engine / language-design / LLM-serving-primitive content is ingested eagerly.
Backlog¶
Raw Google articles: 81 pending / 124 downloaded per the companies index (counter stale relative to batch-processor flips). Current full-coverage ingests: 13, spanning language design, video understanding, the program-correctness retrospective, HOV-ETA, Android earthquake alerts, the 2025-07-29 Regression Language Model / Borg post, the 2025-08-21 YouTube real-time generative AI effects (on-device distillation) post, the 2025-09-11 speculative-cascades LLM-inference post, the 2025-09-17 SLED factuality-decoding post, the 2025-10-07 Speech-to-Retrieval voice-search post, the 2025-10-15 Coral NPU edge-AI reference-architecture post, the 2025-10-17 LAVA / NILAS / LARS lifetime-aware VM-allocation post, and, most recently, the 2025-11-04 Project Suncatcher space-based AI infrastructure moonshot. Wiki coverage is expected to deepen substantially as the backlog is ingested; the current 13 sources are not yet representative of Google Research's full scope.