
PATTERN Cited by 2 sources

Learned distribution over point prediction

When downstream decisions are cost-asymmetric in the prediction error, emit a calibrated distribution from the predictor rather than a point estimate. The consumer picks the quantile matched to the asymmetry; tail queries become well-defined; uncertainty-gated actions become possible.

Intent

Point prediction (ŷ = f(x)) is appealing because it's simple and the consumer API is a number. It fails when any of the following holds:

  • The consumer's loss function is asymmetric in error sign (e.g. under-prediction costs more than over-prediction).
  • The consumer's decision depends on a tail quantity (e.g. "is there a 10% chance this exceeds threshold T?").
  • The consumer wants to gate expensive actions on prediction confidence (trigger fallback / migration / human review only when the predictor is uncertain).

In any of these regimes, point prediction destroys information the predictor could cheaply have provided. Emitting a distribution preserves the information at marginal cost; the consumer does the quantile / tail / confidence extraction it needs.
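Concretely, when the per-unit costs of under- and over-prediction are known, the expected-cost-minimising point to act on is a specific quantile of the predictive distribution (the classic newsvendor quantile, q = c_under / (c_under + c_over)). A minimal sketch, with illustrative costs and a lognormal sample standing in for one input's calibrated predictive distribution:

```python
import numpy as np

# Illustrative costs (not from the source): under-predicting costs 9x
# more per unit than over-predicting, e.g. an outage vs. idle headroom.
cost_under, cost_over = 9.0, 1.0

# The expected-cost-minimising action is the q-quantile of the
# predictive distribution, with q = c_under / (c_under + c_over).
q = cost_under / (cost_under + cost_over)  # 0.9

# Samples standing in for one input's calibrated predictive distribution.
rng = np.random.default_rng(0)
predictive_samples = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

decision = np.quantile(predictive_samples, q)   # what the consumer acts on
point_estimate = np.median(predictive_samples)  # what a point predictor emits
```

With a skewed distribution and asymmetric costs, `decision` sits well above the median; that gap is exactly the information a point estimate throws away.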

Mechanism

Pick a representation:

  1. Parametric. Train the predictor to emit parameters of a fixed family (Gaussian μ, σ; lognormal μ, σ; mixture of K Gaussians). Cheap and compact; brittle if the true distribution doesn't match the family.
  2. Quantile regression. Predict a fixed set of quantiles (P10, P50, P90, or denser) directly. Nonparametric in shape; costs one output head per quantile; calibration is per-quantile.
  3. Histogram / discretised output. Predict a probability distribution over bucketed target values. Maps well to classification-style training; bucketing loses precision.
  4. Sampling-based. Treat the predictor as a generative model; sample N outputs per input; distribution is the empirical sample distribution. Expensive at inference (N-sample cost); natural for LLM-style text-to-text regression (see RLM).
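As a sketch of option 2: the standard training objective for a quantile head is the pinball loss, whose expectation is minimised at the target q-quantile. Below, a brute-force fit of a constant model stands in for a gradient-trained head (a real predictor conditions on features x; all names here are illustrative):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss: its expectation is minimised when
    y_pred is the q-quantile of y_true's distribution."""
    err = y_true - y_pred
    return np.mean(np.maximum(q * err, (q - 1.0) * err))

# Brute-force fit of a constant model per quantile (a stand-in for a
# gradient-trained output head).
y = np.random.default_rng(1).normal(loc=10.0, scale=2.0, size=5_000)
grid = np.linspace(0.0, 20.0, 2001)
heads = {
    q: grid[int(np.argmin([pinball_loss(y, c, q) for c in grid]))]
    for q in (0.1, 0.5, 0.9)
}
# heads[0.1] < heads[0.5] < heads[0.9], with heads[0.5] near the median.
```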

Whichever representation you choose, calibration is load-bearing: P(actual ≤ predicted q-quantile) should match q. Without calibration the downstream quantile queries lie, and the pattern degrades to decorative uncertainty.
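A minimal calibration check under these assumptions: compute the empirical coverage of each predicted quantile on held-out data and compare it to the nominal level. The Gaussian predictor below is a stand-in, not the pattern's prescribed model:

```python
import numpy as np

def coverage(actuals, predicted_q, q):
    """Fraction of actuals at or below the predicted q-quantile;
    for a calibrated predictor this should be close to q."""
    return float(np.mean(actuals <= predicted_q))

# Stand-in predictor: emits a Gaussian (mu, sigma) per input.
rng = np.random.default_rng(2)
mu = rng.uniform(0.0, 100.0, size=20_000)
sigma = 5.0
actuals = rng.normal(mu, sigma)

# Standard-normal quantiles Phi^{-1}(q) for the levels being checked.
Z = {0.1: -1.2816, 0.5: 0.0, 0.9: 1.2816}
report = {q: coverage(actuals, mu + z * sigma, q) for q, z in Z.items()}
# Each report[q] should land near q; a large gap means the downstream
# quantile queries are lying.
```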

Canonical wiki instances

The two instances use different representations (parametric/quantile in LAVA, sampled empirical in RLM) but instantiate the same pattern: distributions make downstream risk-aware decisions tractable.

When it's the right shape

  • Consumer's loss is asymmetric or tail-dependent.
  • Consumer wants to gate expensive actions on confidence.
  • Predictor can be calibrated cheaply.
  • Representation cost (a few quantiles vs one number) is negligible relative to the information value.

When it's the wrong shape

  • Consumer just needs a point and has no use for uncertainty.
  • Calibration is infeasible for the chosen predictor class.
  • Representation cost (e.g. N-sample LLM decoding per request) blows the inference budget.
  • The target distribution is so tight (near-deterministic) that a point estimate is effectively lossless.

Adjacent patterns

Seen in
