
Uncertainty quantification

Uncertainty quantification (UQ) is the discipline of producing a confidence estimate alongside a prediction — not just what the model thinks but how sure it is. In regression, the output is typically a distribution P(y | x) (or summaries like variance, prediction intervals, quantiles) rather than a single point estimate.

UQ matters operationally because most automated decisions that depend on ML predictions need to know when to trust the model. Without calibrated uncertainty, downstream systems either trust it always (unsafe on out-of-distribution inputs) or never (defeats the purpose of ML).

The two kinds of uncertainty

The standard decomposition, which Google names explicitly in the 2025-07-29 RLM post:

  • Aleatoric uncertainty — inherent randomness in the system being modelled. Example: stochastic load demand on a cluster. Collecting more data doesn't reduce it; it's irreducible.
  • Epistemic uncertainty — uncertainty from limited observation or features. The model has seen too few examples of this region of input space, or the features don't carry enough signal. Collecting more (relevant) data does reduce it.

Practical systems care about both but in different ways: aleatoric sets a noise floor for downstream planning; epistemic tells you where to gather more data or retrain.
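The distinction shows up directly in a tiny simulation (all numbers below are synthetic): epistemic uncertainty about the mean shrinks as more observations arrive, while the aleatoric spread of individual outcomes stays near the true noise level no matter how much data is collected.

```python
import random
import statistics

random.seed(0)

# True process: y = 2*x + Gaussian noise. The noise std is the
# aleatoric floor; no amount of data removes it.
NOISE_STD = 1.0

def observe(x, n):
    """Draw n noisy observations of y at input x."""
    return [2.0 * x + random.gauss(0.0, NOISE_STD) for _ in range(n)]

small = observe(1.0, 10)
large = observe(1.0, 1000)

# Epistemic uncertainty about the mean: standard error ~ std / sqrt(n),
# so it shrinks as we collect more data.
sem_small = statistics.stdev(small) / len(small) ** 0.5
sem_large = statistics.stdev(large) / len(large) ** 0.5

# Aleatoric spread (sample std of individual outcomes) stays near NOISE_STD.
spread_large = statistics.stdev(large)
```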

How RLMs deliver UQ structurally

The 2025-07-29 post's core UQ claim is that text-to-text regression gives UQ for free:

  1. Sample multiple decodes from the same prompt.
  2. Parse each decoded string as a number.
  3. The empirical distribution over parsed numbers approximates P(y | x): both the central tendency (point prediction = mean / mode) and the spread (uncertainty = width).
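The three steps take only a few lines. The decoded strings below are invented placeholders standing in for repeated samples from the same prompt, not actual RLM output:

```python
import re
import statistics

# Hypothetical decodes from sampling the same prompt several times.
decodes = ["12.4", "11.9", "12.7", " 12.1 ", "13.0", "not-a-number", "12.3"]

def parse_number(s):
    """Parse a decoded string as a float; return None if it isn't numeric."""
    return float(s) if re.fullmatch(r"\s*[-+]?\d+(\.\d+)?\s*", s) else None

# Step 2: parse each decode, dropping strings that aren't numbers.
values = [v for v in (parse_number(s) for s in decodes) if v is not None]

# Step 3: the empirical distribution gives both point estimate and spread.
point_estimate = statistics.mean(values)   # central tendency
uncertainty = statistics.stdev(values)     # spread = predictive width
```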

Google reports:

Why calibration is load-bearing

Without calibration:

  • Always-trust → high-uncertainty predictions are acted on as if they were confident; errors propagate.
  • Always-fallback → the "fast path" is never taken; the ML approximator delivers no cost savings.
  • Threshold-based triggering → whatever uncertainty threshold you pick is arbitrary; false-positive and false-negative fallback rates can't be tuned meaningfully.

Calibration — Pr(|y - ŷ| < δ) ≈ 1 - α when the model predicts a (1-α)-interval of half-width δ — is what makes the fallback threshold a well-defined knob rather than a guess.
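A quick way to sanity-check this property is an empirical coverage test: simulate outcomes, count how often they land inside the predicted interval, and compare against 1 - α. The sketch below uses a deliberately well-calibrated Gaussian predictor (the noise level and z-score are illustrative assumptions):

```python
import random

random.seed(1)

ALPHA = 0.1        # target miscoverage: 90% intervals
TRUE_STD = 2.0     # assumed outcome noise

# A calibrated predictor reports delta such that P(|y - y_hat| < delta)
# equals 1 - ALPHA. For a Gaussian, delta = z * std with z the
# (1 - ALPHA) two-sided quantile of the standard normal.
Z_90 = 1.6449
delta = Z_90 * TRUE_STD

n = 20_000
hits = 0
for _ in range(n):
    y = random.gauss(0.0, TRUE_STD)   # actual outcome
    y_hat = 0.0                       # point prediction
    if abs(y - y_hat) < delta:
        hits += 1

coverage = hits / n   # should land close to 1 - ALPHA = 0.90
```

If `coverage` drifts far from 0.90, the reported intervals are miscalibrated and any fallback threshold built on them inherits that error.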

Common UQ techniques (outside LMs)

  • Bayesian models / MCMC — full posterior over parameters; rigorous but expensive.
  • Bayesian neural networks / variational inference — approximate posterior, cheaper than MCMC.
  • Deep ensembles — train N models, use disagreement as uncertainty. Simple and effective.
  • Monte Carlo dropout — dropout at inference time treated as approximate Bayesian inference.
  • Conformal prediction — distribution-free prediction intervals with coverage guarantees.
  • Quantile regression — model quantiles directly; no distributional assumption.
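Of these, split conformal prediction is simple enough to sketch end to end: score a held-out calibration set by absolute residual, take the right empirical quantile, and widen every prediction by that amount. The base predictor and data below are toy assumptions:

```python
import math
import random

random.seed(2)
alpha = 0.1                    # target miscoverage

def f(x):                      # hypothetical base predictor
    return x

def sample(n):                 # synthetic data: y = x + Gaussian noise
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [x + random.gauss(0.0, 1.0) for x in xs]
    return xs, ys

# Calibration: nonconformity score = absolute residual on held-out data.
cal_x, cal_y = sample(1000)
scores = sorted(abs(y - f(x)) for x, y in zip(cal_x, cal_y))

# The ceil((n+1)(1-alpha))-th order statistic guarantees >= 1 - alpha
# coverage for exchangeable data, with no distributional assumption.
k = math.ceil((len(scores) + 1) * (1 - alpha)) - 1
q = scores[k]

# Prediction interval for a new x: [f(x) - q, f(x) + q].
test_x, test_y = sample(5000)
covered = sum(f(x) - q <= y <= f(x) + q for x, y in zip(test_x, test_y))
coverage = covered / len(test_y)
```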

The RLM's sampling-based approach falls in the "output distribution is native" camp — similar in spirit to BNNs and conformal prediction, but the mechanism is decoding stochasticity rather than parameter uncertainty.

Downstream uses

  • Fast-path / slow-path routing. Trust the cheap ML predictor when confident; fall back to the authoritative solver when not.
  • Active learning. Sample new training data where the model is most uncertain.
  • Risk-aware decision making. Expected utility weighted by predicted-distribution width; discount confident gains against wide-interval risks.
  • Anomaly / drift detection. Spikes in epistemic uncertainty on recent inputs flag distribution shift.
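The first of these, fast-path/slow-path routing, reduces to a threshold check once calibrated uncertainty is available. A minimal sketch, with all names (`route`, `slow_solver`, the threshold value) purely illustrative:

```python
def route(point, spread, threshold, slow_solver):
    """Trust the cheap prediction when predictive spread is below the
    threshold; otherwise fall back to the authoritative (slow) solver."""
    if spread < threshold:
        return point, "fast"
    return slow_solver(), "slow"

# Confident prediction: the cheap fast path is taken.
value1, path1 = route(point=12.4, spread=0.4, threshold=1.0,
                      slow_solver=lambda: 12.37)

# Wide predictive distribution: route to the authoritative solver.
value2, path2 = route(point=12.4, spread=2.5, threshold=1.0,
                      slow_solver=lambda: 12.37)
```

Calibration is what makes `threshold` meaningful: with calibrated intervals, it directly trades off fallback rate against expected error on the fast path.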
