Text-to-text regression

Text-to-text regression is numeric prediction done by a language model that reads the input (x) as a string and emits the target (y) as a decoded string, trained with ordinary next-token prediction under cross-entropy loss, the same objective used to train general chat LLMs. It was introduced at scale by Google's OmniPred (arXiv:2402.14547) and specialised to large-system prediction by the 2025-07-29 RLM work (arXiv:2506.21718).

What it replaces

Traditional regression assumes tabular inputs: fixed-length numeric vectors with a consistent schema. For complex, unstructured inputs (configuration files, system logs, job specs, hardware descriptors), the tabularisation pipeline (parsing, feature extraction, normalisation) is the bottleneck.

Text-to-text regression sidesteps the whole chain. YAML, JSON, logs, freeform prose — anything a tokenizer can handle — flows in directly. Numbers go through the tokenizer as decimal strings on both sides, avoiding normalisation entirely.
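
As a character-level simplification of "numbers are decimal strings on both sides" (real subword tokenizers may merge digits, so `digit_tokens` here is a hypothetical illustration, not OmniPred's actual tokenizer):

```python
def digit_tokens(y: float) -> list[str]:
    """Render a target as the digit-string a text-to-text regressor decodes.
    Both 1.0 and 1e9 become short character sequences -- no rescaling."""
    return list(repr(y))
```

For example, `digit_tokens(1.0)` yields `["1", ".", "0"]` and `digit_tokens(1e9)` yields the twelve characters of `"1000000000.0"`; nine orders of magnitude apart, yet both are ordinary decode targets.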

Mechanism

  1. Serialise (x) to a string. YAML and JSON are the canonical choices: structured, lossless, and easy to prioritise by field.
  2. Tokenise with the LM's standard tokenizer. Numbers become sequences of digit tokens.
  3. Train with cross-entropy next-token prediction: prompt is the (x) string, target is the (y) string representation of the number.
  4. Infer by greedy decode (point prediction) or by sampling N decodes (empirical density, uncertainty estimate).
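
The four steps can be sketched end to end. `sample_decode` below is a hypothetical stand-in for a trained seq2seq LM's sampler, since steps 2-3 (tokenisation and cross-entropy training) happen inside the model stack:

```python
import json
import statistics

def make_pair(x: dict, y: float) -> tuple[str, str]:
    """Step 1: serialise x losslessly; the training target is the
    decimal string of y (step 3 trains next-token prediction on it)."""
    return json.dumps(x, sort_keys=True), repr(y)

def predict(sample_decode, prompt: str, n: int = 32) -> tuple[float, float]:
    """Step 4: n sampled decodes give a point estimate (median) and a
    crude uncertainty estimate (spread of the empirical density)."""
    draws = [float(sample_decode(prompt)) for _ in range(n)]
    return statistics.median(draws), statistics.stdev(draws)
```

With a stub sampler that returns "1.9", "2.0", "2.1" in turn, `predict` yields the point estimate 2.0 with spread 0.1; a greedy decode would instead return the single most likely digit-string.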

What you get for free

  • Feature engineering removed. The tokenizer is the schema.
  • Normalisation removed. Numbers are digit-strings, so 1 and 1e9 are both valid targets without rescaling.
  • Point + distribution + uncertainty from one model. Multi-sample decoding reconstructs uncertainty without a separate model head.
  • Few-shot cross-task adaptation. Because the input is just a string, adapting to a new task = fine-tuning on a few (x_new, y_new) pairs of the same shape.
  • Adapts to new input types without pipeline changes. A new config field adds tokens; no code change.
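
A minimal reading of "point + distribution + uncertainty from one model": collect N decoded samples and summarise them with empirical quantiles (a sketch; the RLM post does not prescribe this exact summary):

```python
def empirical_interval(draws: list[float], lo: float = 0.05,
                       hi: float = 0.95) -> tuple[float, tuple[float, float]]:
    """Median of the sampled decodes as the point prediction, a central
    empirical quantile interval as the uncertainty band."""
    s = sorted(draws)
    n = len(s)
    pick = lambda q: s[min(n - 1, int(q * n))]
    return pick(0.5), (pick(lo), pick(hi))
```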

What it costs

  • Context window bound. Inputs that exceed the window must be truncated, compressed, or prioritised — motivating patterns/token-limit-aware-feature-prioritization.
  • Tokenizer-driven numerical precision. Precision is bounded by how many digit tokens the target emits, and by how the tokenizer splits digits (single-digit vs. merged multi-digit tokens, choice of base representation). Google does not report an effective precision floor in the 2025-07-29 post.
  • Inference cost > MLP regressor. Even a small 60M-param encoder-decoder is more expensive per prediction than an MLP or a gradient-boosted tree on a tabular feature vector. The RLM pays this cost to buy the input-flexibility and uncertainty properties.
  • Evaluation is harder to make comparable. Standard tabular-regression metrics (MAE, RMSE, MAPE) still apply, but calibration of the sampled distribution is a new axis.
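
The context-window bound motivates importance-ordered truncation. A sketch, assuming a per-field importance score, a `count_tokens` callback, and a greedy keep-or-drop rule — all illustrative choices, not the post's algorithm:

```python
def fit_to_window(fields: dict[str, str], importance: dict[str, float],
                  token_budget: int, count_tokens) -> str:
    """Keep whole fields in descending-importance order until the
    context window's token budget is exhausted; drop the rest."""
    kept, used = [], 0
    for name in sorted(fields, key=lambda k: -importance.get(k, 0.0)):
        line = f"{name}: {fields[name]}"
        cost = count_tokens(line)
        if used + cost > token_budget:
            continue  # field does not fit; skip it whole
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

Dropping fields whole (rather than clipping mid-field) keeps every surviving line parseable, at the cost of occasionally wasting budget.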

Canonical wiki instance

The 2025-07-29 Google post uses text-to-text regression to predict MIPS per GCU on Borg: a 60M-param encoder-decoder RLM reads YAML/JSON cluster state (up to 1M candidate feature tokens, truncated to the 8k window after importance-ordering) and emits the bin-packer's numeric output as a decoded string. Reported quality is "near-perfect" Spearman rank correlation across diverse Borg tasks; calibration is established by correlating predicted-distribution width with residual squared error.
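
The calibration check described above can be sketched as a rank correlation between predicted-distribution width and squared residual (a pure-Python Spearman without tie handling; simplified relative to whatever Google computed):

```python
def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation via Pearson on ranks (no ties assumed)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def calibration_score(pred_widths: list[float],
                      residuals: list[float]) -> float:
    """Well-calibrated: wider predicted distributions should rank-track
    larger squared errors, pushing this score toward 1."""
    return spearman(pred_widths, [r * r for r in residuals])
```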

Generalisations (as framed by Google)

  • Industrial-process efficiency prediction (configurations + parameters + context → efficiency metric).
  • Scientific-experiment outcome prediction (experimental setup → measured result).
  • Reward models for RL-trained LLMs that process raw operational data rather than human preference labels only — the post's forward-looking framing of operationalising "experience".
