Text-to-text regression

Text-to-text regression is numeric prediction done by a language model that reads the input (x) as a string and emits the target (y) as a decoded string, trained with ordinary next-token prediction under cross-entropy loss, the same objective used to train general chat LLMs. It was introduced at scale by Google's OmniPred (arXiv:2402.14547) and specialised to large-system prediction by the 2025-07-29 RLM work (arXiv:2506.21718).

What it replaces

Traditional regression assumes tabular inputs: fixed-length numeric vectors with a consistent schema. For complex, unstructured inputs (configuration files, system logs, job specs, hardware descriptors), the tabularisation pipeline (parsing, feature extraction, normalisation) is the bottleneck.

Text-to-text regression sidesteps the whole chain. YAML, JSON, logs, freeform prose — anything a tokenizer can handle — flows in directly. Numbers go through the tokenizer as decimal strings on both sides, avoiding normalisation entirely.
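
As a character-level simplification of "numbers are decimal strings on both sides" (real subword tokenizers may merge digits, so `digit_tokens` here is a hypothetical illustration, not OmniPred's actual tokenizer):

```python
def digit_tokens(y: float) -> list[str]:
    """Render a target as the digit-string a text-to-text regressor decodes.
    Both 1.0 and 1e9 become short character sequences -- no rescaling."""
    return list(repr(y))
```

For example, `digit_tokens(1.0)` yields `["1", ".", "0"]` and `digit_tokens(1e9)` yields the twelve characters of `"1000000000.0"`; nine orders of magnitude apart, yet both are ordinary decode targets.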

Mechanism

  1. Serialise (x) to a string. YAML and JSON are the canonical choices: structured, lossless, and easy to prioritise by field.
  2. Tokenise with the LM's standard tokenizer. Numbers become sequences of digit tokens.
  3. Train with cross-entropy next-token prediction: prompt is the (x) string, target is the (y) string representation of the number.
  4. Infer by greedy decode (point prediction) or by sampling N decodes (empirical density, uncertainty estimate).
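
The four steps can be sketched end to end. `sample_decode` below is a hypothetical stand-in for a trained seq2seq LM's sampler, since steps 2-3 (tokenisation and cross-entropy training) happen inside the model stack:

```python
import json
import statistics

def make_pair(x: dict, y: float) -> tuple[str, str]:
    """Step 1: serialise x losslessly; the training target is the
    decimal string of y (step 3 trains next-token prediction on it)."""
    return json.dumps(x, sort_keys=True), repr(y)

def predict(sample_decode, prompt: str, n: int = 32) -> tuple[float, float]:
    """Step 4: n sampled decodes give a point estimate (median) and a
    crude uncertainty estimate (spread of the empirical density)."""
    draws = [float(sample_decode(prompt)) for _ in range(n)]
    return statistics.median(draws), statistics.stdev(draws)
```

With a stub sampler that returns "1.9", "2.0", "2.1" in turn, `predict` yields the point estimate 2.0 with spread 0.1; a greedy decode would instead return the single most likely digit-string.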

What you get for free

  • Feature engineering removed. The tokenizer is the schema.
  • Normalisation removed. Numbers are digit-strings, so 1 and 1e9 are both valid targets without rescaling.
  • Point + distribution + uncertainty from one model. Multi-sample decoding reconstructs uncertainty without a separate model head.
  • Few-shot cross-task adaptation. Because the input is just a string, adapting to a new task = fine-tuning on a few (x_new, y_new) pairs of the same shape.
  • Adapts to new input types without pipeline changes. A new config field adds tokens; no code change.
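
A minimal reading of "point + distribution + uncertainty from one model": collect N decoded samples and summarise them with empirical quantiles (a sketch; the RLM post does not prescribe this exact summary):

```python
def empirical_interval(draws: list[float], lo: float = 0.05,
                       hi: float = 0.95) -> tuple[float, tuple[float, float]]:
    """Median of the sampled decodes as the point prediction, a central
    empirical quantile interval as the uncertainty band."""
    s = sorted(draws)
    n = len(s)
    pick = lambda q: s[min(n - 1, int(q * n))]
    return pick(0.5), (pick(lo), pick(hi))
```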

What it costs

  • Context window bound. Inputs that exceed the window must be truncated, compressed, or prioritised — motivating patterns/token-limit-aware-feature-prioritization.
  • Tokenizer-driven numerical precision. Precision is bounded by how many digit tokens the target emits, and by how the tokenizer splits digits (single-digit vs. merged multi-digit tokens, choice of base representation). Google does not report an effective precision floor in the 2025-07-29 post.
  • Inference cost > MLP regressor. Even a small 60M-param encoder-decoder is more expensive per prediction than an MLP or a gradient-boosted tree on a tabular feature vector. The RLM pays this cost to buy the input-flexibility and uncertainty properties.
  • Evaluation is harder to make comparable. Standard tabular-regression metrics (MAE, RMSE, MAPE) still apply, but calibration of the sampled distribution is a new axis.
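
The context-window bound motivates importance-ordered truncation. A sketch, assuming a per-field importance score, a `count_tokens` callback, and a greedy keep-or-drop rule — all illustrative choices, not the post's algorithm:

```python
def fit_to_window(fields: dict[str, str], importance: dict[str, float],
                  token_budget: int, count_tokens) -> str:
    """Keep whole fields in descending-importance order until the
    context window's token budget is exhausted; drop the rest."""
    kept, used = [], 0
    for name in sorted(fields, key=lambda k: -importance.get(k, 0.0)):
        line = f"{name}: {fields[name]}"
        cost = count_tokens(line)
        if used + cost > token_budget:
            continue  # field does not fit; skip it whole
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

Dropping fields whole (rather than clipping mid-field) keeps every surviving line parseable, at the cost of occasionally wasting budget.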

Canonical wiki instance

The 2025-07-29 Google post uses text-to-text regression to predict MIPS per GCU on Borg: a 60M-param encoder-decoder RLM reads YAML/JSON cluster state (up to 1M candidate feature tokens, truncated to the 8k window after importance-ordering) and emits the bin-packer's numeric output as a decoded string. Reported quality is "near-perfect" Spearman rank correlation across diverse Borg tasks; calibration is established by correlating predicted-distribution width with residual squared error.
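
The calibration check described above can be sketched as a rank correlation between predicted-distribution width and squared residual (a pure-Python Spearman without tie handling; simplified relative to whatever Google computed):

```python
def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation via Pearson on ranks (no ties assumed)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def calibration_score(pred_widths: list[float],
                      residuals: list[float]) -> float:
    """Well-calibrated: wider predicted distributions should rank-track
    larger squared errors, pushing this score toward 1."""
    return spearman(pred_widths, [r * r for r in residuals])
```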

Generalisations (as framed by Google)

  • Industrial-process efficiency prediction (configurations + parameters + context → efficiency metric).
  • Scientific-experiment outcome prediction (experimental setup → measured result).
  • Reward models for RL-trained LLMs that process raw operational data rather than human preference labels only — the post's forward-looking framing of operationalising "experience".
