Text-to-text regression¶
Text-to-text regression is numeric prediction performed by a language model: the input (x) is read as a string and the target (y) is emitted as a decoded string, trained with ordinary next-token prediction under cross-entropy loss, the same objective used to train general chat LLMs. It was introduced at scale by Google's OmniPred (arXiv:2402.14547) and specialised to large-system prediction by the 2025-07-29 RLM work (arXiv:2506.21718).
What it replaces¶
Traditional regression assumes tabular inputs — fixed-length numeric vectors with a consistent schema. For complex, unstructured inputs (configuration files, system logs, job specs, hardware descriptors), tabularisation is the bottleneck:
- Feature engineering dominates the project timeline.
- Normalisation and feature scaling add pre-processing complexity.
- New data types (new config field, new hardware class, new workload) force restarting the pipeline (Source: sources/2025-07-29-google-simulating-large-systems-with-regression-language-models).
Text-to-text regression sidesteps the whole chain. YAML, JSON, logs, freeform prose — anything a tokenizer can handle — flows in directly. Numbers go through the tokenizer as decimal strings on both sides, avoiding normalisation entirely.
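The string-in/string-out convention can be sketched in a few lines (the field names here are hypothetical, and JSON is just one of the serialisations mentioned above):

```python
import json

# Hypothetical job record with mixed field types; no feature
# engineering or scaling is applied before serialisation.
x = {"cpu_class": "n2d", "priority": 117, "limits": {"ram_gib": 12.5}}
y = 7423.0  # numeric target, e.g. a throughput measurement

# Input side: any tokenizer-readable string works.
prompt = json.dumps(x, sort_keys=True)

# Target side: the number is emitted literally as a decimal string,
# so small and huge values share one model without rescaling.
target = repr(y)

print(prompt)
print(target)
```

A new field in `x` simply becomes more tokens in `prompt`; no schema or pipeline change is needed.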
Mechanism¶
- Serialise (x) to a string. YAML or JSON are the canonical choices — structured, lossless, easy to prioritise by field.
- Tokenise with the LM's standard tokenizer. Numbers become sequences of digit tokens.
- Train with cross-entropy next-token prediction: prompt is the (x) string, target is the (y) string representation of the number.
- Infer by greedy decode (point prediction) or by sampling N decodes (empirical density, uncertainty estimate).
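On the data side, the steps above reduce to building one text sequence per example (the separator below is a hypothetical choice; the papers' exact prompt format is not reproduced here):

```python
import json

def to_training_example(x: dict, y: float, sep: str = "\n### y:\n") -> str:
    """Serialise one (x, y) pair into the single text sequence the LM
    is trained on with ordinary next-token cross-entropy.  In most
    setups the loss is masked to the target tokens after `sep`."""
    return json.dumps(x, sort_keys=True) + sep + repr(y)

example = to_training_example({"cores": 8, "workload": "batch"}, 512.25)
print(example)
```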
What you get for free¶
- Feature engineering removed. The tokenizer is the schema.
- Normalisation removed. Numbers are digit-strings, so `1` and `1e9` are both valid targets without rescaling.
- Point + distribution + uncertainty from one model. Multi-sample decoding reconstructs uncertainty without a separate model head.
- Few-shot cross-task adaptation. Because the input is just a string, adapting to a new task = fine-tuning on a few `(x_new, y_new)` pairs of the same shape.
- Adapts to new input types without pipeline changes. A new config field adds tokens; no code change.
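The point-plus-uncertainty property can be sketched with a stand-in sampler (the Gaussian stub below replaces a real model's temperature-sampled decodes):

```python
import random
import statistics

def sample_decodes(prompt: str, n: int = 64, seed: int = 0) -> list:
    """Stand-in for n temperature-sampled decodes from a trained RLM;
    a real model would emit n numeric strings token by token."""
    rng = random.Random(seed)
    return [repr(100.0 + rng.gauss(0, 3.0)) for _ in range(n)]

decoded = [float(s) for s in sample_decodes("...serialised x...")]
point = statistics.median(decoded)   # robust point prediction
spread = statistics.stdev(decoded)   # uncertainty estimate
```

A single greedy decode gives the point prediction alone; repeated sampling recovers the empirical density with no extra model head.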
What it costs¶
- Context window bound. Inputs that exceed the window must be truncated, compressed, or prioritised — motivating patterns/token-limit-aware-feature-prioritization.
- Tokenizer-driven numerical precision. Precision is bounded by how many digit tokens the target emits, so the tokenizer's handling of digit counts and base representations matters. Google does not report an effective precision floor in the 2025-07-29 post.
- Inference cost exceeds classical regressors. Even a small 60M-param encoder-decoder is more expensive per prediction than an MLP or a gradient-boosted tree on a tabular feature vector. The RLM pays this cost to buy the input-flexibility and uncertainty properties.
- Evaluation is harder to make comparable. Standard tabular-regression metrics (MAE, RMSE, MAPE) still apply, but calibration of the sampled distribution is a new axis.
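One way the context-window constraint above might be handled is a greedy, importance-ordered packing pass. This is a sketch under assumed helpers (the whitespace token counter and the `prioritise` function are illustrative, not the post's actual code):

```python
def count_tokens(s: str) -> int:
    # Crude whitespace proxy for a real tokenizer's count.
    return len(s.split())

def prioritise(features, budget, count=count_tokens):
    """Keep the highest-importance serialised fields until the token
    budget is spent; lower-importance fields that do not fit are
    dropped whole rather than truncated mid-field."""
    kept, used = [], 0
    for importance, field in sorted(features, key=lambda f: -f[0]):
        cost = count(field)
        if used + cost <= budget:
            kept.append(field)
            used += cost
    return kept

fields = [(0.9, "cpu_class: n2d"),
          (0.5, "long log line with many tokens"),
          (0.1, "zone: us-east1")]
print(prioritise(fields, budget=5))
```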
Canonical wiki instance¶
The 2025-07-29 Google post uses text-to-text regression to predict MIPS per GCU on Borg: a 60M-param encoder-decoder RLM reads YAML/JSON cluster state (up to 1M candidate feature tokens, truncated to the 8k window after importance-ordering) and emits the bin-packer's numeric output as a decoded string. Reported quality is "near-perfect" Spearman rank correlation across diverse Borg tasks; calibration is established by correlating predicted-distribution width with residual squared error.
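The calibration check described above can be reproduced in miniature: compute the Spearman rank correlation between predicted-distribution widths and realised squared residuals (toy numbers below; any monotone relationship yields correlation 1.0):

```python
def spearman(xs, ys):
    """Spearman rank correlation via Pearson on ranks (ties not
    handled; enough for a sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Wider predicted distributions should coincide with larger errors.
widths = [0.5, 1.0, 2.0, 4.0]
sq_residuals = [0.2, 0.8, 3.5, 15.0]
print(spearman(widths, sq_residuals))  # 1.0 for this toy data
```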
Generalisations (as framed by Google)¶
- Industrial-process efficiency prediction (configurations + parameters + context → efficiency metric).
- Scientific-experiment outcome prediction (experimental setup → measured result).
- Reward models for RL-trained LLMs that process raw operational data rather than human preference labels only — the post's forward-looking framing of operationalising "experience".
Seen in¶
- sources/2025-07-29-google-simulating-large-systems-with-regression-language-models — large-systems instantiation, Borg production target.