
CONCEPT Cited by 1 source

Frontier-model batch-training boundary

Definition

The frontier-model batch-training boundary is the structural property of the current generation of large language models (2017-2026): training is exclusively offline and batch, while real-time data can reach the model only at inference time. Every frontier LLM at the end of 2025 — GPT-4/5/5.1, Google Gemini (all versions), xAI Grok, Anthropic Claude — is batch pre-trained on a fixed corpus, then fine-tuned offline (e.g. via RLHF), then deployed. The model is a snapshot.

The verbatim claim from Peter Corless (Redpanda, 2026-01-13):

"regardless of their dense or MoE architectures, they're still all batch trained." (Source: sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls)

Real-time data can enter the model only through inference-time mechanisms:

"While they can increasingly access and reason upon data presented in real time, such as scouring social media video and the latest posts and newsfeeds, or accessing a database in a RAG or MCP architecture, this is at inference time. Their extensive pre-training and much of their fine-tuning, such as Reinforced Learning from Human Feedback (RLHF), is still inherently offline, batch-mode oriented."

This is the architectural premise of the Redpanda "convergence of AI and data streaming" series: to push past the scaling brick walls (public data exhaustion, training cost growth, capability plateaus), the industry will need to find ways to use real-time streaming data in training / retraining / fine-tuning, not just at inference time.

Relationship to nearby wiki concepts

  • concepts/training-serving-boundary — the wiki's prior canonical concept for the training/serving organisational and infrastructure split, centered on whether the split is eroding at the frontier. This concept (batch-training boundary) is the temporal half of that split: when data reaches the model — at pre-training (batch only), fine-tuning (batch only), or inference (can be real-time).
  • concepts/inference-vs-training-workload-shape — the infrastructure-shape half: training is batch, inference is transactional. This concept is compatible: training's batch shape is exactly why real-time streaming data hasn't been grafted into it.
  • RAG and MCP — the inference-time mechanisms the post names as how real-time data reaches a batch-trained frontier model today. They do not cross the batch-training boundary; they compose a pre-trained snapshot with externally-fetched context.
  • concepts/rlhf-offline-batch — the named fine-tuning pipeline that sits on the batch side of the boundary today. Corless: "Reinforced Learning from Human Feedback (RLHF) is still inherently offline, batch-mode oriented."
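The RAG/MCP composition the post describes can be reduced to a minimal sketch. Everything here is illustrative (the retriever, the store, and the model stand-in are invented names, not any real framework's API); the point is only the shape: fresh data enters through the context window, the weights stay the deployed snapshot.

```python
# Sketch of inference-time composition: a frozen snapshot plus
# externally-fetched context. All names are hypothetical.

FROZEN_WEIGHTS = {"version": "2025-10-snapshot"}  # immutable deploy artifact


def retrieve(query: str, store: dict) -> str:
    """Hypothetical retriever: pull fresh context from a live external store."""
    return store.get(query, "")


def generate(prompt: str, weights: dict) -> str:
    """Stand-in for the batch-trained model: answers from a fixed snapshot."""
    return f"[{weights['version']}] answer grounded in: {prompt!r}"


def rag_answer(query: str, live_store: dict) -> str:
    # Real-time data enters only via the prompt/context window ...
    context = retrieve(query, live_store)
    prompt = f"{context}\n\nQ: {query}"
    # ... while the weights remain the pre-trained snapshot, unchanged.
    return generate(prompt, FROZEN_WEIGHTS)


live_store = {"latest news": "breaking: event at 09:00 UTC"}
answer = rag_answer("latest news", live_store)
```

Note that nothing in `rag_answer` mutates `FROZEN_WEIGHTS`; that immutability is exactly the batch-training boundary the page is naming.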

Why the boundary holds today

The Corless post does not unpack this at mechanism altitude, but the industry-level reasons are:

  • Pre-training cost is concentrated in a multi-month synchronous job across thousands of accelerators (Megatron-style parallelism, collectives, checkpoint-restart) that doesn't admit a streaming ingest loop during the run.
  • RLHF / DPO / GRPO pipelines require labelled preference data processed as curated batches, not as a stream.
  • The serving snapshot is immutable at deploy time; live learning from inference-time traffic would be an online shape the current infrastructure doesn't support at production-safety altitude (drift, poisoning, rollback are all open problems).
  • Continued pretraining (concepts/continued-pretraining) is a checkpointed batch restart, not a stream.

Caveats

  • Single-source canonicalisation. Only the Corless 2026-01-13 post asserts this boundary explicitly on the wiki; the empirical claim is uncontroversial industry knowledge but not directly cited from primary sources on this page.
  • Boundary is not a universal law. Online/continual learning exists in other ML settings (recsys, ranking — e.g. Pinterest's ads engagement model trains continuously). The boundary is specific to frontier LLMs at 10^12+ parameter scale.
  • Inference-time real-time data ≠ real-time training. RAG and MCP give a batch-trained model access to fresh data within the context window, but the model's weights are unchanged. Corless's thesis is that grafting real-time data into weight updates is the trajectory, not the current state.
  • Not all post-RLHF tuning is batch-only. Adapter-level updates (LoRA etc.) can in principle be applied at any cadence, but the industry production shape is still one batch per refresh.
  • The Redpanda post is a vendor-framing argument for streaming infrastructure — Corless's series promises to argue that streaming (Redpanda's product category) is the unlock. The boundary itself is real; the streaming-unlock argument is forward-looking commentary.
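The adapter-cadence caveat can be sketched as follows. The LoRA-style naming is illustrative only (no real fine-tuning library is used), and the nightly cadence is an assumed example of the "batch per refresh" production shape, not a claim about any specific vendor.

```python
# Sketch: adapter refresh cadence vs. base-snapshot cadence.
# Hypothetical names throughout; no real training API is modelled.

base = {"name": "frontier-base", "frozen": True}  # base weights never touched


def refresh_adapter(batch_of_new_data: list[str]) -> dict:
    """Hypothetical batch adapter update: retrain a small adapter on the
    latest accumulated batch while the base snapshot stays frozen."""
    return {"trained_on": len(batch_of_new_data), "base": base["name"]}


# Production shape today: accumulate, then refresh per batch (e.g. nightly).
nightly = refresh_adapter(["ex1", "ex2", "ex3"])
# In principle the cadence could shrink toward per-event updates, but that
# is the forward-looking streaming argument, not current practice.
```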

Last updated · 470 distilled / 1,213 read