Retrieval-Augmented Generation (RAG)
Definition
Retrieval-Augmented Generation (RAG) is the inference-time architectural pattern in which an LLM's context is augmented with documents retrieved from an external knowledge base at query time, before the model generates its response. RAG is the canonical mechanism that lets a batch-trained frontier model reason over data not in its training corpus — including fresh, real-time, or private data — without retraining.
The Corless 2026-01-13 Redpanda post names RAG, alongside MCP, as one of the two inference-time mechanisms for accessing real-time data:
"they can increasingly access and reason upon data presented in real time, such as scouring social media video and the latest posts and newsfeeds, or accessing a database in a RAG or MCP architecture, this is at inference time." (Source: sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls)
Canonical RAG flow
- Index corpus — typically documents chunked + embedded into a vector database. See concepts/embedding-dimension-diminishing-returns for the dimensionality-vs-quality trade-off, and concepts/hybrid-retrieval-bm25-vectors for the common dense+sparse retrieval shape.
- Embed the query, retrieve top-k relevant chunks.
- Inject retrieved chunks into the LLM's prompt context ("retrieve-then-generate").
- Generate the response conditioned on retrieved context.
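The four steps above can be sketched in a few lines. This is a toy illustration only: a bag-of-words `Counter` stands in for a real embedding model, and a plain Python list stands in for a vector database; `embed`, `retrieve`, and `build_prompt` are illustrative names, not any real library's API.

```python
# Minimal sketch of the canonical RAG flow, with a toy bag-of-words
# "embedding" in place of a real embedding model and an in-memory
# list in place of a vector database.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase bag-of-words counts. A real pipeline
    # would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Index corpus: chunk + embed documents into the "vector database".
corpus = [
    "RAG injects retrieved documents into the prompt at inference time.",
    "Model weights are frozen; retrieval does not retrain the model.",
    "Streams can be replayed through new embedding models.",
]
index = [(chunk, embed(chunk)) for chunk in corpus]

def retrieve(query: str, k: int = 2) -> list[str]:
    # 2. Embed the query; take the top-k chunks by similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # 3. Inject retrieved chunks into the LLM's prompt context.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# 4. The assembled prompt would be sent to the LLM, which generates
#    a response conditioned on the retrieved context.
prompt = build_prompt("Does RAG retrain the model?")
print(prompt)
```

The model's weights never change in this loop — only the prompt does, which is the batch-training-boundary point made in the caveats below.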
RAG as the iterative-pipeline axis
The 2025-06-24 Redpanda "streaming as backbone" essay canonicalised a concrete streaming-infrastructure benefit of RAG: replayability of long-lived tiered-storage streams lets teams re-run historical data through different embedding models or chunking strategies without re-extracting from source. See concepts/stream-replayability-for-iterative-pipelines (Source: sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms).
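The replay benefit reduces to a simple shape: treat the raw-document stream as the durable source of truth and re-derive the index from it. The sketch below is a hedged illustration under stated assumptions — `stream`, `embed_v1`, `embed_v2`, and `rebuild_index` are illustrative stand-ins, not a real streaming client or embedding API.

```python
# Sketch of the replay benefit: a long-lived, tiered-storage stream of
# raw documents can be re-consumed from the beginning and run through a
# different embedding model (or chunking strategy) without re-extracting
# anything from the original source systems.

stream = [  # stands in for a replayable topic; offsets are retained
    {"offset": 0, "doc": "order placed for sku-123"},
    {"offset": 1, "doc": "payment captured for order"},
]

def embed_v1(text: str) -> list[int]:
    # Placeholder "model v1" — a real pipeline would call an embedding model.
    return [len(text)]

def embed_v2(text: str) -> list[int]:
    # Placeholder "model v2" with a different output shape.
    return [len(text), text.count(" ")]

def rebuild_index(embed_fn) -> list:
    # Replay the full stream through the chosen embedding function.
    return [(rec["doc"], embed_fn(rec["doc"])) for rec in stream]

index_v1 = rebuild_index(embed_v1)  # original pipeline
index_v2 = rebuild_index(embed_v2)  # re-run with a new model, same source data
```

Because the stream is the system of record, swapping `embed_v1` for `embed_v2` is a re-consume, not a re-extract — the iterative-pipeline unlock the essay describes.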
Caveats
- Stub. This page is a minimal canonical anchor; deeper RAG architecture (chunking strategies, reranking, query rewriting, HyDE, self-consistency) is not covered here.
- RAG ≠ training on real-time data. RAG exposes fresh data to a frozen model at inference time. The batch-training boundary is unchanged — the model's weights don't learn from retrieved chunks.
- RAG hallucinations. RAG mitigates but doesn't eliminate hallucination; the model can still confabulate despite having correct retrieved context in its prompt.
Seen in
- 2026-01-13 Redpanda — The convergence of AI and data streaming, Part 1 (sources/2026-01-13-redpanda-the-convergence-of-ai-and-data-streaming-part-1-the-coming-brick-walls) — named as one of the two inference-time real-time-data access mechanisms (alongside MCP) that do not cross the batch-training boundary.
- 2025-06-24 Redpanda — Why streaming is the backbone for AI-native data platforms (sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms) — stream-replayability as the iterative-RAG-pipeline unlock.
Related
- concepts/frontier-model-batch-training-boundary — the structural boundary on whose inference side RAG operates.
- concepts/hybrid-retrieval-bm25-vectors — common retrieval shape.
- concepts/rag-as-a-judge — RAG-adjacent evaluation pattern.
- concepts/embedding-dimension-diminishing-returns — the dimensionality trade-off for the embedding step.
- concepts/stream-replayability-for-iterative-pipelines — the streaming-infrastructure unlock for iterative RAG.
- systems/model-context-protocol — the sibling inference-time-integration shape.
- companies/redpanda — the company whose blog canonicalises this framing.