In-Context Reinforcement Learning
Definition
In-context reinforcement learning is a learning mechanism in which an agent's capability compounds over time through writes to a persistent retrieval store consulted at inference time, not through weight updates. Successful strategies are distilled from past sessions into reusable skills (compact optimization patterns, debugging heuristics, kernel-tuning recipes) and written back into the retrieval-augmented knowledge base; future sessions retrieve those skills alongside static hardware documentation and arrive at solutions faster, with fewer search steps than the originating session required.
This is distinct from traditional, weight-updating RL: the model underneath does not change; the context it sees when solving the next problem does. Meta's KernelEvolve uses both mechanisms: in-context RL for session-to-session improvement within its knowledge base, and agentic RL from production signal (weight updates on smaller specialized models) as a complementary data flywheel (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure).
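A minimal sketch of this loop in Python. Every name here is a hypothetical stand-in; the source describes the mechanism, not this interface:

```python
from typing import Callable

# Illustrative sketch only; none of these names are KernelEvolve's API.
SkillStore = list[str]  # persists across sessions; the only thing that "learns"

def run_session(
    problem: str,
    model: Callable[[str], str],         # frozen LLM: prompt -> candidate solution
    evaluate: Callable[[str], bool],     # correctness + performance check
    distill: Callable[[str, str], str],  # (problem, solution) -> compact skill entry
    store: SkillStore,
) -> str:
    # Retrieval at inference time: prior skills enrich the prompt context.
    context = "\n\n".join(store + [problem])
    candidate = model(context)           # no weight update anywhere
    if evaluate(candidate):
        # The write-back is the learning step: the next session retrieves it.
        store.append(distill(problem, candidate))
    return candidate
```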
Canonical statement (Meta KernelEvolve, 2026-04-02)
"This knowledge base is not static. As the system solves new optimization problems it distills successful strategies into reusable skills — compact optimization patterns and debugging heuristics — that are continuously written back into the knowledge base. This self-evolving skill library acts as a form of in-context reinforcement learning: Each successful exploration enriches the context available to future sessions, enabling the system to solve similar problems faster and with fewer search steps, without requiring model retraining."
Mechanism
Three primitives compose:
- Retrieval-augmented knowledge base with structured categories (correctness constraints / platform-agnostic optimization guidance / hardware-specific documentation / self-evolving skill library).
- Post-session distillation — after a successful optimization session, the trajectory (which node-expansions produced the winning kernel, what transformations worked, what profiling signals preceded the breakthrough) is compressed into reusable skill entries.
- Dynamic retrieval — the skill library is queried alongside static docs when the synthesizer generates the next round of candidates for a new problem.
The crucial property: no model retraining is required. Every skill added to the library compounds the agent's capability in subsequent sessions; the compounding mechanism is the retrieval layer, not the base model.
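A sketch of how the three primitives compose. The four category names follow the source; the toy token-overlap retrieval and everything else are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    # Structured categories named in the source; contents are illustrative.
    correctness_constraints: list[str] = field(default_factory=list)
    optimization_guidance: list[str] = field(default_factory=list)  # platform-agnostic
    hardware_docs: list[str] = field(default_factory=list)          # static RAG side
    skill_library: list[str] = field(default_factory=list)          # self-evolving side

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        # Dynamic retrieval: skills are searched alongside static docs.
        pool = (self.correctness_constraints + self.optimization_guidance
                + self.hardware_docs + self.skill_library)

        def overlap(doc: str) -> int:  # toy relevance score, not a real retriever
            return len(set(query.lower().split()) & set(doc.lower().split()))

        return sorted(pool, key=overlap, reverse=True)[:k]

    def distill_session(self, winning_trajectory: list[str]) -> None:
        # Post-session distillation: compress the steps that produced the
        # winning kernel into one compact, reusable skill entry.
        self.skill_library.append("SKILL: " + " -> ".join(winning_trajectory))
```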
Why this shape matters
At hyperscale, Meta does not want its kernel-authoring system's capability to be pinned to whatever model was trained last quarter. An in-context RL loop means:
- Deployment-time updates. A new skill takes effect as soon as it is written to the knowledge base; no training pipeline, no model deployment, no cold-start of a new checkpoint.
- Model-agnosticism. The knowledge base works with whichever LLM is behind the synthesizer. When Meta swaps to a newer base model, the accumulated skill library carries forward.
- Auditability + rollback. Skills are explicit markdown-ish entries in a store. Bad skills can be removed; good skills can be explicitly promoted. Weight updates have neither property.
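Because skills are explicit entries in a store, rollback and promotion reduce to ordinary file operations. A sketch assuming one markdown file per skill; the layout and names are hypothetical:

```python
from pathlib import Path

SKILLS = Path("kb/skills")              # hypothetical store layout
PROMOTED = Path("kb/skills-promoted")   # hypothetical high-priority tier

def write_skill(name: str, body: str) -> None:
    # Takes effect on the next retrieval; no training pipeline, no deployment.
    SKILLS.mkdir(parents=True, exist_ok=True)
    (SKILLS / f"{name}.md").write_text(body)

def rollback_skill(name: str) -> None:
    # A bad skill is just a file deletion, auditable in version control.
    (SKILLS / f"{name}.md").unlink(missing_ok=True)

def promote_skill(name: str) -> None:
    # Explicit promotion: move the entry into the high-priority tier.
    PROMOTED.mkdir(parents=True, exist_ok=True)
    (SKILLS / f"{name}.md").rename(PROMOTED / f"{name}.md")
```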
Relationship to patterns/rag-over-hardware-documentation
The static side of the retrieval substrate, the hardware docs, is the RAG-over-hardware-documentation pattern. The dynamic side, the session-distilled skills, is what Meta calls the self-evolving skill library, and it is what makes the substrate learn. Together they form the retrieval-augmented knowledge base.
Relationship to patterns/agentic-rl-from-production-signal
In-context RL is the zeroth-order learning loop in KernelEvolve: no gradients, just writes to a store. Agentic RL is the first-order learning loop: actual gradient updates on a specialized smaller model, trained on the trajectories the in-context loop generated. Both loops draw on the same data source and reinforce each other: better in-context skills → better trajectories → better training data for the agentic-RL models → better specialized models generating trajectories → better skills distilled.
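One way to picture the two loops sharing one data source. The interface below is an assumption, not the source's design:

```python
from typing import Any, Callable

# Shared data source: every session logs its trajectory once; both
# loops consume the log. All names here are illustrative.
trajectory_log: list[dict[str, Any]] = []

def zeroth_order_step(
    problem: str,
    synthesize: Callable[[str], dict[str, Any]],  # one agentic search session
    distill: Callable[[dict[str, Any]], str],
    skill_library: list[str],
) -> None:
    # In-context RL: a successful trajectory becomes a store write.
    trajectory = synthesize(problem)
    trajectory_log.append(trajectory)
    if trajectory.get("success"):
        skill_library.append(distill(trajectory))  # no gradients anywhere

def first_order_step(train: Callable[[list[dict[str, Any]]], Any]) -> Any:
    # Agentic RL: the same trajectories become gradient updates on a
    # smaller specialized model.
    return train(trajectory_log)
```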
Seen in
- Meta KernelEvolve (2026-04-02, canonical). First wiki instance of the "in-context RL" framing applied to a production agentic system. (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure)
Related
- concepts/agentic-kernel-synthesis — the canonical agentic-system shape this learning mechanism sits inside.
- systems/kernelevolve — the production instance.
- patterns/rag-over-hardware-documentation — the static substrate of the retrieval-augmented KB.
- patterns/agentic-rl-from-production-signal — the weight-updating complement.
- companies/meta — source.