PATTERN
RAG over Hardware Documentation¶
Problem¶
An LLM-based code-generation agent must produce correct + performant kernels for accelerator hardware whose documentation is either (a) not in its pretraining corpus (proprietary silicon, e.g. Meta MTIA) or (b) incomplete / out-of-date relative to the latest silicon generation. Static prompt templates don't solve this: the needed context is too large to fit in every prompt, changes with each hardware generation, and varies with the runtime failure mode (compile error vs memory bottleneck vs numerical mismatch).
Shape¶
Build a hierarchical retrieval-augmented knowledge base with three static categories plus an optional dynamic layer (a minimal layout sketch follows the list):
- Correctness constraints — valid kernel-shape rules, ISA-level constraints, precision/type rules.
- Platform-agnostic optimization guidance — tiling strategies, pipelining, memory-hierarchy principles, debugging patterns that transfer across hardware.
- Hardware-specific documentation — per-platform architecture manuals, ISA references, memory-hierarchy specs, optimization patterns.
- (Optional, dynamic) self-evolving skill library — distilled successful-strategy writes from past sessions (see in-context RL).
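A minimal sketch of that layout in Python. The names (`DocCategory`, `KBEntry`, `KnowledgeBase`) are illustrative, not KernelEvolve's actual API:

```python
from dataclasses import dataclass, field
from enum import Enum

class DocCategory(Enum):
    CORRECTNESS = "correctness_constraints"
    OPTIMIZATION = "platform_agnostic_optimization"
    HARDWARE = "hardware_specific_docs"
    SKILLS = "self_evolving_skill_library"  # optional dynamic layer

@dataclass
class KBEntry:
    category: DocCategory
    platform: str | None  # None for platform-agnostic guidance
    section: str          # e.g. "memory_hierarchy", "tensor_layout"
    text: str

@dataclass
class KnowledgeBase:
    entries: list[KBEntry] = field(default_factory=list)

    def query(self, category: DocCategory, platform: str | None = None,
              section: str | None = None) -> list[KBEntry]:
        """Filter entries by category, then narrow by platform/section tags."""
        return [e for e in self.entries
                if e.category is category
                and (platform is None or e.platform in (None, platform))
                and (section is None or e.section == section)]
```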
Retrieve dynamically based on runtime signals from the evaluation harness:
- Memory-bandwidth bottleneck → retrieve memory-hierarchy documentation.
- Compilation error → retrieve debugging guidance for the target platform's compiler.
- Tensor-core underutilization → retrieve tensor-layout + occupancy docs.
Inject the retrieved content into the synthesizer's context-aware prompt for the next round of candidate generation.
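A sketch of the signal-to-retrieval routing and prompt injection, reusing the `KnowledgeBase` sketch above. The signal names, routing table, and prompt format are assumptions for illustration, not the production system:

```python
# Hypothetical map from evaluation-harness signals to retrieval targets.
SIGNAL_ROUTES = {
    "memory_bandwidth_bottleneck":  (DocCategory.HARDWARE, "memory_hierarchy"),
    "compilation_error":            (DocCategory.HARDWARE, "compiler_debugging"),
    "tensor_core_underutilization": (DocCategory.HARDWARE, "tensor_layout"),
}

def build_prompt(kb: KnowledgeBase, platform: str, signal: str,
                 kernel_spec: str, last_candidate: str) -> str:
    # Route the runtime signal to a doc category/section; fall back to
    # platform-agnostic optimization guidance for unrecognized signals.
    category, section = SIGNAL_ROUTES.get(signal, (DocCategory.OPTIMIZATION, None))
    docs = kb.query(category, platform=platform, section=section)
    # Correctness constraints ride along on every round, whatever the signal.
    docs += kb.query(DocCategory.CORRECTNESS, platform=platform)
    context = "\n\n".join(e.text for e in docs)
    return (f"Target platform: {platform}\n"
            f"Retrieved documentation:\n{context}\n\n"
            f"Kernel specification:\n{kernel_spec}\n\n"
            f"Previous candidate (failure signal: {signal}):\n{last_candidate}\n"
            f"Generate an improved kernel.")
```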
Canonical instance — Meta KernelEvolve (2026-04-02)¶
KernelEvolve makes this the load-bearing mechanism for programming MTIA silicon. "Because these chips are proprietary, no public LLM has been trained on MTIA code. A standard coding assistant lacks the context to write optimized MTIA kernels... KernelEvolve solves this through systematic knowledge injection. We encode MTIA-specific documentation (architecture manuals, instruction set references, memory hierarchy specifications, and optimization patterns) directly into the retrieval-augmented knowledge base. When the system targets MTIA, it retrieves and incorporates this proprietary knowledge into its reasoning, effectively 'learning' the hardware in real time."
Meta calls out the engineering-cost inversion: "When a new chip arrives, the engineering cost shifts from writing thousands of kernels by hand to curating a set of hardware documents and injecting them into the knowledge base."
And the runtime-signal-triggered retrieval: "a memory bandwidth bottleneck triggers retrieval of memory hierarchy documentation; a compilation error activates debugging guidance."
Why it works¶
Three properties align with what LLMs are good at:
- Large-context reasoning — modern frontier LLMs handle 100K+ token contexts; hardware docs fit comfortably alongside a kernel-authoring prompt.
- In-context generalization — LLMs can apply documented patterns to novel instances without retraining, and hardware docs' regular structure (sections, tables, code examples) makes them an ideal retrieval substrate.
- Dynamic specialization — retrieving by runtime signal gives the LLM exactly the sub-document it needs, not the whole manual.
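The structure point suggests a concrete indexing strategy: chunk manuals along section boundaries so each retrievable unit is self-contained. A sketch reusing `KBEntry` from above; the markdown-heading regex is an assumption about the corpus format:

```python
import re

def chunk_manual(platform: str, manual_text: str) -> list[KBEntry]:
    """Split a hardware manual on markdown-style '## ' headings so each
    section becomes one retrievable KBEntry."""
    # re.split with one capture group yields [preamble, h1, body1, h2, body2, ...]
    parts = re.split(r"^##\s+(.+)$", manual_text, flags=re.MULTILINE)
    entries = []
    for heading, body in zip(parts[1::2], parts[2::2]):
        section = heading.strip().lower().replace(" ", "_")
        entries.append(KBEntry(DocCategory.HARDWARE, platform, section, body.strip()))
    return entries
```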
Consequences¶
Positive:
- Proprietary silicon becomes programmable by LLM-based agents without waiting for the silicon's ISA to appear in pretraining data.
- Generation-to-generation portability — when silicon refreshes, update the docs in the retrieval store, not the agent. KernelEvolve runs across four MTIA generations (MTIA 300 → 500) from a single agent architecture.
- Complements tree-search-over-LLM-candidates — the search engine explores, retrieval focuses the synthesizer on relevant docs per search node.
Negative / care required:
- Retrieval quality matters more here than in shallower RAG applications — a wrong doc retrieved for a memory-bandwidth issue can send the search off into the wrong region of the design space.
- Doc curation becomes engineering work — someone has to structure + maintain the hardware-doc corpus.
- Dynamic retrieval signals need to be reliable — if the evaluation harness can't accurately distinguish memory-bound from compute-bound kernels, the retrieval layer can't help (see the roofline sketch below).
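One standard way to make the memory-bound vs compute-bound call reliably is a roofline check: compare the kernel's achieved arithmetic intensity against the machine balance. This is a textbook heuristic, not a description of KernelEvolve's harness:

```python
def classify_bottleneck(flops: float, bytes_moved: float,
                        peak_flops_per_s: float, peak_bw_bytes_per_s: float) -> str:
    """Roofline heuristic: a kernel whose arithmetic intensity (FLOPs per
    byte moved) falls below the machine balance is bandwidth-limited."""
    intensity = flops / bytes_moved                           # achieved FLOPs/byte
    machine_balance = peak_flops_per_s / peak_bw_bytes_per_s  # FLOPs/byte at the ridge
    return ("memory_bandwidth_bottleneck" if intensity < machine_balance
            else "compute_bound")
```

The returned labels deliberately match the `SIGNAL_ROUTES` keys above, so a harness classification can feed the retrieval layer directly.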
Seen in¶
- Meta KernelEvolve (2026-04-02, canonical). First wiki canonicalisation of RAG applied to hardware documentation for LLM-based kernel codegen. (Source: sources/2026-04-02-meta-kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure)
Related¶
- concepts/hardware-proprietary-knowledge-injection — the canonical concept this pattern implements.
- concepts/heterogeneous-ai-accelerator-fleet — the containing forcing function.
- concepts/in-context-reinforcement-learning — the self-evolving dynamic layer of the retrieval substrate.
- systems/kernelevolve — the production instance.
- systems/meta-mtia — the proprietary-silicon target the pattern most clearly addresses.
- patterns/tree-search-over-llm-candidates — the complementary search-structure pattern.
- patterns/evaluation-harness-in-agent-loop — the signal source that triggers dynamic retrieval.
- companies/meta — canonicalising source.