Dynamic knowledge-injection prompt¶
Pattern¶
Rather than rely on web-search RAG (vulnerable to the telephone-game failure mode) or a frozen baseline system prompt (vulnerable to the training-cutoff dynamism gap), detect the intent of the incoming request and inject version-pinned, targeted knowledge directly into the system prompt, keeping the injection byte-stable within an intent class to preserve prompt-cache hits.
Canonical Vercel framing¶
"Instead of relying on web search, we detect AI-related intent using embeddings and keyword matching. When a message is tagged as AI-related and relevant to the AI SDK, we inject knowledge into the prompt describing the targeted version of the SDK. We keep this injection consistent to maximize prompt-cache hits and keep token usage low."
(Source: sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent)
The mechanism¶
- Intent detection via embeddings + keyword matching. The incoming user message is embedded and compared against pre-computed intent-class embeddings; keyword heuristics act as cheap priors. Output: one of N intent labels (e.g. AI-SDK-intent, frontend-framework-intent, integration-intent, generic-intent).
- Per-class knowledge pack. Each intent class has a hand-curated, version-pinned knowledge pack describing "the targeted version of the SDK": API surface, common patterns, deprecated APIs to avoid, etc.
- System-prompt assembly. The assembled system prompt is [base prompt] + [intent-class knowledge pack]: the base prompt is byte-stable across all classes; the knowledge pack is byte-stable within a class. This maximises the prompt-cache hit rate at the model provider (concepts/prompt-cache-consistency).
- Filesystem pointer. In addition to text knowledge, point the model at a read-only filesystem of curated code samples ([[patterns/read-only-curated-example-filesystem]]), letting the model search for concrete patterns on demand.
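The assembly step above can be sketched as follows. This is a minimal illustration, not Vercel's implementation: the pack contents, intent names, and `assemble_system_prompt` helper are all hypothetical placeholders; the point is the byte-stability property.

```python
import hashlib

# Hypothetical version-pinned knowledge packs, one per intent class.
# In practice these are hand-curated documents; placeholders here.
KNOWLEDGE_PACKS = {
    "ai-sdk-intent": "AI SDK (pinned version): API surface, patterns, deprecated APIs ...",
    "frontend-framework-intent": "Frontend framework (pinned version): conventions ...",
    "generic-intent": "",  # no pack: base prompt only
}

# Byte-stable across ALL intent classes (placeholder text).
BASE_PROMPT = "You are a coding agent. ..."

def assemble_system_prompt(intent: str) -> str:
    """Base prompt + per-class pack. The result is byte-identical for
    every request in the same intent class, so the provider's prefix-based
    prompt cache can hit on it."""
    pack = KNOWLEDGE_PACKS.get(intent, "")
    return BASE_PROMPT + ("\n\n" + pack if pack else "")

# Two requests in the same class produce byte-identical prompts,
# hence identical cache keys:
a = assemble_system_prompt("ai-sdk-intent")
b = assemble_system_prompt("ai-sdk-intent")
assert hashlib.sha256(a.encode()).hexdigest() == hashlib.sha256(b.encode()).hexdigest()
```

Note the contrast with web-search RAG: retrieved snippets differ per request, so the assembled prompt bytes differ per request and the cache key never repeats.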
Why prefer this over web-search RAG¶
Three compounding arguments from Vercel:
- No telephone game. A small summariser model in the RAG path can "hallucinate, misquote something, or omit important information." Direct injection skips this hop entirely.
- No stale results. Web-search indexes can return outdated blog posts and documentation even when the library has since shipped a new version.
- Prompt-cache stability. Direct injection is byte-deterministic within a class; web-search RAG produces different retrieved snippets per request, busting the cache.
When web-search is still useful¶
Web-search RAG remains appropriate for open-domain, fast-moving, unbounded knowledge (current events, market data, user-generated content). Vercel explicitly notes "v0 uses [web search] too" — the direct-injection preference applies to the specific class of library-API knowledge the model is expected to generate code against, where curation is feasible.
Intent-detection implementation notes¶
- Embeddings + keywords, both — not one or the other. Embeddings capture semantic similarity; keywords catch the long tail ("useQuery", "react-query", "AI SDK") where a lexical match is a stronger signal than semantic proximity. The ensemble is cheap.
- Don't over-partition intent classes. More classes means more cache slots, which means a lower hit rate per slot. Vercel's partitioning is coarse (AI SDK, frontend framework, integrations).
- Intent detection itself is cheap. An embedding lookup plus a keyword scan takes microseconds; this step is not the bottleneck.
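A sketch of the ensemble, under stated assumptions: the keyword lists, the 0.7/0.3 weighting, the threshold, and the `stub_embed` stand-in for a real embedding model are all invented for illustration; Vercel's actual classifier and weights are not public.

```python
import math

# Hypothetical keyword priors per intent class; lexical hits are a strong signal.
KEYWORDS = {
    "ai-sdk-intent": {"ai sdk", "streamtext", "generatetext"},
    "frontend-framework-intent": {"usequery", "react-query", "next.js"},
}

def keyword_score(message: str, intent: str) -> float:
    text = message.lower()
    return float(any(kw in text for kw in KEYWORDS.get(intent, ())))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(message, embed, class_vectors, threshold=0.5):
    """Ensemble: embedding similarity weighted with a keyword prior.
    Falls back to generic-intent when nothing clears the threshold
    (the acceptable failure mode noted under Trade-offs)."""
    msg_vec = embed(message)
    best_intent, best_score = "generic-intent", threshold
    for intent, vec in class_vectors.items():
        score = 0.7 * cosine(msg_vec, vec) + 0.3 * keyword_score(message, intent)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

# Toy stand-in for a real embedding model (assumption, not any real API):
def stub_embed(message: str):
    m = message.lower()
    if "sdk" in m:
        return [1.0, 0.0]
    if "react" in m:
        return [0.0, 1.0]
    return [0.5, 0.5]

CLASS_VECTORS = {
    "ai-sdk-intent": [1.0, 0.0],
    "frontend-framework-intent": [0.0, 1.0],
}
```

Usage: `classify("How do I call streamText in the AI SDK?", stub_embed, CLASS_VECTORS)` selects the AI-SDK class, while an off-topic message falls through to generic-intent.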
Trade-offs¶
- Curation cost. Knowledge packs must be maintained — by the agent team, ideally with the library vendor (Vercel's v0 + AI SDK team co-maintain the pack + read-only example fs).
- Coverage gap. Anything not on the curated list falls back to the model's parametric knowledge — so you want intent classes to map to the set of libraries/APIs that matter most for success rate.
- Intent-detection failure. If the request is misclassified, the wrong knowledge pack is injected, or none at all. This is an acceptable failure mode (the model falls back to the generic prompt), but it needs measurement.
Seen in¶
- sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent — canonical pattern; intent-detection via embeddings + keyword matching; version-pinned AI-SDK knowledge injection; prompt-cache consistency as explicit design driver.
Related¶
- concepts/training-cutoff-dynamism-gap — the failure mode being fixed.
- concepts/prompt-cache-consistency — the design constraint that shapes the pattern.
- concepts/web-search-telephone-game — the alternative this pattern avoids.
- concepts/context-engineering — broader discipline of shaping LLM context.
- patterns/read-only-curated-example-filesystem — complementary pattern (text knowledge + code examples).
- patterns/composite-model-pipeline — this is stage 1 of the v0 composite pipeline.
- systems/vercel-v0 — canonical consumer.
- systems/ai-sdk — canonical target library.