
Dynamic knowledge-injection prompt

Pattern

Rather than rely on web-search RAG (vulnerable to the telephone-game failure mode) or a frozen baseline system prompt (vulnerable to the training-cutoff dynamism gap), detect the intent of the incoming request and inject version-pinned, targeted knowledge directly into the system prompt — keeping the injection byte-stable within an intent class to preserve prompt-cache hits.

Canonical Vercel framing

"Instead of relying on web search, we detect AI-related intent using embeddings and keyword matching. When a message is tagged as AI-related and relevant to the AI SDK, we inject knowledge into the prompt describing the targeted version of the SDK. We keep this injection consistent to maximize prompt-cache hits and keep token usage low."

(Source: sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent)

The mechanism

  1. Intent detection via embeddings + keyword matching. The incoming user message is embedded and compared against pre-computed intent-class embeddings; keyword heuristics act as cheap priors. Output: one-of-N intent labels (e.g. AI-SDK-intent, frontend-framework-intent, integration-intent, generic-intent).

  2. Per-class knowledge pack. Each intent class has a hand-curated, version-pinned knowledge pack describing "the targeted version of the SDK" — API surface, common patterns, deprecated APIs to avoid, etc.

  3. System-prompt assembly. The assembled system prompt is [base prompt] + [intent-class knowledge pack] — the base prompt is byte-stable across all classes; the knowledge pack is byte-stable within a class. This maximises prompt-cache hit rate at the model provider (concepts/prompt-cache-consistency).

  4. Filesystem pointer. In addition to text knowledge, point the model at a read-only filesystem of curated code samples ([[patterns/read-only-curated-example-filesystem]]), letting the model search for concrete patterns on demand.
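Steps 2–3 can be sketched as follows. This is a minimal illustration, not Vercel's implementation: the pack contents, intent labels, and `BASE_PROMPT` text are all hypothetical stand-ins for their internal curation.

```typescript
// Hypothetical sketch of per-class knowledge packs + byte-stable assembly.
// BASE_PROMPT is shared across all intent classes; each pack is a frozen,
// version-pinned string, so the concatenation is byte-identical for every
// request in the same class — preserving prompt-cache hits.
const BASE_PROMPT = "You are a coding agent for Next.js applications.";

const KNOWLEDGE_PACKS: Record<string, string> = {
  "ai-sdk-intent":
    "## AI SDK (pinned version)\n" +
    "- Use streamText/generateText from the 'ai' package.\n" +
    "- Avoid deprecated APIs from earlier major versions.",
  "frontend-framework-intent":
    "## Next.js (pinned version)\n" +
    "- App Router by default; avoid legacy pages/ patterns.",
  "generic-intent": "", // no pack: fall back to parametric knowledge
};

function assembleSystemPrompt(intent: string): string {
  const pack = KNOWLEDGE_PACKS[intent] ?? "";
  // Stable base + stable separator + stable pack => stable cache prefix.
  return pack ? `${BASE_PROMPT}\n\n${pack}` : BASE_PROMPT;
}
```

Because each pack is a constant per class, two requests carrying the same intent label produce byte-identical system prompts, which is exactly what the provider-side prompt cache needs.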

Why prefer this over web-search RAG

Three compounding arguments from Vercel:

  1. No telephone game. A small summariser model in the RAG path can "hallucinate, misquote something, or omit important information." Direct injection skips this hop entirely.
  2. No stale results. Web-search indexes can return outdated blog posts and documentation even when the library has since shipped a new version.
  3. Prompt-cache stability. Direct injection is byte-deterministic within a class; web-search RAG produces different retrieved snippets per request, busting the cache.

When web-search is still useful

Web-search RAG remains appropriate for open-domain, fast-moving, unbounded knowledge (current events, market data, user-generated content). Vercel explicitly notes "v0 uses [web search] too" — the direct-injection preference applies to the specific class of library-API knowledge the model is expected to generate code against, where curation is feasible.

Intent-detection implementation notes

  • Embeddings + keywords both — not one or the other. Embeddings capture semantic similarity; keywords catch the long tail ("useQuery", "react-query", "AI SDK") where lexical match is stronger signal than semantic proximity. Ensemble is cheap.
  • Don't over-partition intent classes. More classes = more cache slots = lower hit rate per slot. Vercel's partitioning is coarse (AI SDK, frontend framework, integrations).
  • Intent detection itself is small-model-cheap. Embedding lookup + keyword scan is microseconds; this is not the bottleneck.
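The ensemble can be sketched as below, under loudly stated assumptions: the keyword table, intent labels, threshold, and centroid vectors are all hypothetical, and a real system would obtain `embedding` from an actual embedding model rather than toy 2-d vectors.

```typescript
// Hypothetical sketch: keyword priors first, embedding similarity second.
type Intent = "ai-sdk-intent" | "frontend-framework-intent" | "generic-intent";

// Lexical long-tail matches where keywords beat semantic proximity.
const KEYWORDS: Record<string, Intent> = {
  "ai sdk": "ai-sdk-intent",
  "streamtext": "ai-sdk-intent",
  "usequery": "frontend-framework-intent",
  "react-query": "frontend-framework-intent",
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function detectIntent(
  message: string,
  embedding: number[],                     // precomputed message embedding
  centroids: Record<Intent, number[]>,     // precomputed per-class centroids
  threshold = 0.8,
): Intent {
  // Keyword pass: cheap, and a stronger signal on the lexical long tail.
  const lower = message.toLowerCase();
  for (const [kw, intent] of Object.entries(KEYWORDS)) {
    if (lower.includes(kw)) return intent;
  }
  // Embedding pass: nearest class centroid, gated by a similarity threshold.
  let best: Intent = "generic-intent";
  let bestSim = threshold;
  for (const [intent, centroid] of Object.entries(centroids) as [Intent, number[]][]) {
    const sim = cosine(embedding, centroid);
    if (sim > bestSim) { bestSim = sim; best = intent; }
  }
  return best;
}
```

Note the ordering: a keyword hit short-circuits the embedding comparison, which is what makes terms like "useQuery" win even when the surrounding prose is semantically ambiguous.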

Trade-offs

  • Curation cost. Knowledge packs must be maintained — by the agent team, ideally with the library vendor (Vercel's v0 + AI SDK team co-maintain the pack + read-only example fs).
  • Coverage gap. Anything not on the curated list falls back to the model's parametric knowledge — so you want intent classes to map to the set of libraries/APIs that matter most for success rate.
  • Intent-detection failure. If the request is mis-classified, the wrong knowledge pack (or none at all) is injected. This is an acceptable failure mode, since the model falls back to the generic prompt, but it needs measurement.
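The measurement point can be made operational with a trivial counter. This is a hypothetical sketch, not Vercel's tooling: track the label distribution so that drift toward the generic class (a likely symptom of under-detection) is visible in monitoring.

```typescript
// Hypothetical monitoring hook: count intent labels as they are assigned.
const intentCounts = new Map<string, number>();

function recordIntent(intent: string): void {
  intentCounts.set(intent, (intentCounts.get(intent) ?? 0) + 1);
}

// Fraction of requests falling through to the generic prompt; a rising
// value suggests the intent classes no longer cover incoming traffic.
function genericFraction(): number {
  let total = 0;
  for (const n of intentCounts.values()) total += n;
  return (intentCounts.get("generic-intent") ?? 0) / Math.max(total, 1);
}
```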
