PATTERN

Dynamic prompt composition via semantic retrieval

Intent

Instead of shipping one large monolithic system prompt that carries every instruction, example, and edge case the system has ever needed, assemble the system prompt per request: semantically search a curated library and retrieve only the few-shot examples and instruction blocks relevant to the detected question type and content sources.

The pattern trades a fixed, growing prompt for a small, targeted prompt composed on the fly. Token cost stops accumulating as the system's edge-case coverage grows.

When to apply

  • The monolithic system prompt has grown unmanageably large through accretion of guidance for new question types, new edge cases, and new tone rules.
  • Different question types need fundamentally different examples. A menu-focused question does not benefit from a 20-example ambiance guide — but shipping both in every prompt costs real tokens.
  • You can classify the incoming question upstream (see patterns/parallel-pre-retrieval-classifier-pipeline) so the downstream prompt assembler has a signal to retrieve on.
  • Your few-shot library is dense enough that semantic retrieval over it produces better matches than random sampling or fixed rotation.

Mechanism

Library of example fragments

Maintain a corpus of few-shot examples, instruction snippets, and edge-case rules. Each fragment is tagged with:

  • Question type it applies to (ambiance, price, vegan options, opening hours, recommendations).
  • Source type it illustrates (menu-based, review-based, photo-based).
  • Output pattern it demonstrates (citation-linked, bullet-list, yes/no + explanation).

Embed the library offline. This is a one-time cost, plus periodic re-embedding as fragments are added or revised.
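A minimal sketch of what the tagged library and its offline embedding pass could look like. The Fragment schema, tag values, sample fragments, and the choice of sentence-transformers are illustrative assumptions; the source does not specify an implementation.

```python
# Hypothetical sketch: a tagged fragment library embedded offline.
from dataclasses import dataclass

import numpy as np
from sentence_transformers import SentenceTransformer


@dataclass
class Fragment:
    fragment_id: str
    text: str                   # the few-shot example or instruction snippet
    question_types: list[str]   # e.g. ["ambiance", "price", "vegan-options"]
    source_types: list[str]     # e.g. ["menu", "reviews", "photos"]
    output_pattern: str         # e.g. "citation-linked", "bullet-list"


library: list[Fragment] = [
    Fragment("amb-01", "Q: How is the ambiance? ...", ["ambiance"], ["reviews", "photos"], "citation-linked"),
    Fragment("veg-03", "Q: Do they have vegan mains? ...", ["vegan-options"], ["menu"], "yes/no + explanation"),
    # ... the library can grow to hundreds of fragments
]

# Offline, one-time cost (re-run periodically when fragments change).
model = SentenceTransformer("all-MiniLM-L6-v2")
fragment_embeddings = model.encode(
    [f"{f.text} | types: {','.join(f.question_types)} | sources: {','.join(f.source_types)}"
     for f in library],
    normalize_embeddings=True,  # unit vectors, so dot product == cosine similarity
)
np.save("fragment_embeddings.npy", fragment_embeddings)
```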

Per-request assembly

On each question:

  1. Use the upstream classifier signals (question type, chosen content sources, generated keywords) as the retrieval query.
  2. Semantic-search the library for the top-K relevant fragments.
  3. Concatenate the fragments into a system prompt tailored for this request.
  4. Ship only the retrieved fragments + the universal core prompt to the LLM.
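A sketch of steps 1 and 2 of the per-request loop, assuming the `library`, `model`, and `fragment_embeddings` objects from the offline sketch above. The function name, classifier signal fields, and `top_k` value are assumptions for illustration.

```python
# Hypothetical per-request retrieval over the pre-embedded fragment library.
import numpy as np


def retrieve_fragments(question_type: str, content_sources: list[str],
                       keywords: list[str], top_k: int = 5) -> list:
    # 1. Build the retrieval query from the upstream classifier signals.
    query = (f"question type: {question_type} | "
             f"sources: {', '.join(content_sources)} | {' '.join(keywords)}")

    # 2. Semantic search: embed the query and take the top-K nearest fragments.
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = fragment_embeddings @ query_vec          # cosine similarity (unit vectors)
    top_idx = np.argsort(scores)[::-1][:top_k]
    return [library[i] for i in top_idx]              # list of Fragment
```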

Universal core vs. dynamic tail

The pattern splits the system prompt into two parts:

  • Core — style, tone, citation format, refusal policy. Always included, rarely changes.
  • Dynamic tail — question-type-specific examples and edge-case instructions. Retrieved per request.
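Steps 3 and 4 then reduce to concatenating the fixed core with the retrieved tail. The core text below is a placeholder standing in for the always-included style, tone, citation, and refusal rules, not an actual production prompt.

```python
# Hypothetical composition of the universal core and the dynamic tail.
CORE_PROMPT = """You answer questions about businesses.
Cite the source of every claim. Refuse questions outside the available business data.
Keep answers concise and neutral in tone."""  # placeholder core


def assemble_system_prompt(fragments) -> str:
    dynamic_tail = "\n\n".join(f.text for f in fragments)
    return f"{CORE_PROMPT}\n\n# Relevant examples for this request\n{dynamic_tail}"
```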

Canonical wiki instance — Yelp BAA (2026-03-27)

Source: sources/2026-03-27-yelp-building-biz-ask-anything-from-prototype-to-product

Yelp explicitly named this pattern as their solution to the system-prompt-bloat problem that partially reversed their cost-optimisation work:

"As we were iterating we significantly increased system prompt size which 'overwrote' some of our savings from the smarter data selection, so we went with this approach to solve it: Dynamic prompt composition. Instead of one massive system prompt, we're building a prompt assembly system that includes only the instructions, examples, and constraints relevant to the detected question type and content sources. For example, the menu-focused question doesn't need our full 20-example guide on handling ambiance queries. This information is extracted by semantically searching across our few shot examples to construct system prompts only with the examples that are relevant."

Signals used for retrieval: the classifier outputs from the question-analysis stage (detected question type and chosen content sources). These inputs are free because they already exist from the parallel-classifier pipeline (see patterns/parallel-pre-retrieval-classifier-pipeline).

Operational framing from the post:

"Iterating on quality (tone, succinctness, edge cases) led to a massive, unmanageable prompt that we are moving into a dynamic prompt composition."

Why it works

  • Coverage without token cost. The library can grow to hundreds of examples without any individual request paying for the full library.
  • Retrieval is cheap. Embedding lookup + k-NN is orders of magnitude cheaper than the LLM call it's feeding into.
  • Incremental improvement path. Adding a new few-shot example means writing the example + tagging it; no prompt rewrite.
  • Debuggability. When an answer goes wrong, the assembly log shows which fragments were retrieved, making it straightforward to identify whether a missing example or a wrong example was at fault.

Failure modes

  • Retrieval returns the wrong fragments for an edge-case question. Symptom: answer quality regresses despite no prompt change. Mitigation: log retrieved fragment IDs per request (see the sketch after this list) and diff bad cases against good cases.
  • Over-reliance on a single fragment. If one canonical example is always retrieved, the library has low diversity and the dynamic composition isn't buying anything over a fixed prompt.
  • Core prompt drift. The universal core can still bloat if owners keep adding to it instead of to the dynamic library. Mitigation: set a token budget for the core prompt and push anything question-type-specific into the library.
  • Cold-start for new question types. The library doesn't yet have good examples for a new question class. Mitigation: a scheduled review of the question-type distribution surfaces new classes that need seed examples.
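One way to make the first mitigation concrete is to record the retrieved fragment IDs alongside each request so bad answers can be diffed against good ones. The log shape and function name below are assumptions, not details from the source.

```python
# Hypothetical per-request logging of retrieved fragment IDs for later diffing.
import json
import logging

logger = logging.getLogger("prompt_assembly")


def log_retrieval(request_id: str, question_type: str, fragments) -> None:
    logger.info(json.dumps({
        "request_id": request_id,
        "question_type": question_type,
        "fragment_ids": [f.fragment_id for f in fragments],
    }))
```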
