
Composite model pipeline

Pattern

Wrap a core LLM in a structured pipeline of pre- and post-processing stages where each stage targets a specific failure mode with a specific latency budget. The pipeline converts a reliability-deficient model running in isolation into a production-grade agentic product without requiring the core model to improve.

Canonical Vercel framing

"Combining the dynamic system prompt, LLM Suspense, and autofixers gives us a pipeline that produces stable, functioning generations at higher rates than a standalone model. Each part of the pipeline addresses a specific failure mode, and together they significantly increase the likelihood that users see a rendered website in v0 on the first attempt."

(Source: sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent)

The thesis

Reliability is a pipeline problem, not a single-model problem. v0's baseline is a "~10% LLM-alone error rate" (concepts/llm-code-generation-error-rate), a production-blocking number. No amount of prompt tuning or model upgrading closes it reliably on its own; what does close it is a pipeline of targeted fixers, each with a bounded latency cost and each addressing a class of failure the core model cannot solve by itself.
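The additive logic behind "each fixer drives one class toward zero" can be made concrete. This is illustrative arithmetic only: the ~10% aggregate figure is v0's, but the breakdown into named failure classes and their individual rates below is assumed for the sketch.

```typescript
// Assumed decomposition of a ~10% baseline error rate into independent
// failure classes. Each pipeline stage targets exactly one class; the
// residual error is the sum of whatever classes remain unaddressed.
const failureClasses: Record<string, number> = {
  staleImports: 0.04,     // hypothetical: training-cutoff drift
  brokenIcons: 0.03,      // hypothetical: icon churn
  missingProvider: 0.02,  // hypothetical: cross-file invariants
  other: 0.01,
};

const baseline = Object.values(failureClasses).reduce((a, b) => a + b, 0); // ≈ 0.10

// Stages [1], [3], and [4] each eliminate one class.
const fixedByPipeline = new Set(["staleImports", "brokenIcons", "missingProvider"]);

const residual = Object.entries(failureClasses)
  .filter(([cls]) => !fixedByPipeline.has(cls))
  .reduce((sum, [, rate]) => sum + rate, 0); // only "other" remains
```

The point of the decomposition: a bigger model shrinks every term a little, while a targeted stage deletes its term outright.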

The v0 pipeline (canonical instance)

User prompt
[1] Dynamic system prompt
    - Intent detect (embedding + keyword)
    - Inject version-pinned library knowledge
    - Point at curated read-only example fs
    - Cache-stable within intent class
[2] Core LLM (streaming)
[3] LLM Suspense (streaming rewrite)
    - URL shortening (pre- and post-)
    - Import find-and-replace
    - Embedding-based icon resolution
    - <100 ms per substitution, no model calls
[4] Post-stream autofixers
    - AST-based invariant checks (e.g. QueryClientProvider wrap)
    - Deterministic package.json completion
    - Small fine-tuned placement model
    - <250 ms, conditional
Working website preview

Design principles

  1. One failure mode per stage. Each stage targets a specific class of failure; don't build a monolithic "fix everything" stage.

  2. Latency budget per stage. Suspense has a <100 ms budget per substitution; autofixers have <250 ms; both are conditional (they run only when needed). In the happy path, the pipeline leaves the product's median latency unchanged.

  3. Deterministic over probabilistic where possible. Suspense's icon-resolution and URL-substitution are purely deterministic (embedding lookup + analysis of library exports). The fine-tuned autofixer model is used only where judgment is required (placement of a QueryClientProvider).

  4. Catch failures before they're visible. Suspense runs during streaming so the user never sees the intermediate broken state. Post-stream autofixers run before the preview renders. This preserves the UX invariant that the user sees the working artifact, not its repair history.

  5. Each stage is latency-gated by a cheap check. No autofixer runs unconditionally — parse the AST, decide if the invariant is violated, then invoke the fix. Most requests skip most stages.
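Principle 5 can be sketched for the QueryClientProvider invariant named above. This is a minimal, assumed reconstruction: a real implementation would parse the AST, and v0 delegates the *placement* decision to a small fine-tuned model; here a cheap string check stands in for the gate so the "check, then conditionally fix" shape is visible.

```typescript
// Cheap gate: most generations pass this check and skip the fixer entirely.
// (A production version would walk the parsed AST, not match substrings.)
function needsQueryProviderFix(source: string): boolean {
  return source.includes("useQuery(") && !source.includes("QueryClientProvider");
}

// Deterministic repair for the sketch: prepend the missing import. The
// judgment call of where to mount the provider is exactly the part v0
// hands to its fine-tuned placement model, which this does not reproduce.
function applyQueryProviderFix(source: string): string {
  return [
    'import { QueryClient, QueryClientProvider } from "@tanstack/react-query";',
    source,
  ].join("\n");
}

function autofix(source: string): string {
  return needsQueryProviderFix(source) ? applyQueryProviderFix(source) : source;
}
```

The gate costs a substring scan (or one AST parse); only violating generations pay for the fix, which is what keeps the stage inside its <250 ms budget.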

Why this is a pattern, not just an instance

The pipeline shape generalises across agentic-code products.

Contrast with single-model / monolithic agent

The most common industry alternative is "bigger model, longer context, better prompt" — a pure single-model improvement path. The composite-pipeline critique is that this path doesn't address the specific structural failure modes (training-cutoff drift, icon churn, cross-file invariants) that the pipeline stages each solve independently. The bigger model may reduce each class's frequency but won't eliminate them; the pipeline stages individually drive each class's contribution toward zero.

Relation to other orchestration patterns

Seen in
