Composite model pipeline¶
Pattern¶
Wrap a core LLM in a structured pipeline of pre- and post-processing stages, each targeting a specific failure mode within a specific latency budget. The pipeline turns a model that is unreliable in isolation into a production-grade agentic product without requiring the core model itself to improve.
Canonical Vercel framing¶
"Combining the dynamic system prompt, LLM Suspense, and autofixers gives us a pipeline that produces stable, functioning generations at higher rates than a standalone model. Each part of the pipeline addresses a specific failure mode, and together they significantly increase the likelihood that users see a rendered website in v0 on the first attempt."
(Source: sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent)
The thesis¶
Reliability is a pipeline problem, not a single-model problem. v0's baseline is a "~10 % LLM-alone error rate" (concepts/llm-code-generation-error-rate) — a production-blocking number. No amount of prompt tuning or model upgrade alone closes it reliably; what does close it is a pipeline of targeted fixers, each with a bounded latency cost, each addressing a class of failure the core model can't solve by itself.
The v0 pipeline (canonical instance)¶
User prompt
↓
[1] Dynamic system prompt
- Intent detect (embedding + keyword)
- Inject version-pinned library knowledge
- Point at curated read-only example fs
- Cache-stable within intent class
↓
[2] Core LLM (streaming)
↓
[3] LLM Suspense (streaming rewrite)
- URL shortening (pre- and post-)
- Import find-and-replace
- Embedding-based icon resolution
- <100 ms per substitution, no model calls
↓
[4] Post-stream autofixers
- AST-based invariant checks (e.g. QueryClientProvider wrap)
- Deterministic package.json completion
- Small fine-tuned placement model
- <250 ms, conditional
↓
Working website preview
Design principles¶
- One failure mode per stage. Each stage targets a specific class of failure; don't build a monolithic "fix everything" stage.
- Latency budget per stage. Suspense has a <100 ms budget per substitution; autofixers have <250 ms; both are conditional and only run when needed. In the happy path, the pipeline leaves the product's median latency unchanged.
- Deterministic over probabilistic where possible. Suspense's icon resolution and URL substitution are purely deterministic (embedding lookup plus analysis of library exports). The fine-tuned autofixer model is used only where judgment is required (placement of a QueryClientProvider).
- Catch failures before they're visible. Suspense runs during streaming, so the user never sees the intermediate broken state. Post-stream autofixers run before the preview renders. This preserves the UX invariant that the user sees the working artifact, not its repair history.
- Each stage is latency-gated by a cheap check. No autofixer runs unconditionally: parse the AST, decide whether the invariant is violated, then invoke the fix. Most requests skip most stages.
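The gating principle can be sketched as a cheap predicate that decides whether the fixer runs at all. This is a hypothetical illustration: a real implementation would parse the AST, whereas the string checks here merely stand in for that invariant test, and `needsQueryClientWrap` / `fixIfNeeded` are invented names.

```typescript
// Hypothetical latency gate: a cheap check guards the expensive fix.
function needsQueryClientWrap(source: string): boolean {
  // Invariant: any file calling useQuery must render a QueryClientProvider.
  // (Stand-in for an AST check; string matching is only illustrative.)
  return source.includes("useQuery(") && !source.includes("QueryClientProvider");
}

function fixIfNeeded(source: string, fix: (s: string) => string): string {
  // Gate first: most requests skip the fixer entirely, so the happy-path
  // latency cost is just the check, not the fix.
  return needsQueryClientWrap(source) ? fix(source) : source;
}
```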
Why this is a pattern, not just an instance¶
The pipeline shape generalises across agentic-code products:
- Stage 1 (knowledge injection) — every product with a dynamic library target has the concepts/training-cutoff-dynamism-gap and can benefit from patterns/dynamic-knowledge-injection-prompt.
- Stage 2 (core LLM) — the model is interchangeable; the pipeline is model-agnostic.
- Stage 3 (streaming rewrite) — patterns/streaming-output-rewrite applies to any generation where the output format has known normalisation rules or name-resolution needs.
- Stage 4 (conditional post-stream autofixers) — patterns/deterministic-plus-model-autofixer applies to any generation where AST-level invariants are verifiable.
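The stage-3 streaming rewrite can be sketched as a line-buffered transform over the token stream, so substitutions land before the user ever sees the raw form. A minimal sketch, assuming a per-line rewrite is acceptable; the rename table is illustrative data (v0 resolves icon names via embedding lookup against real library exports), and all function names here are hypothetical.

```typescript
// Illustrative rename table (stale icon name -> current export).
const ICON_RENAMES: Record<string, string> = {
  Trash: "Trash2",
};

function rewriteLine(line: string): string {
  // Deterministic, no model calls: cheap enough for a per-line budget.
  for (const [from, to] of Object.entries(ICON_RENAMES)) {
    line = line.replace(new RegExp(`\\b${from}\\b`, "g"), to);
  }
  return line;
}

// Buffer chunks until a newline so substitutions never straddle a
// chunk boundary, then emit the rewritten line immediately.
async function* rewriteLines(
  chunks: AsyncIterable<string>,
): AsyncGenerator<string> {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += chunk;
    let nl;
    while ((nl = buffer.indexOf("\n")) >= 0) {
      yield rewriteLine(buffer.slice(0, nl + 1));
      buffer = buffer.slice(nl + 1);
    }
  }
  if (buffer) yield rewriteLine(buffer); // flush trailing partial line
}
```

The buffering choice is the key design point: it trades at most one line of extra streaming delay for the guarantee that a rename is never split across chunks.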
Contrast with single-model / monolithic agent¶
The most common industry alternative is "bigger model, longer context, better prompt" — a pure single-model improvement path. The composite-pipeline critique is that this path doesn't address the specific structural failure modes (training-cutoff drift, icon churn, cross-file invariants) that the pipeline stages each solve independently. The bigger model may reduce each class's frequency but won't eliminate them; the pipeline stages individually drive each class's contribution toward zero.
Relation to other orchestration patterns¶
- patterns/specialized-agent-decomposition — related but distinct: that pattern decomposes the agent's task across specialised sub-agents; this pattern decomposes the error-correction pipeline across deterministic + small-model stages.
- patterns/coordinator-sub-reviewer-orchestration — multi-agent coordination for a different stage (review); v0's composite pipeline is single-agent + deterministic wrappers.
Seen in¶
- sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent — canonical disclosure; three-stage pipeline (dynamic prompt + LLM Suspense + autofixers); ~10 % baseline error rate → double-digit improvement in success rate.
Related¶
- systems/vercel-v0 — canonical instance.
- concepts/llm-code-generation-error-rate — baseline metric the pipeline is built against.
- patterns/dynamic-knowledge-injection-prompt — stage 1.
- patterns/streaming-output-rewrite — stage 3.
- patterns/deterministic-plus-model-autofixer — stage 4.
- patterns/specialized-agent-decomposition — adjacent orchestration pattern at a different altitude.
- concepts/llm-hallucination — the underlying failure category the pipeline is designed to contain.