Prompt-cache-aware static/dynamic ordering¶
Pattern¶
For batch LLM pipelines where a large static context (system prompt, examples, reference material) precedes a smaller dynamic payload per request, order the prompt static-first, dynamic-last so the provider's prompt cache hits on the entire static prefix for the full batch. Prefill cost is paid once on the first request; subsequent requests in the batch skip it.
Forces¶
- Per-request prefill dominates prompt latency for large prompts. A cached prefix can skip prefill entirely and start generating from the cached KV state — latency wins can be dramatic on 40K-token prefixes.
- Prompt caches key on byte-exact prefix matches. A single mutation anywhere in the prefix invalidates the cache from that point forward.
- Most providers cache for minutes, not hours. Batch the work in time to keep the cache warm.
- Bulk code-migration workloads fit this shape naturally — the transformation rules are static, the files being transformed are dynamic.
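The byte-exact-prefix force can be made concrete with a small sketch. This is illustrative only (providers do not expose this function); it shows why a single per-request value ahead of the static region forfeits nearly the whole cacheable prefix, while static-first ordering shares everything up to the payload:

```python
def cacheable_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the byte-exact shared prefix between two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

STATIC = b"## Transformation prompt\n...40K tokens of rules and examples...\n"

# Static-first ordering: the entire static region is shared across calls.
good_1 = STATIC + b"<file>widget_a.tsx ...</file>"
good_2 = STATIC + b"<file>widget_b.tsx ...</file>"

# A per-request timestamp ahead of the static region busts the cache a few
# dozen bytes in, even though the transformation rules are identical.
bad_1 = b"# run 2025-02-19T10:00:01Z\n" + STATIC
bad_2 = b"# run 2025-02-19T10:00:02Z\n" + STATIC

assert cacheable_prefix_len(good_1, good_2) >= len(STATIC)
assert cacheable_prefix_len(bad_1, bad_2) < 30
```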
Mechanism¶
- Partition the prompt into two contiguous regions:
- Static prefix: system prompt, role, task description, reference material, examples. Byte-stable across every call in a batch.
- Dynamic suffix: per-request payload; in Zalando's case, <file>{file_content}</file>.
- Emit the static prefix first, the dynamic suffix second.
- Batch calls that share a prefix in time. Cache lifetimes are minutes; spreading a batch across hours recomputes the prefill for each cache miss.
- Pin every token in the static region. No timestamps, request IDs, filename preambles, or any other per-request value.
- Version the prefix out of band (build-version string, git SHA in a comment). When the prompt needs to change, the version bump is a one-time cache miss; everyone after that hits the new cache.
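The mechanism above can be sketched as a two-function prompt builder. This is a minimal illustration, not Zalando's implementation; `PROMPT_VERSION` and the helper names are assumptions. The static prefix is built once per batch, carries the out-of-band version string, and contains nothing per-request:

```python
PROMPT_VERSION = "v7"  # hypothetical; bumping it is one deliberate, batch-wide cache miss

def build_static_prefix(transformation_context: str, components: list[dict]) -> str:
    lines = [f"<!-- prompt {PROMPT_VERSION} -->",
             "## Transformation prompt (static)",
             transformation_context]
    for c in components:  # iterate in a fixed order: ordering is part of the bytes
        lines += [f"- {c['interface_details']}",
                  f"- {c['mapping_instruction']}",
                  f"- {c['examples']}"]
    return "\n".join(lines) + "\n"

def build_prompt(static_prefix: str, file_content: str) -> str:
    # Static first, dynamic last; no timestamps, IDs, or filenames up front.
    return (static_prefix
            + "## Content to be transformed\n"
            + f"<file>\n{file_content}\n</file>\n")

prefix = build_static_prefix("Rewrite v1 components to v2.", [
    {"interface_details": "Button props",
     "mapping_instruction": "variant -> kind",
     "examples": "<Button variant=primary/> -> <Button kind=primary/>"},
])
p1 = build_prompt(prefix, "export const A = ...")
p2 = build_prompt(prefix, "export const B = ...")
assert p1.startswith(prefix) and p2.startswith(prefix)  # shared cacheable prefix
```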
Canonical Zalando shape¶
From sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries:
```
// Static (cacheable across every file in the group)
## Transformation prompt (static)
{transformation_context}
{For each component in group}
• {interface_details}
• {mapping_instruction}
• {examples}

// Dynamic (per file)
## Content to be transformed
<file>
{file_content}
</file>
```
For a component group with 30 files to transform, the static prefix is prefilled once on the first call and cache-hit on the remaining 29; as the source puts it, "ensuring caching can be leveraged while transforming different files."
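A batch loop in this shape can be sketched as follows. `complete` stands in for the provider call and is a hypothetical wrapper, not from the source; the point is that all calls in the loop share one byte-identical prefix, so the first pays the prefill and the rest hit:

```python
def migrate_group(static_prefix: str, files: dict[str, str], complete) -> dict[str, str]:
    """Transform every file in a group against one shared static prefix."""
    results = {}
    for path, content in files.items():  # contiguous in time: the cache stays warm
        prompt = (static_prefix
                  + "## Content to be transformed\n"
                  + f"<file>\n{content}\n</file>\n")
        results[path] = complete(prompt)
    return results

# Usage with a stub in place of the real provider call:
out = migrate_group("## Transformation prompt (static)\n...rules...\n",
                    {"a.tsx": "old A", "b.tsx": "old B"},
                    complete=lambda p: p[-20:])  # stub "transformation"
assert set(out) == {"a.tsx", "b.tsx"}
```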
Grouping-as-cache-warming¶
Zalando's grouped, batched component-migration sub-pattern naturally keeps the cache warm: all files in one component group share one cacheable prefix (the group's interface + mapping + examples), so processing them contiguously stays in-cache. Cross-group context switches bust the cache; if cross-group work is necessary, it happens at the end.
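The scheduling discipline amounts to sorting the work queue by group so same-prefix items run back-to-back. A minimal sketch, with assumed field names; an interleaved order here would alternate prefixes and pay four cache misses, while the grouped order pays one per group:

```python
from itertools import groupby

work = [
    {"group": "Button", "file": "a.tsx"},
    {"group": "Modal",  "file": "m.tsx"},
    {"group": "Button", "file": "b.tsx"},
    {"group": "Modal",  "file": "n.tsx"},
]

# Stable sort keeps within-group order; groupby then yields contiguous runs,
# one cacheable-prefix warm-up per group instead of one per context switch.
work.sort(key=lambda w: w["group"])
schedule = [(g, [w["file"] for w in items])
            for g, items in groupby(work, key=lambda w: w["group"])]
assert schedule == [("Button", ["a.tsx", "b.tsx"]), ("Modal", ["m.tsx", "n.tsx"])]
```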
Contrast with sibling pattern¶
patterns/dynamic-knowledge-injection-prompt (Vercel v0) achieves the same cache-hit discipline by partitioning the dynamic space into coarse intent classes — each class has a stable injection. This pattern achieves it by extracting a genuinely-static section (nothing dynamic in the prefix) and putting it first. Same goal, different shape: one for agents where every request has some per-request dynamic content, one for batch pipelines where the only dynamic content is the payload itself.
Consequences¶
Positive:
- Cost savings scale with batch size. For a 30-file group with a 45K-token prompt, prefill is paid once instead of 30 times, cutting the static-prefix cost by a fraction of roughly (N-1)/N for an N-file batch, modulo output tokens.
- Latency reduction. Prefill on 45K tokens takes seconds; cached hits take tens of milliseconds.
- Cache-friendly development. Once the shape is set, prompt tweaks during development are isolated to specific sections — versioning the prefix lets you invalidate one slot without nuking others.
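The (N-1)/N claim can be checked with back-of-envelope numbers: a 45K-token prefix, an assumed ~1.5K dynamic tokens per file, and cached tokens treated as free for simplicity (real providers typically discount cached-prefix tokens rather than waive them):

```python
STATIC, DYNAMIC, N = 45_000, 1_500, 30  # tokens; DYNAMIC is an illustrative guess

uncached = N * (STATIC + DYNAMIC)   # every call pays the full prefill
cached = STATIC + N * DYNAMIC       # prefill once, then suffix-only per file
saving = 1 - cached / uncached

assert uncached == 1_395_000 and cached == 90_000
assert round(saving, 2) == 0.94     # static portion alone saves (N-1)/N = 29/30
```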
Negative:
- Cache is provider-managed, not client-visible. Zalando can't inspect cache-hit rate directly; they rely on per-request billing to infer it.
- Cache lifetimes are short. Minutes, not hours. Long-running batches may miss across pauses.
- Cache busts during prompt development. Every prompt change during development is a miss; temperature=0 plus regression tests keep development disciplined.
- Byte-level fragility. Template interpolation has to produce byte-exact output every time — a trailing newline that appears sometimes is a cache bust.
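The byte-level fragility is cheap to guard against: canonicalize the rendered prefix and pin its hash in a regression test, so a sometimes-present trailing newline or a reordered section can't silently bust the cache. A hypothetical helper, not from the source:

```python
import hashlib

def prefix_fingerprint(render) -> str:
    """Render the static prefix and hash its exact bytes."""
    return hashlib.sha256(render().encode("utf-8")).hexdigest()

def render():
    # In a real pipeline this is the template call that builds the prefix.
    parts = ["## Transformation prompt (static)", "rule 1", "rule 2"]
    return "\n".join(parts) + "\n"  # exactly one trailing newline, always

# Two renders must agree byte-for-byte; a pinned hash catches silent drift.
assert prefix_fingerprint(render) == prefix_fingerprint(render)
```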
Seen in¶
- sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries — canonical wiki instance at the bulk-code-migration altitude.
- (Sibling: sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent — agent-altitude sibling via class-partitioned dynamic injection, see patterns/dynamic-knowledge-injection-prompt.)
Related¶
- concepts/static-dynamic-prompt-partitioning — the concept this pattern anchors
- concepts/prompt-cache-consistency — the parent design constraint
- patterns/llm-only-code-migration-pipeline — the migration pattern Zalando wraps this inside
- patterns/dynamic-knowledge-injection-prompt — the agent-altitude sibling approach
- systems/zalando-component-migration-toolkit — the production tool
- systems/openai-api — the provider whose cache this targets
- companies/zalando