
Prompt-cache-aware static/dynamic ordering

Pattern

For batch LLM pipelines where a large static context (system prompt, examples, reference material) precedes a smaller dynamic payload per request, order the prompt static-first, dynamic-last so the provider's prompt cache hits on the entire static prefix for the full batch. Prefill cost is paid once on the first request; subsequent requests in the batch skip it.

Forces

  • Per-request prefill dominates prompt latency for large prompts. A cached prefix can skip prefill entirely and start generating from the cached KV state — latency wins can be dramatic on 40K-token prefixes.
  • Prompt caches key on byte-exact prefix matches. A single mutation anywhere in the prefix invalidates the cache from that point forward.
  • Most providers cache for minutes, not hours. Batch the work in time to keep the cache warm.
  • Bulk code-migration workloads fit this shape naturally — the transformation rules are static, the files being transformed are dynamic.

Mechanism

  1. Partition the prompt into two contiguous regions:
       • Static prefix: system prompt, role, task description, reference material, examples. Byte-stable across every call in a batch.
       • Dynamic suffix: per-request payload — in Zalando's case, <file>{file_content}</file>.
  2. Emit the static prefix first, the dynamic suffix second.
  3. Batch calls that share a prefix in time. Cache lifetimes are minutes; spreading a batch across hours recomputes the prefill on each cache miss.
  4. Pin every token in the static region. No timestamps, request IDs, filename preambles, or any other per-request value.
  5. Version the prefix out of band (build-version string, git SHA in a comment). When the prompt needs to change, the version bump is a one-time cache miss; every call after that hits the new cache.
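The steps above can be sketched as a minimal prompt builder. This is illustrative only — the constant names and placeholder strings are assumptions, not Zalando's actual code:

```python
PROMPT_VERSION = "v7"  # version bump = a deliberate one-time cache miss

# Placeholder static material (assumption: in the real pipeline this comes
# from the component group's rules, interface details, and examples).
TRANSFORMATION_CONTEXT = "...transformation rules...\n"
EXAMPLES = "...before/after examples...\n"

# Static prefix: built once, byte-identical for every call in the batch.
STATIC_PREFIX = (
    f"<!-- prompt-version: {PROMPT_VERSION} -->\n"
    "## Transformation prompt (static)\n"
    + TRANSFORMATION_CONTEXT
    + EXAMPLES
)

def build_prompt(file_content: str) -> str:
    """Static prefix first, per-file dynamic suffix last."""
    return (
        STATIC_PREFIX
        + "## Content to be transformed\n"
        + f"<file>\n{file_content}\n</file>"
    )
```

Because `STATIC_PREFIX` is assembled once and only concatenated, every prompt in the batch shares the same byte-exact prefix, which is exactly what the provider cache keys on.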

Canonical Zalando shape

From sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries:

// Static (cacheable across every file in the group)
## Transformation prompt (static)
{transformation_context}
{For each component in group}
  • {interface_details}
  • {mapping_instruction}
  • {examples}

// Dynamic (per file)
## Content to be transformed
<file>
 {file_content}
</file>

For a component group with 30 files to transform, the static prefix is prefill-cached once on the first call and cache-hit on the remaining 29, "ensuring caching can be leveraged while transforming different files," as the source puts it.
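A toy model makes the batch economics concrete. Assumptions here: the cache keys on a byte-exact prefix match, and character counts stand in for token counts — real providers differ in cache granularity and pricing:

```python
class PrefixCacheModel:
    """Toy stand-in for provider-side prefix caching."""
    def __init__(self) -> None:
        self.cached_prefixes = set()
        self.prefill_units = 0  # units actually prefilled
        self.hit_units = 0      # units served from the cache

    def send(self, static_prefix: str, dynamic_suffix: str) -> None:
        if static_prefix in self.cached_prefixes:
            self.hit_units += len(static_prefix)
        else:
            self.prefill_units += len(static_prefix)
            self.cached_prefixes.add(static_prefix)
        self.prefill_units += len(dynamic_suffix)  # suffix is always prefilled

prefix = "S" * 45_000  # stand-in for a ~45K-token static prefix
cache = PrefixCacheModel()
for i in range(30):    # 30 files in one component group
    cache.send(prefix, f"<file>file_{i}</file>")

# The prefix is prefilled on the first call only; the other 29 hit the cache.
```

Under this model the batch prefills the 45K-unit prefix once and serves it from cache 29 times — the per-file cost collapses to the dynamic suffix.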

Grouping-as-cache-warming

Zalando's grouped-component batched-migration sub-pattern naturally keeps the cache warm: all files in one component group share one cacheable prefix (the group's interface + mapping + examples), so processing them contiguously stays in-cache. Cross-group context switches bust the cache; if cross-group work is necessary, it happens at the end.
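The scheduling discipline amounts to sorting the work queue by group before processing, so each group's prefix is warmed once and reused contiguously. A minimal sketch — field names are assumptions, not Zalando's schema:

```python
from itertools import groupby

work_queue = [
    {"path": "a.tsx", "group": "Button"},
    {"path": "b.tsx", "group": "Modal"},
    {"path": "c.tsx", "group": "Button"},
]

# One sort, then group-by-group iteration: one cacheable prefix per run,
# and no cross-group context switch until a group is fully drained.
work_queue.sort(key=lambda item: item["group"])
ordered = [item["path"] for item in work_queue]

for group, members in groupby(work_queue, key=lambda item: item["group"]):
    for item in members:
        pass  # transform(item) under the group's shared static prefix
```

Python's sort is stable, so within each group the original file order is preserved while all of a group's files become adjacent.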

Contrast with sibling pattern

patterns/dynamic-knowledge-injection-prompt (Vercel v0) achieves the same cache-hit discipline by partitioning the dynamic space into coarse intent classes — each class has a stable injection. This pattern achieves it by extracting a genuinely-static section (nothing dynamic in the prefix) and putting it first. Same goal, different shape: one for agents where every request has some per-request dynamic content, one for batch pipelines where the only dynamic content is the payload itself.

Consequences

Positive:

  • Cost savings scale with batch size. For a 30-file group with a 45K-token static prefix, prefill on the prefix is paid once instead of 30 times — a saving of roughly (N-1)/N of the prefix cost, modulo output tokens.
  • Latency reduction. Prefill on 45K tokens takes seconds; cached hits take tens of milliseconds.
  • Cache-friendly development. Once the shape is set, prompt tweaks during development are isolated to specific sections — versioning the prefix lets you invalidate one slot without nuking others.

Negative:

  • Cache is provider-managed, not client-visible. Zalando can't inspect cache-hit rate directly; they rely on per-request billing to infer it.
  • Cache lifetimes are short. Minutes, not hours. Long-running batches may miss across pauses.
  • Prompt-development cache churn. Every prompt change during development is a miss; temperature=0 plus regression tests keep development disciplined.
  • Byte-level fragility. Template interpolation has to produce byte-exact output every time — a trailing newline that appears sometimes is a cache bust.
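The byte-level fragility point suggests a cheap pipeline-start guard: render the static prefix twice and compare digests, failing fast if the template is not byte-deterministic (a stray timestamp or an intermittently emitted trailing newline shows up here). Illustrative sketch — `render_static_prefix` is a stand-in for the real template renderer:

```python
import hashlib

def render_static_prefix() -> str:
    # Stand-in for the real template rendering; must be byte-deterministic.
    return "## Transformation prompt (static)\n...rules and examples...\n"

def prefix_digest() -> str:
    # SHA-256 over the exact bytes the provider's cache will key on.
    return hashlib.sha256(render_static_prefix().encode("utf-8")).hexdigest()

# Two renders must be byte-identical, or every request in the batch
# will miss the cache from the first divergent byte onward.
assert prefix_digest() == prefix_digest()
```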

Seen in

sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries
