Static/dynamic prompt partitioning¶
Definition¶
Static/dynamic prompt partitioning is the prompt-layout primitive that splits a prompt into two contiguous regions — a static prefix that is byte-identical across a batch of requests, and a dynamic suffix that varies per request — and orders them static-first. This lets the provider's prompt cache hit on the entire prefix for the full batch, amortising prefill over N calls while the dynamic portion still carries per-request content.
The mechanism¶
Provider prompt caches (OpenAI, Anthropic) key on byte-exact prefix matches. A single mutation anywhere in the prefix invalidates the cache from that point forward. Putting stable content first and varying content last maximises the length of the cacheable prefix across a batch.
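The ordering effect can be illustrated with a small sketch. Provider internals are more involved; `os.path.commonprefix` here merely stands in for the cache's longest-shared-prefix lookup, and the prompt strings are invented:

```python
import os.path

def cached_prefix_len(prompts: list[str]) -> int:
    """Length of the byte-identical prefix shared by every prompt in the batch."""
    return len(os.path.commonprefix(prompts))

STATIC = "System: migrate components.\n## Examples\n...\n"  # stable across the batch
files = ["<file>a = 1</file>", "<file>b = 2</file>"]

static_first = [STATIC + f for f in files]   # static prefix, dynamic suffix
dynamic_first = [f + STATIC for f in files]  # wrong order

# Static-first: the entire STATIC block (plus the shared "<file>" tag) is cacheable.
assert cached_prefix_len(static_first) >= len(STATIC)
# Dynamic-first: the shared prefix dies at the first differing byte of file content.
assert cached_prefix_len(dynamic_first) < len(STATIC)
```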
Canonical Zalando framing¶
From sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries:
"LLM APIs offer the ability to cache identical prompts, potentially reducing API costs and improving response times by reusing previous results (e.g. Prompt caching - OpenAI API). To leverage this capability effectively, we set up a structured prompt format that maximized cache hits. The prompt was organized to have the static part like transformation examples at top and the dynamic part (the file content) and the end, ensuring caching can be leveraged while transforming different files."
Zalando's specific partition, per component group:
- Static prefix (cached across every file in the group):
- System prompt (role + output-format contract)
- Per-component interface details
- Per-component mapping instructions
- Per-component examples
- Dynamic suffix (unique per file):
  - `<file>{file_content}</file>`
For 30 files in a component group, the static prefix is prefill-cached on the first file and cache-hit on the other 29.
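Zalando's per-group prompt assembly can be sketched roughly as follows; the function name and the section stand-ins are illustrative, not taken from the source:

```python
def build_prompt(group_static: str, file_content: str) -> str:
    # Static prefix first (byte-identical for every file in the group),
    # dynamic file content last, so the provider cache can hit on the prefix.
    return f"{group_static}<file>{file_content}</file>"

# Illustrative stand-ins for the four static sections listed above:
group_static = (
    "You are a UI-component migration assistant. Reply with the transformed file.\n"
    "## Interface details\n...\n"
    "## Mapping instructions\n...\n"
    "## Examples\n...\n"
)

prompts = [build_prompt(group_static, src) for src in ("const a = 1;", "const b = 2;")]
# Every prompt in the group starts with the same bytes:
# prefill is paid on the first file and cache-hit on the rest.
assert all(p.startswith(group_static) for p in prompts)
```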
Design heuristics¶
- Single contiguous static prefix, single contiguous dynamic suffix. Interleaving static and dynamic blocks breaks caching at the first dynamic byte.
- Pin every token in the static portion to a byte-exact template. Don't include a timestamp, request ID, file path, or any other per-request value in the prefix.
- Group workload by cacheable-prefix identity. Batch files whose prompts share a prefix together in time so the cache stays warm. Zalando's logical component grouping indirectly does this — same group = same prefix = same cache hit.
- Version the prefix out of band. When prompts need to change (new examples, mapping fix, library update), increment a version tag so the cache misses once and then rewarms. Zalando does not describe this explicitly, but it is implicit in their prompt-regression tests running in CI.
- Order static content by stability, most-stable first. Inside the static prefix, put the most slowly-changing content (role, output format) at the very top; transformations that might be tweaked more often go lower. This way, when a tweak lands, only the tail of the prefix invalidates, not the whole thing.
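A minimal sketch combining the versioning and stability-ordering heuristics; the version-tag scheme and fingerprint logging are assumptions for illustration, not something Zalando describes:

```python
import hashlib

PROMPT_VERSION = "v7"  # bump on any static-content change: forces one cold miss, then rewarms

# Ordered most-stable first, so a tweak invalidates only the tail of the prefix:
ROLE = "You are a migration assistant.\n"         # rarely changes
OUTPUT_FORMAT = "Reply with one <file> block.\n"  # rarely changes
EXAMPLES = "## Examples\n...\n"                   # tweaked most often

def static_prefix() -> str:
    return f"[prompt:{PROMPT_VERSION}]\n{ROLE}{OUTPUT_FORMAT}{EXAMPLES}"

def prefix_fingerprint() -> str:
    # Byte-exact hash of the prefix; logging it per request makes
    # accidental cache busts (drifting templates) visible.
    return hashlib.sha256(static_prefix().encode("utf-8")).hexdigest()[:12]
```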
Distinction from sibling primitives¶
- vs patterns/dynamic-knowledge-injection-prompt: Vercel's v0 keeps the static-within-a-class invariant by partitioning the injection space into intent classes. Zalando keeps the invariant by partitioning the component space into groups. Same mechanism, different dimension of partitioning.
- vs concepts/prompt-cache-consistency: the parent concept is the design constraint ("keep the prefix stable"). This concept is the concrete ordering primitive ("static first, dynamic last, single contiguous prefix") that enacts the constraint.
Failure modes¶
- Per-file preamble in the prefix. A natural but wrong impulse is to include "Now transforming (unknown)..." at the top of the prompt for clarity. That single filename token busts the cache for the rest of the prefix.
- Unstable JSON / whitespace. If the prefix is built via template interpolation, make sure the interpolation produces byte-exact output. A trailing newline that appears sometimes and not others will miss the cache.
- Too-frequent prompt changes during development. In the dev loop, every prompt change busts every cache slot — this is a dev-time cost paid once, not a production issue.
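The whitespace failure mode lends itself to a cheap regression check. A hypothetical test shape (assumed, not from the source):

```python
def render_prefix(examples: list[str]) -> str:
    # join() avoids the intermittent trailing newline that naive
    # per-line template interpolation can introduce.
    return "System: migrate components.\n" + "\n".join(examples) + "\n"

a = render_prefix(["ex1", "ex2"])
b = render_prefix(["ex1", "ex2"])
assert a == b              # rendering is deterministic, byte for byte
assert "\n\n" not in a     # no accidental blank lines
assert a.endswith("ex2\n") # exactly one trailing newline
```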
Seen in¶
- sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries — canonical wiki instance at the code-migration altitude.
- (Sibling: sources/2026-01-08-vercel-how-we-made-v0-an-effective-coding-agent at the coding-agent altitude — see concepts/prompt-cache-consistency for that canonical framing.)
Related¶
- concepts/prompt-cache-consistency — the parent design-constraint concept
- concepts/prompt-interface-mapping-examples-composition — the content that sits inside Zalando's static prefix
- patterns/prompt-cache-aware-static-dynamic-ordering — the pattern this concept anchors
- patterns/dynamic-knowledge-injection-prompt — Vercel's sibling at the coding-agent altitude
- systems/zalando-component-migration-toolkit — the production tool
- systems/openai-api — the provider whose cache the partitioning targets
- companies/zalando