PATTERN Cited by 1 source
Prompt layer ordering for cache hits¶
Intent¶
Assemble every LLM prompt from most stable to most volatile layers so that the longest possible byte-identical prefix stays unchanged between consecutive calls, maximising prefix-cache hit rates.
Layer ordering¶
- Static system prompt — identical across every run.
- Stable session context — organisation, user, timezone, skill instructions; stable for the duration of a session.
- Conversation history — grows over time, but earlier turns are immutable once recorded.
- Turn-dependent context — current iteration's tool results and reasoning state.
Because each layer changes less frequently than the one after it, the shared prefix between iteration N and iteration N+1 is maximised.
Provider integration¶
- OpenAI / Gemini: Implicit prefix cache — the provider automatically detects and reuses matching prefixes; no client-side action needed beyond correct ordering.
- Anthropic: Explicit
cache_controlmarkers placed at the system, stable-context, and last-history boundaries to opt those prefixes into the cache.
Economics¶
On most iterations in a long-running loop, only the freshest tokens (a tool result + the model's next reasoning step) need processing from scratch. The longer a task runs, the more caching pays off. In a 150-iteration loop, the compounding savings are substantial.
Seen in¶
- sources/2026-06-18-atlassian-long-horizon-reasoning-engine —
Atlassian Long Horizon assembles prompts in this four-layer order
and places explicit
cache_controlmarkers for Anthropic.