CONCEPT Cited by 1 source
Prompt prefix caching¶
Definition¶
Prompt prefix caching is a cost and latency optimization where LLM providers reuse the KV-cache from a previous inference call if the current call's prompt shares an identical byte prefix with the previous call. Only the new tokens (after the shared prefix) need full processing.
This is not a context-management technique — the model sees the same context with or without caching. It's a cost and latency win that compounds with iteration count: the longer a task runs, the more the cache pays off. (Source: sources/2026-06-18-atlassian-long-horizon-reasoning-engine)
Provider behaviour¶
- OpenAI and Gemini: Implicit prefix caching — the provider automatically detects and reuses byte-identical prefixes.
- Anthropic: Explicit opt-in — the caller places
cache_controlmarkers at layer boundaries to signal which prefixes should be cached.
Design implication: layer ordering¶
To maximise cache hit rates, assemble prompts from most stable to most volatile:
- Static system prompt (identical across all runs)
- Stable session context (org, user, timezone, skill instructions — stable for session duration)
- Conversation history (grows, but earlier turns are immutable)
- Turn-dependent context (current iteration's tool results)
This ordering ensures the longest possible byte-identical prefix between consecutive iterations.
Seen in¶
- sources/2026-06-18-atlassian-long-horizon-reasoning-engine —
Atlassian Long Horizon places explicit
cache_controlmarkers at system, stable-context, and last-history boundaries for Anthropic; relies on implicit caching for OpenAI/Gemini. In a 150-iteration loop, most iterations only process fresh tokens (a tool result + the model's next reasoning step) from scratch.