PATTERN Cited by 1 source

Prompt layer ordering for cache hits¶

Intent¶

Assemble every LLM prompt from most stable to most volatile layers so that the longest possible byte-identical prefix stays unchanged between consecutive calls, maximising prefix-cache hit rates.

Layer ordering¶

Static system prompt — identical across every run.
Stable session context — organisation, user, timezone, skill instructions; stable for the duration of a session.
Conversation history — grows over time, but earlier turns are immutable once recorded.
Turn-dependent context — current iteration's tool results and reasoning state.

Because each layer changes less frequently than the one after it, the shared prefix between iteration N and iteration N+1 is maximised.

Provider integration¶

OpenAI / Gemini: Implicit prefix cache — the provider automatically detects and reuses matching prefixes; no client-side action needed beyond correct ordering.
Anthropic: Explicit cache_control markers placed at the system, stable-context, and last-history boundaries to opt those prefixes into the cache.

Economics¶

On most iterations in a long-running loop, only the freshest tokens (a tool result + the model's next reasoning step) need processing from scratch. The longer a task runs, the more caching pays off. In a 150-iteration loop, the compounding savings are substantial.

Seen in¶

sources/2026-06-18-atlassian-long-horizon-reasoning-engine — Atlassian Long Horizon assembles prompts in this four-layer order and places explicit cache_control markers for Anthropic.