CONCEPT Cited by 1 source
Volatile-only prompt-cache isolation¶
Definition¶
Volatile-only prompt-cache isolation is the multi-tenant security shape for a prompt-caching layer composed of three orthogonal properties:
- Tenant-isolated — cache pools are bounded to a single tenant. One customer's prompt cannot hit another customer's cached entry.
- Volatile-memory only — cache resides in RAM; no disk, no object storage, no shared persistent KV.
- Never persisted — replica restart wipes the cache; it does not survive process boundaries.
Together these three properties form a safety envelope that lets a serving platform ship caching as a default-on feature on multi-tenant infrastructure without dragging in encryption-at-rest, key-management, or persistence-layer threat modelling.
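As a minimal sketch of how the three properties can coexist in one structure, the class below keeps a separate in-process LRU pool per tenant and never writes anything outside process memory. The class name `VolatileTenantCache`, its methods, and the LRU eviction policy are illustrative assumptions; the Databricks post does not describe its implementation.

```python
from collections import OrderedDict


class VolatileTenantCache:
    """Hypothetical sketch: one RAM-only LRU pool per tenant, nothing persisted."""

    def __init__(self, max_entries_per_tenant=1024):
        self._max = max_entries_per_tenant
        # tenant_id -> OrderedDict(prefix -> cached value). Plain process
        # memory only: a new process starts with an empty mapping.
        self._pools = {}

    def put(self, tenant_id, prefix, value):
        pool = self._pools.setdefault(tenant_id, OrderedDict())
        pool[prefix] = value
        pool.move_to_end(prefix)
        if len(pool) > self._max:
            pool.popitem(last=False)  # evict the least recently used entry

    def get(self, tenant_id, prefix):
        # Lookups only ever consult the caller's own pool.
        pool = self._pools.get(tenant_id)
        if pool is None or prefix not in pool:
            return None
        pool.move_to_end(prefix)
        return pool[prefix]
```

In this sketch a replica restart corresponds to constructing a fresh `VolatileTenantCache()`, which is empty for every tenant.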
Canonical wiki instance — Databricks FMAPI¶
Databricks FMAPI Prompt Caching (GA 2026-05-22):
"Security is a first-class concern at Databricks. Prompt caches are isolated, only reside in volatile memory and are never persisted." (Source: sources/2026-05-22-databricks-accelerating-llm-inference-with-prompt-caching-for-open-source-models)
The composition is the load-bearing argument the post makes for why caching can be implicit + default-on on a multi-tenant serving substrate: the security envelope holds without requiring customer configuration.
Why all three properties are needed¶
Each property alone is insufficient:
- Tenant-isolated alone: still requires a persistence-layer threat model if cached entries hit disk. An attacker with storage access could read another tenant's cached prompts even if the lookup logic enforces isolation.
- Volatile-memory only alone: protects against persistent storage compromise but doesn't prevent cross-tenant leakage in the live cache pool — same-process residency is not the same as tenant isolation.
- Never persisted alone: weakest property; just means the cache is ephemeral. Says nothing about whether two tenants share the live RAM-resident pool.
Composing all three closes the gap. The "isolated" property covers cross-tenant leakage in the live pool; "volatile memory" forecloses the disk threat model; "never persisted" documents that even within a replica's lifetime there is no persistence crossing process boundaries.
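The argument above can be restated as a small check: each missing property leaves one residual risk, and only the full composition leaves none. This is an illustrative sketch; `CachePolicy` and `residual_risks` are hypothetical names, and the risk strings paraphrase this page rather than any Databricks material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    # Hypothetical flags mirroring the three properties on this page.
    tenant_isolated: bool
    volatile_memory_only: bool
    never_persisted: bool


def residual_risks(policy):
    """Return the risks left open by whichever properties are missing."""
    risks = []
    if not policy.tenant_isolated:
        risks.append("cross-tenant leakage in the live pool")
    if not policy.volatile_memory_only:
        risks.append("cached prompts readable from disk or shared storage")
    if not policy.never_persisted:
        risks.append("cache contents surviving a replica restart")
    return risks
```

For example, `residual_risks(CachePolicy(True, False, False))` still reports the two storage-related risks, while a policy with all three flags set returns an empty list.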
Tradeoffs accepted¶
- Cache must be rebuilt on replica restart. Every deploy, rollout, autoscaling event, or replica failure clears the cache for every tenant on that replica. The platform must accept the warm-up cost on each restart and architect rollouts so that a cold cache doesn't violate latency SLAs.
- No cross-replica or cross-region cache sharing, at least not without a separate persistence-aware design. The 30% hit ratio Databricks discloses is measured on a single replica's pool; cross-replica routing (e.g. session-affinity-driven routing in Cloudflare's case) is a different architectural choice that explicitly persists or shares cache state across replicas.
- Cache hit rate is bounded by per-replica request locality. If a tenant's traffic is sharded across many replicas without affinity, each replica only sees a slice of the prefix-reuse opportunity. Cloudflare addresses this via x-session-affinity; Databricks' approach (per the disclosed material) is to accept the per-replica scope and rely on prefill-skip economics being large per hit.
- No cache-write / cache-read billing differentiation is feasible: with no persistent cache record, there is no fixed-cost-amortised asset to bill against. The Databricks post does not disclose a separate billing line for cached input tokens (compare Cloudflare's "discounted cached tokens" incentive on x-session-affinity).
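The locality bound can be illustrated with a toy model: one tenant repeats the same prefix, and each replica has its own cold cache. The function `count_hits` and the round-robin routing are assumptions made for illustration, not a description of either platform's router.

```python
def count_hits(num_requests, num_replicas, use_affinity):
    """Count cache hits for one tenant repeating one prefix across replicas."""
    warmed = set()  # replicas whose local cache already holds the prefix
    hits = 0
    for i in range(num_requests):
        # Affinity pins every request to replica 0; otherwise round-robin.
        replica = 0 if use_affinity else i % num_replicas
        if replica in warmed:
            hits += 1
        else:
            warmed.add(replica)  # miss: this replica prefills and caches
    return hits
```

With 12 requests over 4 replicas, round-robin routing pays one cold miss per replica (8 hits), whereas affinity routing pays a single miss (11 hits). A replica restart in this model amounts to removing that replica from `warmed`, which costs one further miss.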
Tenant-isolation mechanism (undisclosed)¶
The Databricks post claims tenant isolation but does not disclose the mechanism. Plausible designs include:
- Process-per-tenant — strongest isolation, highest cost.
- Cache-key prefix per tenant — single shared pool, tenant-scoped lookups. Requires correct lookup discipline.
- VM / container boundary — typical for managed serving platforms.
The post is silent on which is used, so this concept page records the property claim (isolated + volatile + never-persisted) as the design surface, not the implementation primitive.
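To make the "cache-key prefix per tenant" option concrete, here is a minimal sketch of a tenant-scoped key derivation for a single shared pool. The function name `tenant_scoped_key` and the use of SHA-256 are assumptions for illustration only; nothing in the post indicates that Databricks uses this design.

```python
import hashlib


def tenant_scoped_key(tenant_id, prompt_prefix):
    """Derive a cache key that is bound to both the tenant and the prefix."""
    h = hashlib.sha256()
    tenant_bytes = tenant_id.encode("utf-8")
    # Length-prefix the tenant id so ("ab", "c") and ("a", "bc")
    # cannot produce the same hashed byte stream.
    h.update(len(tenant_bytes).to_bytes(4, "big"))
    h.update(tenant_bytes)
    h.update(prompt_prefix.encode("utf-8"))
    return h.hexdigest()
```

The "correct lookup discipline" caveat applies here: isolation holds only if every read and write path derives its key through this function with the authenticated tenant id, never a client-supplied one.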
Relationship to other primitives¶
- concepts/kv-cache — the cached object. Volatile-only isolation is a policy about KV-cache reuse, not about the KV-cache mechanism itself.
- concepts/tenant-isolation — generalises the isolation property to any multi-tenant substrate. Volatile-only prompt-cache isolation is the prompt-cache-specialised instance.
- concepts/implicit-prompt-caching — sibling concept. The volatile-only safety envelope is the security argument that makes implicit (default-on, no customer configuration) caching shippable on multi-tenant infrastructure.
- concepts/session-affinity-prompt-caching — Cloudflare's contrasting design: cache is shared across replicas (via client-supplied affinity hints) and the threat model acknowledges a cross-replica cache surface. Databricks chose the opposite tradeoff: smaller per-replica cache scope, simpler threat model, default-on shippability.
Seen in¶
- sources/2026-05-22-databricks-accelerating-llm-inference-with-prompt-caching-for-open-source-models — canonical wiki instance. The composition of tenant isolation + volatile-memory residency + no persistence is named as the security stance for Databricks FMAPI Prompt Caching on multi-tenant infrastructure.