
CONCEPT Cited by 1 source

Volatile-only prompt-cache isolation

Definition

Volatile-only prompt-cache isolation is a multi-tenant security posture for a prompt-caching layer, defined by three complementary properties:

  1. Tenant-isolated — cache pools are bounded to a single tenant. One customer's prompt cannot hit another customer's cached entry.
  2. Volatile-memory only — cache resides in RAM; no disk, no object storage, no shared persistent KV.
  3. Never persisted — replica restart wipes the cache; it does not survive process boundaries.

Together these three properties form a safety envelope that lets a serving platform ship caching as a default-on feature on multi-tenant infrastructure without dragging in encryption-at-rest, key-management, or persistence-layer threat modelling.
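To make the three properties concrete, here is a minimal sketch of what a cache with this shape could look like inside one replica. It is an illustration under stated assumptions, not Databricks' implementation: the class name VolatilePromptCache, the per-tenant LRU pools, the exact-match prefix lookup, and the bytes stand-in for KV state are all hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple

Prefix = Tuple[int, ...]  # token IDs of a prompt prefix (assumed representation)


class VolatilePromptCache:
    """Hypothetical per-replica prompt cache held only in process memory."""

    def __init__(self, max_entries_per_tenant: int = 1024) -> None:
        self._max = max_entries_per_tenant
        # One LRU pool per tenant. Nothing here is written to disk or to a
        # shared store, so the contents vanish when the process exits.
        self._pools: Dict[str, "OrderedDict[Prefix, bytes]"] = {}

    def put(self, tenant_id: str, prefix: Prefix, kv_state: bytes) -> None:
        pool = self._pools.setdefault(tenant_id, OrderedDict())
        pool[prefix] = kv_state
        pool.move_to_end(prefix)
        if len(pool) > self._max:
            pool.popitem(last=False)  # evict the least recently used entry

    def get(self, tenant_id: str, prefix: Prefix) -> Optional[bytes]:
        # Lookups only ever consult the caller's own pool.
        pool = self._pools.get(tenant_id)
        if pool is None or prefix not in pool:
            return None
        pool.move_to_end(prefix)
        return pool[prefix]


cache = VolatilePromptCache()
cache.put("tenant-a", (1, 2, 3), b"kv-a")
print(cache.get("tenant-a", (1, 2, 3)))  # same tenant: hit
print(cache.get("tenant-b", (1, 2, 3)))  # other tenant, same prompt: miss

# A replica restart is modelled as a fresh object: the new process starts cold.
restarted = VolatilePromptCache()
print(restarted.get("tenant-a", (1, 2, 3)))  # miss after restart
```

In this sketch, isolation comes from keying the pools by tenant, while volatility and non-persistence come from the absence of any storage backend at all.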

Canonical wiki instance — Databricks FMAPI

Databricks FMAPI Prompt Caching (GA 2026-05-22):

"Security is a first-class concern at Databricks. Prompt caches are isolated, only reside in volatile memory and are never persisted." (Source: sources/2026-05-22-databricks-accelerating-llm-inference-with-prompt-caching-for-open-source-models)

The composition is the load-bearing argument the post makes for why caching can be implicit + default-on on a multi-tenant serving substrate: the security envelope holds without requiring customer configuration.

Why all three properties are needed

Each property alone is insufficient:

  • Tenant-isolated alone: still requires a persistence-layer threat model if cached entries hit disk. An attacker with storage access could read another tenant's cached prompts even if the lookup logic enforces isolation.
  • Volatile-memory only alone: protects against persistent storage compromise but doesn't prevent cross-tenant leakage in the live cache pool — same-process residency is not the same as tenant isolation.
  • Never persisted alone: weakest property; just means the cache is ephemeral. Says nothing about whether two tenants share the live RAM-resident pool.

Composing all three closes the gap. The "isolated" property covers cross-tenant leakage in the live pool; "volatile memory" forecloses the disk threat model; "never persisted" documents that even within a replica's lifetime there is no persistence crossing process boundaries.
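The argument above can be restated as a toy model that maps each missing property to the threat it leaves open. The CachePolicy type, the open_threats function, and the threat descriptions are hypothetical names chosen for illustration; they paraphrase this page's reasoning and are not taken from the Databricks post.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CachePolicy:
    tenant_isolated: bool
    volatile_memory_only: bool
    never_persisted: bool


def open_threats(policy: CachePolicy) -> List[str]:
    """Return the threats that remain open under the given policy."""
    threats = []
    if not policy.tenant_isolated:
        threats.append("cross-tenant hit in the live cache pool")
    if not policy.volatile_memory_only:
        threats.append("cached prompts readable via storage access")
    if not policy.never_persisted:
        threats.append("cache contents survive a process restart")
    return threats


# Isolation alone still leaves both storage-related threats open.
print(open_threats(CachePolicy(True, False, False)))
# All three properties together leave none of the modelled threats open.
print(open_threats(CachePolicy(True, True, True)))
```

The model is deliberately coarse: it covers only the three threats discussed here and says nothing about side channels or other attack surfaces.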

Tradeoffs accepted

  • Cache must be rebuilt on replica restart. Every deploy, rollout, autoscaling event, or replica failure clears the cache for every tenant on that replica. The platform must accept the warm-up cost on each restart and architect rollouts so that a cold cache doesn't violate latency SLAs.
  • No cross-replica or cross-region cache sharing, at least not without a separate persistence-aware design. The 30% hit ratio Databricks discloses is measured on a single replica's pool. Cross-replica routing (e.g. session-affinity-driven routing in Cloudflare's case) is a different architectural choice that explicitly persists or shares cache state across replicas.
  • Cache hit rate is bounded by per-replica request locality. If a tenant's traffic is sharded across many replicas without affinity, each replica only sees a slice of the prefix-reuse opportunity. Cloudflare addresses this via x-session-affinity; Databricks' approach (per the disclosed material) is to accept the per-replica scope and rely on prefill-skip economics being large per hit.
  • No cache-write / cache-read billing differentiation is feasible: with no persistent cache record, there is no fixed-cost-amortised asset to bill against. The Databricks post does not disclose a separate billing line for cached input tokens (compare Cloudflare's "discounted cached tokens" incentive on x-session-affinity).
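The locality tradeoff can be illustrated with a small simulation. It assumes a simplified model in which a request hits only if the same replica has already seen the identical prefix, and it compares round-robin routing with a hash-based affinity router. The function names and the routing rules are hypothetical; neither Databricks' nor Cloudflare's actual routing is disclosed at this level of detail.

```python
import zlib
from typing import Callable, List, Set


def round_robin(index: int, prefix: str) -> int:
    # Spread requests evenly, ignoring their content.
    return index


def prefix_affinity(index: int, prefix: str) -> int:
    # Send identical prefixes to the same replica (stable across runs).
    return zlib.crc32(prefix.encode("utf-8"))


def count_hits(requests: List[str], num_replicas: int,
               route: Callable[[int, str], int]) -> int:
    """Count cache hits when each replica keeps its own private pool."""
    seen: List[Set[str]] = [set() for _ in range(num_replicas)]
    hits = 0
    for index, prefix in enumerate(requests):
        replica = route(index, prefix) % num_replicas
        if prefix in seen[replica]:
            hits += 1
        else:
            seen[replica].add(prefix)  # cold miss warms this replica only
    return hits


requests = ["shared-system-prompt"] * 12
print(count_hits(requests, 4, round_robin))      # 8: one cold miss per replica
print(count_hits(requests, 4, prefix_affinity))  # 11: one cold miss in total
```

With per-replica pools, every replica pays its own cold miss, so spreading a tenant's traffic across more replicas lowers the achievable hit count for the same workload. A restart of any replica resets its pool and adds another cold miss.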

Tenant-isolation mechanism (undisclosed)

The Databricks post claims tenant isolation but does not disclose the mechanism. Plausible designs include:

  • Process-per-tenant — strongest isolation, highest cost.
  • Cache-key prefix per tenant — single shared pool, tenant-scoped lookups. Requires correct lookup discipline.
  • VM / container boundary — typical for managed serving platforms.

The post is silent on which is used, so this concept page records the property claim (isolated + volatile + never-persisted) as the design surface, not the implementation primitive.
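As an illustration of the second option only, a tenant-scoped cache key could be derived by hashing the tenant identifier together with the prompt prefix. This is a sketch of one plausible design, not the mechanism Databricks uses: the function name tenant_scoped_key, the SHA-256 choice, and the 4-byte token encoding are all assumptions.

```python
import hashlib
from typing import Sequence


def tenant_scoped_key(tenant_id: str, prefix_tokens: Sequence[int]) -> str:
    """Derive a cache key that differs per tenant for the same prompt prefix."""
    digest = hashlib.sha256()
    tenant_bytes = tenant_id.encode("utf-8")
    # Length-prefix the tenant ID so its bytes cannot be confused with
    # the token bytes that follow.
    digest.update(len(tenant_bytes).to_bytes(4, "big"))
    digest.update(tenant_bytes)
    for token in prefix_tokens:
        # Assumes non-negative token IDs that fit in 32 bits.
        digest.update(token.to_bytes(4, "big"))
    return digest.hexdigest()


key_a = tenant_scoped_key("tenant-a", [101, 2023, 2003])
key_b = tenant_scoped_key("tenant-b", [101, 2023, 2003])
print(key_a != key_b)  # True: identical prompts map to different keys
```

The "correct lookup discipline" caveat shows up here as a single point of failure: any code path that builds a key without the tenant identifier bypasses the isolation, which is why this option is weaker than a process or VM boundary.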

Relationship to other primitives

  • concepts/kv-cache — the cached object. Volatile-only isolation is a policy about KV-cache reuse, not about the KV-cache mechanism itself.
  • concepts/tenant-isolation — generalises the isolation property to any multi-tenant substrate. Volatile-only prompt-cache isolation is the prompt-cache-specialised instance.
  • concepts/implicit-prompt-caching — sibling concept. The volatile-only safety envelope is the security argument that makes implicit (default-on, no customer configuration) caching shippable on multi-tenant infrastructure.
  • concepts/session-affinity-prompt-caching — Cloudflare's contrasting design: cache is shared across replicas (via client-supplied affinity hints) and the threat model acknowledges a cross-replica cache surface. Databricks chose the opposite tradeoff: smaller per-replica cache scope, simpler threat model, default-on shippability.

Seen in
