LMCache¶
Overview¶
LMCache (github.com/LMCache/LMCache) is an open-source KV-cache layer for LLM serving engines that provides cluster-wide prefix / KV reuse across nodes and across inference engines. When paired with transport and persistence substrates such as Mooncake Transfer Engine and Mooncake Store, LMCache extends the vLLM-style PagedAttention KV cache beyond a single node.
Cloudflare Workers AI uses LMCache (or alternatively SGLang HiCache) as the software layer that exposes the cluster-shared cache to the serving engine: "When paired with LMCache or SGLang HiCache, the cache is shared across all nodes in the cluster, allowing a prefill node to identify and re-use a cache from a previous request that was originally pre-filled on a different node." (Source: sources/2026-04-16-cloudflare-building-the-foundation-for-running-extra-large-language-models)
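For concreteness, a hedged sketch of how a serving engine is commonly wired to LMCache: recent vLLM releases accept a KV-transfer config naming an LMCache connector. The flag and connector names below are from memory of recent vLLM versions and may differ in the deployment the post describes; the model name is a placeholder.

```shell
# Illustrative config sketch only: launch vLLM with LMCache as the external
# KV-cache layer. Connector/flag names vary across vLLM versions.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'
```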
Role in the Workers AI stack¶
The layer split in Workers AI's multi-GPU / multi-node serving stack:
| Layer | Component | Role |
|---|---|---|
| Serving engine | Infire / vLLM / SGLang | per-replica forward-pass execution, per-replica KV management |
| Cross-node cache layer | LMCache / SGLang HiCache | lookup and share KV across replicas |
| Transport | Mooncake Transfer Engine | RDMA KV block transfer |
| Persistence | Mooncake Store | NVMe cold tier |
Without a cache layer like LMCache, a prefill node has no mechanism to discover that another node previously pre-filled the same prefix, so the cross-node sharing opportunity is invisible. LMCache exposes the lookup + policy API that turns per-node KV pools into a cluster-wide hit surface.
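A minimal sketch of the mechanism this describes: hash aligned token-prefix blocks into a chain of keys, publish which node owns each block, and let a prefill node ask for the longest matching prefix before recomputing. All names here (`ClusterKVIndex`, `prefix_key`, etc.) are hypothetical illustrations, not LMCache's actual API; in the real stack the hit blocks would then move over Mooncake Transfer Engine.

```python
from __future__ import annotations

import hashlib

# Hypothetical sketch of a cluster-wide KV lookup layer. Names are
# illustrative, not LMCache's API.

def prefix_key(token_ids: list[int], block_size: int = 16) -> list[str]:
    """Hash each aligned token-prefix block into a chain of block keys.

    The hash is cumulative, so a block's key depends on every token
    before it: two requests share keys exactly as far as their prefixes agree.
    """
    keys, h = [], hashlib.sha256()
    full = len(token_ids) - len(token_ids) % block_size  # drop partial tail
    for i in range(0, full, block_size):
        h.update(str(token_ids[i:i + block_size]).encode("utf-8"))
        keys.append(h.hexdigest()[:16])
    return keys

class ClusterKVIndex:
    """Maps block keys to the node holding the KV blocks (cluster-wide view)."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}

    def publish(self, keys: list[str], node: str) -> None:
        """After a prefill, advertise which node owns each block."""
        for k in keys:
            self._owner.setdefault(k, node)

    def longest_hit(self, keys: list[str]) -> tuple[int, str | None]:
        """Return (#matching leading blocks, owning node) for the best hit."""
        node, n = None, 0
        for k in keys:
            owner = self._owner.get(k)
            if owner is None:
                break
            node, n = owner, n + 1
        return n, node
```

The chained hashing is the property that makes cross-node prefix reuse safe: a key match guarantees the entire token prefix up to that block is identical, so the fetched KV blocks are valid for the new request.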
Consequence: session-aware routing becomes optional within a cluster¶
From the same Cloudflare post: "This eliminates the need for session aware routing within a cluster and allows us to load balance the traffic much more evenly."
With LMCache (+ Mooncake) in place, two same-prefix requests can land on different nodes and still hit shared cache — so the load balancer is free to optimise for node load, not session affinity. x-session-affinity hints still matter across clusters / regions, but not within.
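The routing consequence fits in a few lines. This is an illustrative sketch, not Cloudflare's balancer: with session-affinity routing a request follows its session's pinned node even when that node is hot; with a cluster-shared cache the balancer simply picks the least-loaded node, because any node can serve the cached prefix.

```python
# Illustrative sketch: routing with vs. without intra-cluster session affinity.
# All function and variable names here are hypothetical.

def route_with_affinity(session_pin: dict[str, str], session: str,
                        loads: dict[str, int]) -> str:
    """Affinity routing: the session's pinned node serves, regardless of load."""
    return session_pin.get(session) or min(loads, key=loads.get)

def route_by_load(loads: dict[str, int]) -> str:
    """Shared-cache routing: any node can hit the cache, so pick the coolest."""
    return min(loads, key=loads.get)
```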
Related primitives¶
- KV cache — what's being shared.
- Mooncake Transfer Engine — paired RDMA transport.
- Mooncake Store — paired NVMe cold tier.
- SGLang HiCache — sibling cache layer; the Cloudflare post names LMCache or SGLang HiCache as options.
- vLLM PagedAttention — the per-node KV-cache management LMCache extends cluster-wide.
Caveats¶
- The post mentions LMCache only in passing: no numbers, no integration detail, no per-lookup latency characterisation.
- No disclosure of which option Cloudflare chose — LMCache vs SGLang HiCache — or whether both are in use on different paths.
- Eviction policy / capacity / consistency model not discussed for the cluster-wide view.
Seen in¶
- sources/2026-04-16-cloudflare-building-the-foundation-for-running-extra-large-language-models — named (with SGLang HiCache as alternative) as the software layer that exposes cluster-wide shared KV cache to the serving engine.