CONCEPT Cited by 1 source

In-memory tenant state¶

Definition¶

In-memory tenant state names the service property where tenant-specific data is loaded into RAM at process startup and served from memory, rather than fetched from a backing store on every request. The tenant's dataset lives in the service's heap; request handling is a read (or update) against in-memory structures.
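The contrast in this definition can be sketched in a few lines. This is a minimal illustration, not the source's implementation: `BACKING_STORE`, `FetchPerRequestService`, and `InMemoryTenantService` are hypothetical names, and a plain dict stands in for the real backing store.

```python
from typing import Dict, Optional

# Hypothetical backing store: tenant id -> that tenant's records.
BACKING_STORE: Dict[str, Dict[str, str]] = {
    "tenant-a": {"campaign-1": "active", "campaign-2": "paused"},
}


class FetchPerRequestService:
    """Fetch-on-request shape: every request goes to the backing store."""

    def __init__(self, store: Dict[str, Dict[str, str]]) -> None:
        self.store = store
        self.store_reads = 0

    def handle(self, tenant_id: str, key: str) -> Optional[str]:
        self.store_reads += 1  # one store round trip per request
        return self.store[tenant_id].get(key)


class InMemoryTenantService:
    """In-memory shape: the tenant's dataset is copied into the heap at startup."""

    def __init__(self, store: Dict[str, Dict[str, str]], tenant_id: str) -> None:
        self.tenant_id = tenant_id
        self.state = dict(store[tenant_id])  # single bulk load at startup
        self.store_reads = 1

    def handle(self, key: str) -> Optional[str]:
        return self.state.get(key)  # served from process memory


stateless = FetchPerRequestService(BACKING_STORE)
stateful = InMemoryTenantService(BACKING_STORE, "tenant-a")
for _ in range(1000):
    stateless.handle("tenant-a", "campaign-1")
    stateful.handle("campaign-1")
```

After the loop, `stateless.store_reads` has grown with the request count while `stateful.store_reads` stays at the single startup load. The price is that the tenant's whole dataset now occupies the process heap for the life of the process.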

Canonical wiki instance: the ad-serving platform described in the AWS Architecture Blog post of 2026-05-12 (Source: sources/2026-05-12-aws-building-hybrid-multi-tenant-architecture-for-stateful-services). Verbatim:

"Our ad-serving platform is a stateful service that loads and maintains data in memory for each tenant rather than fetching it from a database on every request. This in-memory state improves performance but creates the noisy neighbor problem when tenants share infrastructure."

Why services choose this design¶

Three primary reasons surface across the wiki's stateful-service canon:

Request latency. Database or remote-cache lookups on the hot path add 1–10 ms per request. For services with p99 SLAs in the low-millisecond range (ad auctions, real-time personalisation, low-latency ranking), in-memory lookups are the only way to stay within budget.
Query shape complexity. Some queries (nearest-neighbour search over feature vectors, graph traversal, per-tenant model inference) don't map cleanly to database primitives; the service needs the data in a specific in-memory data structure (KD-tree, adjacency list, model weights, bloom filter).
Throughput. Reading from a local heap serves millions of operations per second per core; reading from a remote store hits network round-trip + serialisation costs that cap throughput orders of magnitude lower.

In the ad-serving canonical instance: "Our infrastructure handles millions of requests per second" — a throughput tier where in-memory state is structural, not optional.
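The latency and throughput arguments above reduce to simple arithmetic. The sketch below uses the 1–10 ms remote-lookup range quoted earlier; the 5 ms request budget and both function names are illustrative assumptions, not figures from the source.

```python
def max_sequential_lookups(budget_ms: float, lookup_ms: float) -> int:
    """How many back-to-back remote lookups fit inside a latency budget."""
    return int(budget_ms // lookup_ms)


def per_worker_ceiling_rps(lookup_ms: float) -> float:
    """Upper bound on requests/second for one worker that blocks on
    exactly one remote lookup per request (all other work ignored)."""
    return 1000.0 / lookup_ms


# Assumed 5 ms budget; lookup cost at both ends of the 1-10 ms range.
fast_case = max_sequential_lookups(budget_ms=5.0, lookup_ms=1.0)
slow_case = max_sequential_lookups(budget_ms=5.0, lookup_ms=10.0)
ceiling_fast = per_worker_ceiling_rps(1.0)
ceiling_slow = per_worker_ceiling_rps(10.0)
```

Under these assumptions, a request can afford at most five sequential lookups in the best case and none in the worst, and a blocking worker tops out between 100 and 1,000 requests per second. Real services overlap lookups with asynchronous I/O, so this is a ceiling for the blocking case only.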

The load-bearing consequence: heap becomes the tenant boundary¶

When a service keeps tenant state in memory, the process heap is the finest-granularity tenant boundary available. Sharing a heap across tenants means sharing:

Allocation pressure. One tenant's growth triggers GC that pauses every other tenant's request handlers (Java), or fragments the allocator (C++), or forces more frequent GC cycles and mark-assist work onto every allocating goroutine (Go, whose collector does not compact).
OOM exposure. One tenant's dataset exceeding available memory triggers process-level OOM that kills every other tenant's active requests.
Cache effectiveness. Page-cache and CPU-cache lines are shared; one tenant's working set evicts another tenant's working set.
Deallocation contention. Contention on deallocator and arena locks, and the size of the GC root set that must be scanned, both grow with the number of active tenants in a single process.
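The OOM-exposure item is the easiest of these to model. The following toy simulation assumes a single hard heap limit and treats an OOM as losing all in-memory state at once; `SharedProcess`, `ProcessOOM`, and the megabyte figures are hypothetical.

```python
from typing import Dict, List


class ProcessOOM(Exception):
    """Raised when the simulated process exceeds its heap limit."""

    def __init__(self, lost_tenants: List[str]) -> None:
        super().__init__(f"process killed; lost state for {lost_tenants}")
        self.lost_tenants = lost_tenants


class SharedProcess:
    """One process, one heap limit, several tenants' datasets."""

    def __init__(self, heap_limit_mb: int) -> None:
        self.heap_limit_mb = heap_limit_mb
        self.tenant_mb: Dict[str, int] = {}

    def used_mb(self) -> int:
        return sum(self.tenant_mb.values())

    def grow(self, tenant_id: str, mb: int) -> None:
        self.tenant_mb[tenant_id] = self.tenant_mb.get(tenant_id, 0) + mb
        if self.used_mb() > self.heap_limit_mb:
            lost = sorted(self.tenant_mb)
            self.tenant_mb.clear()  # the process dies: every tenant's state is gone
            raise ProcessOOM(lost)


shared = SharedProcess(heap_limit_mb=1024)
shared.grow("tenant-a", 300)
shared.grow("tenant-b", 300)
try:
    shared.grow("tenant-a", 500)  # tenant-a alone pushes the total to 1100 MB
    lost_tenants: List[str] = []
except ProcessOOM as oom:
    lost_tenants = oom.lost_tenants
```

Here `tenant-b` never grew past its original 300 MB, yet it appears in `lost_tenants` because the limit is a property of the process, not of any one tenant.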

The canonical post makes this explicit:

"When two tenants share a cluster, their in-memory data competes for the same heap. A tenant with a large dataset can trigger out-of-memory conditions that affect its neighbors. This made shared-task and shared-cluster approaches challenging for our stateful workloads."

What this forces at the isolation layer¶

Services with in-memory tenant state cannot safely share a process, a JVM, or an EC2 instance's memory across tenants. This forces the isolation boundary to be at least at the cluster level — each tenant gets its own scheduler-managed cluster running on its own compute nodes. See concepts/cluster-level-tenant-isolation.

Task-per-tenant (multiple tasks on shared nodes) is still unsafe because multiple tenants' processes can co-locate on the same EC2 instance and compete for page cache + OS memory. Only cluster-per-tenant with dedicated node pools (or stronger: account-per-tenant) structurally eliminates the shared-memory blast radius.
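One way to see the difference between the two grains is to check a placement for nodes that host more than one tenant. The helper and the placements below are hypothetical; a real check would read task-to-instance assignments from the scheduler.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def nodes_shared_by_tenants(
    placement: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """placement holds one (tenant_id, node_id) pair per running task.
    Returns only the nodes that host tasks from more than one tenant."""
    tenants_on_node: Dict[str, Set[str]] = defaultdict(set)
    for tenant_id, node_id in placement:
        tenants_on_node[node_id].add(tenant_id)
    return {n: t for n, t in tenants_on_node.items() if len(t) > 1}


# Task-per-tenant on a shared node pool: the scheduler may co-locate tenants.
task_per_tenant = [("a", "node-1"), ("b", "node-1"), ("c", "node-2")]

# Cluster-per-tenant with dedicated node pools: nodes belong to one tenant.
cluster_per_tenant = [("a", "a-node-1"), ("a", "a-node-2"), ("b", "b-node-1")]

shared_nodes = nodes_shared_by_tenants(task_per_tenant)
dedicated_nodes = nodes_shared_by_tenants(cluster_per_tenant)
```

In the first placement, `node-1` hosts tenants `a` and `b`, so their page cache and OS memory are shared. In the second, the result is empty by construction rather than by scheduler luck.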

Contrast with stateless services¶

Stateless services hold no tenant-specific data in memory between requests; every request fetches its working set from a backing store. The heap contains only the active request's data plus shared caches. For stateless services:

Tenants can share a JVM safely; GC pressure is request-local, not tenant-persistent.
Heap size is bounded by concurrent request count, not by tenant count or per-tenant dataset size.
OOM is a function of traffic shape, not tenant composition.
Row-level or JWT-claim-level tenant isolation is sufficient.

The structural gap between stateful and stateless services at the isolation-grain level is that the isolation boundary has to move one level coarser when state is in memory.
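The heap-sizing difference can be written as two rough estimates. Both formulas are simplifying assumptions for illustration (no allocator overhead, no fragmentation), and every number is made up.

```python
from typing import Dict


def stateless_heap_mb(
    concurrent_requests: int, per_request_mb: int, shared_cache_mb: int = 0
) -> int:
    """Stateless estimate: in-flight request data plus any shared cache."""
    return concurrent_requests * per_request_mb + shared_cache_mb


def stateful_heap_mb(
    tenant_dataset_mb: Dict[str, int], concurrent_requests: int, per_request_mb: int
) -> int:
    """Stateful estimate: every resident tenant dataset plus in-flight data."""
    return sum(tenant_dataset_mb.values()) + concurrent_requests * per_request_mb


before = stateful_heap_mb({"a": 4000, "b": 6000}, 200, 2)
after = stateful_heap_mb({"a": 4000, "b": 6000, "c": 9000}, 200, 2)
stateless = stateless_heap_mb(200, 2)
```

Adding tenant `c` raises the stateful estimate by the full size of its dataset, while the stateless estimate has no tenant term at all.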

Known patterns that use the property¶

Meta TAO — graph data cached in memory for a social-graph shard; per-shard fate-sharing rather than per-tenant.
systems/spann / systems/hnsw / vector search — per-customer vector indexes held in RAM for low-latency ANN lookup.
Per-tenant ML model serving — model weights loaded into GPU VRAM or CPU RAM at process startup; the process becomes fate-shared with the model's tenant.
Ad-serving — the canonical wiki instance; per-tenant campaign metadata, targeting predicates, and bidding state held in memory.
Rate limiters / per-tenant quotas — per-tenant counters in process memory; eviction triggers re-sync with a backing store.
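The rate-limiter entry can be sketched as a bounded set of in-memory counters. This is one plausible reading of "eviction triggers re-sync", assuming a write-back on eviction and a reload on re-admission; `TenantCounters` and the dict-based backing store are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class TenantCounters:
    """Per-tenant counters in process memory, bounded to max_tenants entries.
    Assumes max_tenants >= 1."""

    def __init__(self, backing: Dict[str, int], max_tenants: int) -> None:
        self.backing = backing  # stand-in for a durable store
        self.max_tenants = max_tenants
        self.hot: "OrderedDict[str, int]" = OrderedDict()  # least recently used first

    def increment(self, tenant_id: str) -> int:
        if tenant_id not in self.hot:
            # Re-admission: start from the last value written to the store.
            self.hot[tenant_id] = self.backing.get(tenant_id, 0)
            if len(self.hot) > self.max_tenants:
                evicted, count = self.hot.popitem(last=False)
                self.backing[evicted] = count  # write back on eviction
        self.hot.move_to_end(tenant_id)
        self.hot[tenant_id] += 1
        return self.hot[tenant_id]


store: Dict[str, int] = {}
counters = TenantCounters(store, max_tenants=2)
for tenant in ["a", "b", "a", "c"]:
    counters.increment(tenant)
```

After this sequence, tenant `b` has been evicted and its count written to `store`, while `a` and `c` remain in memory. Counts held only in `hot` are lost if the process dies, which is the durability trade-off of keeping counters in process memory.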

What the property doesn't require¶

Durability. In-memory tenant state is typically backed by a persistent store (the canonical post mentions a "shared remote cache"); the in-memory copy is a read-through cache or a warm-started snapshot.
Consistency. Strong consistency is expensive across multiple stateful processes; most in-memory-state services accept eventual consistency with a bounded staleness window.
All data in memory. Many in-memory-state services keep only the hot path in memory; cold keys fall back to the backing store.
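The three relaxations above combine naturally in a read-through cache with a staleness bound. The sketch below is an assumption-laden illustration: `ReadThroughCache` is a hypothetical name, a dict stands in for the backing store, and the clock is injectable so the staleness window can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Hot keys are served from memory; missing or too-old entries fall back
    to the backing store. Reads may be stale by at most max_staleness_s."""

    def __init__(
        self,
        backing: Dict[str, str],
        max_staleness_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.backing = backing
        self.max_staleness_s = max_staleness_s
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, loaded_at)
        self.backing_reads = 0

    def get(self, key: str) -> str:
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] <= self.max_staleness_s:
            return hit[0]  # possibly stale, but within the bound
        self.backing_reads += 1
        value = self.backing[key]  # cold or expired: go to the store
        self.entries[key] = (value, now)
        return value
```

A write to the backing store becomes visible to readers of this process only after the cached entry ages past `max_staleness_s`, which is the bounded-staleness behaviour described above. The in-memory copy can be rebuilt from the store at any time, so it carries no durability obligation of its own.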

Anti-patterns¶

Loading all tenants' data into one shared process for performance — works until the first tenant's dataset blows up memory. The canonical post describes this as "challenging" even at 18 tenants; at scale it becomes structurally impossible.
Relying on OS memory limits (cgroups) alone for isolation: a cgroup caps the total memory of the processes inside it, but when a shared process hits that cap the OOM kill takes down the whole process, including every other tenant's state.
Treating in-memory state as a pure optimisation — once the service depends on in-memory latency, it's not easily reversible to a fetch-on-request architecture without SLA renegotiation.
Per-tenant heap limits inside a shared JVM — heap limits are per-process, not per-tenant; a per-tenant limit requires a per-tenant process, which is exactly cluster-level isolation.

Seen in¶

sources/2026-05-12-aws-building-hybrid-multi-tenant-architecture-for-stateful-services — canonical wiki instance. Ad-serving platform's in-memory tenant state is the explicit driver of the shift from account-per-tenant cellular architecture to cluster-per-tenant in shared accounts. The post's "Why we needed dedicated compute" section directly names the property as the reason shared-task and shared-cluster grains were rejected.

concepts/noisy-neighbor — the failure mode
concepts/cluster-level-tenant-isolation — the forced isolation grain
concepts/hybrid-multi-tenant-architecture — the architectural response
concepts/tenant-isolation — parent framing
concepts/statefulset-for-databases — the Kubernetes primitive that supports the same property at the DB layer
patterns/dedicated-ecs-cluster-per-tenant — the canonical pattern