In-memory tenant state¶
Definition¶
In-memory tenant state names the service property where tenant-specific data is loaded into RAM at process startup and served from memory, rather than fetched from a backing store on every request. The tenant's dataset lives in the service's heap; request handling is a read (or update) against in-memory structures.
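A minimal sketch of the property, assuming a hypothetical `load_tenant` callable that stands in for the backing store (the class and method names are illustrative, not taken from the source post). The backing store is read once per tenant at startup; after that, request handling is a dictionary lookup in the process heap.

```python
from typing import Callable, Dict, Iterable, Optional


class InMemoryTenantState:
    """Holds each tenant's dataset in process memory (illustrative sketch)."""

    def __init__(self, load_tenant: Callable[[str], Dict[str, str]]):
        # load_tenant is a hypothetical stand-in for a read against the backing store.
        self._load_tenant = load_tenant
        self._data: Dict[str, Dict[str, str]] = {}

    def warm_up(self, tenant_ids: Iterable[str]) -> None:
        # Called once at process startup: copy every tenant's dataset into the heap.
        for tenant_id in tenant_ids:
            self._data[tenant_id] = dict(self._load_tenant(tenant_id))

    def get(self, tenant_id: str, key: str) -> Optional[str]:
        # Request path: a lookup against in-memory structures, no network call.
        return self._data[tenant_id].get(key)

    def put(self, tenant_id: str, key: str, value: str) -> None:
        # Updates also land in memory; durability is left to the backing store.
        self._data[tenant_id][key] = value
```

Every tenant passed to `warm_up` occupies heap for the lifetime of the process, whether or not it is receiving traffic.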
Canonical wiki instance: AWS Architecture Blog's 2026-05-12 ad-serving platform (Source: sources/2026-05-12-aws-building-hybrid-multi-tenant-architecture-for-stateful-services). Verbatim:
"Our ad-serving platform is a stateful service that loads and maintains data in memory for each tenant rather than fetching it from a database on every request. This in-memory state improves performance but creates the noisy neighbor problem when tenants share infrastructure."
Why services choose this design¶
Three primary reasons surface across the wiki's stateful-service canon:
- Request latency. Database or remote-cache lookups on the hot path add 1–10 ms per request. For services with p99 SLAs in the low-millisecond range (ad auctions, real-time personalisation, low-latency ranking), in-memory lookups are the only way to stay within budget.
- Query shape complexity. Some queries (nearest-neighbour search over feature vectors, graph traversal, per-tenant model inference) don't map cleanly to database primitives; the service needs the data in a specific in-memory data structure (KD-tree, adjacency list, model weights, bloom filter).
- Throughput. Reads against a local heap can reach millions of operations per second per core; reads against a remote store pay network round-trip and serialisation costs that cap throughput orders of magnitude lower.
In the ad-serving canonical instance: "Our infrastructure handles millions of requests per second" — a throughput tier where in-memory state is structural, not optional.
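The throughput argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the inputs are picked from the ranges quoted above rather than measured. It uses Little's Law (in-flight work equals throughput times latency).

```python
def max_sequential_ops_per_second(latency_seconds: float) -> float:
    # A caller that blocks on each lookup completes at most 1 / latency lookups per second.
    return 1.0 / latency_seconds


def required_in_flight(target_ops_per_second: float, latency_seconds: float) -> float:
    # Little's Law: average number of in-flight lookups = throughput * latency.
    return target_ops_per_second * latency_seconds
```

With a 1 ms lookup, a single blocking caller tops out near 1,000 lookups per second, and sustaining one million lookups per second requires about 1,000 lookups in flight at all times.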
The load-bearing consequence: heap becomes the tenant boundary¶
When a service keeps tenant state in memory, the process heap is the finest-granularity tenant boundary available. Sharing a heap across tenants means sharing:
- Allocation pressure. One tenant's growth triggers GC that pauses every other tenant's request handlers (Java), fragments the allocator (C++), or forces more frequent collection cycles and mark-assist work on every allocating goroutine (Go, whose collector does not compact).
- OOM exposure. One tenant's dataset exceeding available memory triggers process-level OOM that kills every other tenant's active requests.
- Cache effectiveness. The OS page cache and the CPU caches are shared; one tenant's working set evicts another tenant's working set.
- Deallocation contention. Contention on deallocator and arena locks, and the size of the GC root set, grow with the number of active tenants in a single process.
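The OOM exposure in particular can be illustrated with a toy model. This is a simulation under stated assumptions, not a description of any real runtime: `SharedProcess` and `ProcessOutOfMemory` are hypothetical names, and memory is tracked as plain byte counts per tenant.

```python
class ProcessOutOfMemory(Exception):
    """Raised when the simulated process exceeds its heap limit."""


class SharedProcess:
    """Toy model of one process whose heap is shared by several tenants."""

    def __init__(self, heap_limit_bytes: int):
        self.heap_limit_bytes = heap_limit_bytes
        self.tenant_bytes = {}   # tenant_id -> bytes currently held in the heap
        self.alive = True
        self.lost_tenants = []   # tenants whose state was lost at OOM time

    def used_bytes(self) -> int:
        return sum(self.tenant_bytes.values())

    def grow(self, tenant_id: str, extra_bytes: int) -> None:
        if not self.alive:
            raise ProcessOutOfMemory("process already terminated")
        self.tenant_bytes[tenant_id] = self.tenant_bytes.get(tenant_id, 0) + extra_bytes
        if self.used_bytes() > self.heap_limit_bytes:
            # The limit is process-wide, so every tenant's state goes down together.
            self.lost_tenants = sorted(self.tenant_bytes)
            self.tenant_bytes.clear()
            self.alive = False
            raise ProcessOutOfMemory(f"growth of {tenant_id} exceeded the heap limit")
```

In this model a small tenant that never grows still appears in `lost_tenants` when a larger neighbour pushes the process over its limit.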
The canonical post makes this explicit:
"When two tenants share a cluster, their in-memory data competes for the same heap. A tenant with a large dataset can trigger out-of-memory conditions that affect its neighbors. This made shared-task and shared-cluster approaches challenging for our stateful workloads."
What this forces at the isolation layer¶
Services with in-memory tenant state cannot safely share a process, a JVM, or an EC2 instance's memory across tenants. This forces the isolation boundary to be at least at the cluster level — each tenant gets its own scheduler-managed cluster running on its own compute nodes. See concepts/cluster-level-tenant-isolation.
Task-per-tenant (multiple tasks on shared nodes) is still unsafe because different tenants' processes can be co-located on the same EC2 instance and compete for page cache and OS memory. Only cluster-per-tenant with dedicated node pools (or, stronger still, account-per-tenant) structurally eliminates the shared-memory blast radius.
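One way to audit this constraint is to check task placements for nodes that host more than one tenant. The sketch below assumes placements are available as `(tenant_id, node_id)` pairs; how they are obtained from a scheduler is out of scope, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def nodes_shared_across_tenants(
    placements: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Return {node_id: sorted tenant ids} for every node hosting more than one tenant."""
    tenants_by_node = defaultdict(set)
    for tenant_id, node_id in placements:
        tenants_by_node[node_id].add(tenant_id)
    # A node that appears here is a shared-memory blast radius between the listed tenants.
    return {
        node_id: sorted(tenants)
        for node_id, tenants in tenants_by_node.items()
        if len(tenants) > 1
    }
```

An empty result means every node is dedicated to a single tenant, which is the condition that dedicated node pools are meant to guarantee.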
Contrast with stateless services¶
Stateless services hold no tenant-specific data in memory between requests; every request fetches its working set from a backing store. The heap contains only the active request's data plus shared caches. For stateless services:
- Tenants can share a JVM safely; GC pressure is request-local, not tenant-persistent.
- Heap size is bounded by concurrent request count, not by tenant count or per-tenant dataset size.
- OOM is a function of traffic shape, not tenant composition.
- Row-level or JWT-claim-level tenant isolation is sufficient.
The structural difference between stateful and stateless services is that, once state lives in memory, the isolation boundary has to move one level coarser.
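The different sizing rules can be written down directly. This is a back-of-the-envelope model with hypothetical function names and made-up inputs; real heaps also carry runtime and allocator overhead that it ignores.

```python
from typing import Dict


def stateless_heap_bytes(
    concurrent_requests: int, bytes_per_request: int, shared_cache_bytes: int = 0
) -> int:
    # Bounded by traffic shape: in-flight requests plus any shared cache.
    return concurrent_requests * bytes_per_request + shared_cache_bytes


def stateful_heap_bytes(tenant_dataset_bytes: Dict[str, int]) -> int:
    # Bounded by tenant composition: the sum of every resident tenant's dataset.
    return sum(tenant_dataset_bytes.values())
```

Adding a tenant leaves the first figure unchanged unless traffic changes, while the second grows by that tenant's full dataset size.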
Known patterns that use the property¶
- Meta TAO — graph data cached in memory for a social-graph shard; per-shard fate-sharing rather than per-tenant.
- systems/spann / systems/hnsw / vector search — per-customer vector indexes held in RAM for low-latency ANN lookup.
- Per-tenant ML model serving — model weights loaded into GPU VRAM or CPU RAM at process startup; the process becomes fate-shared with the model's tenant.
- Ad-serving — the canonical wiki instance; per-tenant campaign metadata, targeting predicates, and bidding state held in memory.
- Rate limiters / per-tenant quotas — per-tenant counters in process memory; eviction triggers re-sync with a backing store.
What the property doesn't require¶
- Durability. In-memory tenant state is typically backed by a persistent store (the canonical post mentions a "shared remote cache"); the in-memory copy is a read-through cache or a warm-started snapshot.
- Strong consistency. Keeping multiple stateful processes strongly consistent is expensive; most in-memory-state services accept eventual consistency with a bounded staleness window.
- All data in memory. Many in-memory-state services keep only the hot path in memory; cold keys fall back to the backing store.
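The hot-set variant can be sketched as a read-through cache with least-recently-used eviction. This is one possible eviction policy, chosen here for brevity; the source post does not say which policy the ad-serving platform uses. `HotSetCache` and its `fetch` callable (a stand-in for the backing store) are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class HotSetCache:
    """Read-through cache that keeps at most max_entries hot keys in memory."""

    def __init__(self, fetch: Callable[[str], str], max_entries: int):
        self._fetch = fetch              # hypothetical backing-store read
        self._max_entries = max_entries
        self._entries = OrderedDict()    # ordered from least to most recently used

    def get(self, key: str) -> str:
        if key in self._entries:
            # Hot key: serve from memory and mark it as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cold key: fall back to the backing store, then keep the result in memory.
        value = self._fetch(key)
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            # Evict the least recently used key to stay within the memory budget.
            self._entries.popitem(last=False)
        return value
```

Capping `max_entries` bounds the in-memory footprint, at the cost of paying backing-store latency on every cold key.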
Anti-patterns¶
- Loading all tenants' data into one shared process for performance — works until the first tenant's dataset blows up memory. The canonical post describes this as "challenging" even at 18 tenants; at scale it becomes structurally impossible.
- Relying on OS memory limits (cgroups) alone for isolation — cgroups bound a process's total RAM, but a process-level OOM still kills the whole process including other tenants' state.
- Treating in-memory state as a pure optimisation — once the service depends on in-memory latency, it's not easily reversible to a fetch-on-request architecture without SLA renegotiation.
- Per-tenant heap limits inside a shared JVM — heap limits are per-process, not per-tenant; a per-tenant limit requires a per-tenant process, which is exactly cluster-level isolation.
Seen in¶
- sources/2026-05-12-aws-building-hybrid-multi-tenant-architecture-for-stateful-services — canonical wiki instance. Ad-serving platform's in-memory tenant state is the explicit driver of the shift from account-per-tenant cellular architecture to cluster-per-tenant in shared accounts. The post's "Why we needed dedicated compute" section directly names the property as the reason shared-task and shared-cluster grains were rejected.
Related¶
- concepts/noisy-neighbor — the failure mode
- concepts/cluster-level-tenant-isolation — the forced isolation grain
- concepts/hybrid-multi-tenant-architecture — the architectural response
- concepts/tenant-isolation — parent framing
- concepts/statefulset-for-databases — the Kubernetes primitive that supports the same property at the DB layer
- patterns/dedicated-ecs-cluster-per-tenant — the canonical pattern