
PATTERN Cited by 2 sources

Independent scaling tiers

Problem

A single-tier system has one scaling lever — add more servers. Every server handles every concern: client sessions, cache memory, change-stream processing, schema management. Growth in any axis forces fleet growth, and the fleet's bottleneck is the most stressed concern per server. This produces:

  • Uneven resource utilisation. The fleet grows for memory but each node is CPU-idle (or vice versa).
  • Coupled fan-in / fan-out. More servers = more subscribers to every change stream = worse per-node fan-in. See Figma LiveGraph's pre-100x fan-out failure mode (Source: sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale).
  • Deploy stampede risk. A fleet-wide deploy wipes every node's in-memory state at once → thundering herd on downstream.
  • Coupled blast radius. A transient failure in any concern degrades every concern on every node.

Shape

Split the system into tiers that each scale on a different axis, each deployed and operated independently. Canonical three-tier split for cache-fronted real-time data services:

  1. Ingress / session tier (Figma's edge; also FigCache frontend) — scales with client-session count and view-request rate.
  2. Cache tier — scales with active-query count and hot-data footprint.
  3. Change-processing tier (Figma's invalidator; FigCache's backend Redis fleet) — scales with upstream change rate (and, for DB-backed caches, with DB shard count).

Each tier is sharded on its own hash axis — the ingress tier by client session, the cache by query hash, the invalidator by DB shard — so that invalidations / requests flow point-to-point (shard→shard→shard), never fleet-broadcast.
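The per-tier hash routing above can be sketched as follows. This is a toy model: the shard counts, key names, and `route_change` function are illustrative assumptions, not Figma's actual values or API.

```python
import hashlib

def shard_of(key: str, n_shards: int) -> int:
    # Stable hash of a tier-specific key -> one shard index in that tier.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Hypothetical fleet sizes -- each tier is sized on its own axis.
N_INVALIDATOR, N_CACHE, N_EDGE = 12, 16, 40

def route_change(db_shard: str, query_hash: str, session_id: str) -> tuple:
    # A DB change touches exactly one node per tier (shard -> shard -> shard),
    # never a fleet-wide broadcast.
    return (
        shard_of(db_shard, N_INVALIDATOR),   # invalidator: by DB shard
        shard_of(query_hash, N_CACHE),       # cache: by query hash
        shard_of(session_id, N_EDGE),        # edge: by client session
    )
```

The point of the sketch: growing any one `N_*` constant rebalances only that tier's ring; the other tiers' routing is untouched.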

Why it works

  • No single fleet growth target. Client-fleet scaling doesn't require cache scaling; DB scale-out doesn't require client-fleet scaling.
  • Cache deploy decoupled from ingress deploy. When the ingress tier redeploys (app-code update), the cache tier stays up, its in-memory state survives, and the next client reconnect hits a warm cache. Eliminates the deploy-thundering-herd failure mode directly.
  • Per-tier failure isolation. A tier that OOMs / stalls degrades only its axis; the others keep serving.
  • Per-tier technology choice. The cache tier can run in-memory Go with custom data structures; the ingress tier can be stateless-by-contract; the invalidator can be a pure streaming function. Nothing forces a common runtime profile.
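The deploy-decoupling point can be made concrete with a toy model (all names here are hypothetical, not Figma's): the ingress node is stateless by contract, so replacing it on deploy leaves the cache tier's in-memory state intact, and the next request after the "redeploy" hits warm.

```python
class CacheTier:
    """Separate long-lived process; survives ingress redeploys."""
    def __init__(self):
        self.store = {}
        self.misses = 0

    def get_or_compute(self, query, compute):
        if query not in self.store:
            self.misses += 1          # only the very first request is cold
            self.store[query] = compute(query)
        return self.store[query]

class IngressNode:
    """Stateless by contract: killed and replaced on every deploy."""
    def __init__(self, cache):
        self.cache = cache

    def serve(self, query):
        return self.cache.get_or_compute(query, lambda q: f"result({q})")

cache = CacheTier()
IngressNode(cache).serve("q1")        # cold: one miss
IngressNode(cache).serve("q1")        # fresh node ("redeploy"); cache still warm
```

With an in-process cache, the second `IngressNode` would have started cold — the per-tier split is what makes the miss count stay at one.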

Pre-conditions

  • Shared state can be lifted to a dedicated tier. Usually requires re-architecting in-process data structures into a remote-accessed cache — not free.
  • Scaling axes are actually different. If all three axes move together, the split is overhead without benefit.
  • Cross-tier traffic cost is tolerable. Going from in-process cache → RPC-to-cache-tier is a latency hit; Figma absorbs it by batching via view→queries expansion on the edge.
  • Operational sophistication — three fleets instead of one; needs per-tier runbooks, deploys, capacity planning.
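The batching mitigation in the third pre-condition might look like this sketch; `expand_view` and `batch_get` are hypothetical stand-ins for the edge's view expansion and the cache tier's batched RPC, not real Figma interfaces.

```python
def fetch_view(view_id, expand_view, batch_get):
    # Expand the view into its constituent queries on the edge, then pay
    # ONE cross-tier round trip for the whole batch instead of one per query.
    queries = expand_view(view_id)
    results = batch_get(queries)      # single RPC to the cache tier
    return dict(zip(queries, results))

# Toy stand-ins to show the call shape.
rpc_calls = []

def batch_get(queries):
    rpc_calls.append(len(queries))    # record round trips taken
    return [f"rows({q})" for q in queries]

out = fetch_view("view-7", lambda v: [f"{v}/q{i}" for i in range(3)], batch_get)
# Three queries served, but rpc_calls shows a single cross-tier hop.
```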

Canonical Figma instances

LiveGraph 100x

(Source: sources/2026-04-21-figma-keeping-it-100x-with-real-time-data-at-scale)

Tier        | Scales with              | Scaling input
------------|--------------------------|----------------------------------
Edge        | Client sessions          | +5× view requests/year
Cache       | Active queries           | Hot-data working set
Invalidator | DB shards + update rate  | +12 vertical shards → horizontal

Each can be scaled up / down without the others. The post states this as the structural win over the pre-100x design:

"Notably, the invalidator service can scale as the database team adds new vertical and horizontal shards, while the edge and cache services can scale to handle increasing user traffic and the number of active queries. And scaling any single service doesn't disproportionately increase the fan-in or fan-out of messages over the network."

FigCache

(Source: sources/2026-04-21-figma-figcache-next-generation-data-caching-platform)

  • FigCache frontend tier — scales with client fleet; holds RESP connections.
  • ElastiCache Redis fleet — scales with data volume + hot-key load.
  • ResPC routing layer — internal to FigCache, scales with command-dispatch rate.

Same underlying shape: connection multiplexing lets the client fleet scale out without multiplying Redis connection load — the Redis tier scales on its own axis.
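The multiplexing idea can be sketched as a toy model (not FigCache's actual protocol or code): any number of client sessions dispatch over a fixed pool of upstream connections, so upstream connection count stays constant as the client fleet grows.

```python
import itertools

class Multiplexer:
    def __init__(self, pool_size, send):
        # `send(conn_id, command)` is a hypothetical stand-in for writing a
        # command on one pooled upstream (e.g. Redis) connection.
        self._send = send
        self._rr = itertools.cycle(range(pool_size))   # round-robin pick

    def dispatch(self, command):
        return self._send(next(self._rr), command)

used = set()
mux = Multiplexer(4, lambda conn, cmd: used.add(conn) or f"{conn}:{cmd}")
for i in range(1000):                  # 1000 "client" commands...
    mux.dispatch(f"GET k{i}")
# ...and still only 4 upstream connections ever touched.
```

Without the multiplexing tier, each of those clients would hold its own upstream connection — exactly the coupling the frontend tier exists to break.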

Adjacent tradition

  • concepts/control-plane-data-plane-separation — the most common two-tier instance of this pattern. Control plane scales with policy / config churn; data plane scales with request rate.
  • Envoy + xDS control plane — data plane Envoys scale with traffic, xDS server scales with config churn.
  • Kafka brokers + Connect workers + Streams applications — each scales independently.
  • Stripe's "microservice tier" split from its original Ruby monolith — each tier scales with a different request-mix axis.

When a single tier wins

  • Small scale, bounded growth — overhead of three fleets isn't paid back.
  • All axes move together — e.g. each client has its own cache footprint, its own upstream change rate, and its own session weight; splitting doesn't decouple anything real.
  • Low-latency constraints incompatible with RPC hops — a single tier fits strict p99 budgets that multi-tier (plus a network hop) can't.
  • Small engineering team — can't operate multiple distinct services.

Anti-patterns it replaces

  • "One fleet scales everything" — grow the server count for memory and take the CPU under-utilisation hit. LiveGraph pre-100x was here.
  • "In-process cache on every box" — cache fragmented across fleet, deploy wipes cache, cold-start thundering herd. Mitigation options are all bounded; structural fix is to lift cache out.
