PATTERN Cited by 1 source

Tiered state management — Memcache plus DB¶

Pattern¶

When a workflow or batch process produces both short-lived intermediate state (needed only for the duration of one job) and long-lived persistent state (needed across many job runs), use two distinct storage tiers:

Distributed in-memory cache (Memcached / Redis) for the intra-job multi-step shared state — fast, ephemeral, evictable; no durability obligation.
Relational database for the persistent pre-computed reports — durable, queryable, long-retention.

The 2026-05-14 Atlassian post is the first canonical wiki home for this pattern in the pre-computation-framework context. Verbatim:

"Optimisation tools need to store pre-computed 'usage' data for many Jira entity types (fields, options, roles, schemes, etc.) at scale. This data is: Large (up to ~1M records per entity type, per tenant), Read-heavy (queried often in reports and UIs), Refreshed in batch (monthly/periodically, not in real time). Naively adding one dedicated table per entity doesn't scale ... So we used two layers: For short-lived, multi-step sharing inside a job [we use] memcached. For persistent pre-computation, we store data in the Jira relational database using a small set of generic tables (polymorphic 'usage' tables) instead of one table per entity type."

(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)
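
To make the "small set of generic tables" idea in the quote concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name entity_usage, its columns, and both helper functions are assumptions made for illustration; the source does not publish Atlassian's actual schema.

```python
import sqlite3

# One generic table serves every entity type; entity_type is the discriminator.
# All names here are hypothetical, not Atlassian's schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_usage (
        tenant_id    TEXT NOT NULL,
        entity_type  TEXT NOT NULL,   -- e.g. 'field', 'option', 'role', 'scheme'
        entity_id    TEXT NOT NULL,
        usage_count  INTEGER NOT NULL,
        refreshed_at TEXT NOT NULL,
        PRIMARY KEY (tenant_id, entity_type, entity_id)
    )
""")


def upsert_usage(conn, tenant_id, entity_type, rows, refreshed_at):
    """Batch-refresh usage rows; a later refresh overwrites earlier rows."""
    conn.executemany(
        "INSERT OR REPLACE INTO entity_usage VALUES (?, ?, ?, ?, ?)",
        [(tenant_id, entity_type, entity_id, count, refreshed_at)
         for entity_id, count in rows],
    )
    conn.commit()


def unused_entities(conn, tenant_id, entity_type):
    """Read path for a report: entities of one type with zero recorded usage."""
    cursor = conn.execute(
        "SELECT entity_id FROM entity_usage "
        "WHERE tenant_id = ? AND entity_type = ? AND usage_count = 0 "
        "ORDER BY entity_id",
        (tenant_id, entity_type),
    )
    return [row[0] for row in cursor]
```

Adding a new entity type in this sketch means writing rows with a new entity_type value rather than creating another table, which is the scaling property the quote is after.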

Problem¶

A pre-computation framework processing tenant data in batches needs storage for two distinct kinds of state with opposing requirements:

Intermediate state that scan-step workers share during a single job's lifecycle (Initialise → Scan-steps → Finalise). Examples: per-batch contributions to a running aggregate, deduplication bitmaps, in-flight scope partitions. Required: fast read/write, multi-worker visibility. Not required: durability past job completion.
Persistent reports that downstream consumers (admin UIs, remediation tools) read after the job completes, potentially many times before the next refresh. Required: durability, indexable querying, long retention. Not required: extreme write throughput (refreshes are batch / periodic).

Conflating the two — putting intermediate state in the DB, or report state in cache — costs:

Intermediate state in DB: high write throughput during scan-steps amplifies row-count and write-pressure on the DB unnecessarily; transient state pollutes the durable schema.
Report state in cache: report durability is lost on cache eviction; cache size becomes proportional to report retention rather than working-set size.

Solution¶

Split state into the two tiers along the durability boundary:

function scan_step(scope_id, job_id, batch) {
  // Intra-job state → Memcache (fast, ephemeral), keyed per scope and job
  key = (scope_id, job_id)
  for entity in batch {
    contribution = compute_contribution(entity, scope_id)
    memcache.upsert_merge(key, contribution)
  }
}

function finalise(scope_id, job_id) {
  key = (scope_id, job_id)
  // Aggregate from Memcache, persist to DB
  intermediate = memcache.read_all(key)
  report = build_report(intermediate)
  // Persistent state → DB (durable, queryable)
  db.upsert_polymorphic_usage(scope_id, report)
  // Optionally clear intermediate state
  memcache.expire(key)
}

The Memcache tier holds state during the workflow; the DB tier holds state across workflows. The Finalise phase is the commit point that promotes (transformed) intermediate state to persistent state.
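
The two phases can be exercised end to end with in-memory stand-ins. The sketch below is an illustration under assumptions: InMemoryCache and InMemoryReportStore are hypothetical substitutes for a cache client and the relational tier, the report shape is invented, and keys include a job_id in line with the key-scoping advice under Implementation discipline. Real Memcached has no dictionary-merge primitive, so a production version would need something like check-and-set or per-batch keys instead of the merge call used here.

```python
class InMemoryCache:
    """Hypothetical stand-in for the ephemeral tier."""

    def __init__(self):
        self._data = {}

    def merge(self, key, contribution):
        # Add each entity's count into the running aggregate stored under key.
        aggregate = self._data.setdefault(key, {})
        for entity_id, count in contribution.items():
            aggregate[entity_id] = aggregate.get(entity_id, 0) + count

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


class InMemoryReportStore:
    """Hypothetical stand-in for the durable tier."""

    def __init__(self):
        self.rows = {}

    def upsert(self, scope_id, report):
        self.rows[scope_id] = report


def scan_step(cache, scope_id, job_id, batch):
    """Write per-batch contributions to the cache only; the DB is untouched."""
    key = f"{scope_id}:{job_id}"
    for entity_id, usage_count in batch:
        cache.merge(key, {entity_id: usage_count})


def finalise(cache, db, scope_id, job_id, clear_intermediate=True):
    """Commit point: transform cached state into a report and persist it."""
    key = f"{scope_id}:{job_id}"
    intermediate = cache.get(key)
    if intermediate is None:
        # Evicted or never written: the job has to be re-run.
        raise LookupError(f"no intermediate state for {key}")
    report = {
        "entity_count": len(intermediate),
        "total_usage": sum(intermediate.values()),
        "unused": sorted(e for e, c in intermediate.items() if c == 0),
    }
    db.upsert(scope_id, report)
    if clear_intermediate:
        cache.delete(key)
    return report
```

Note that the DB sees exactly one write per scope in this sketch, at Finalise, however many scan-steps ran before it.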

What goes where¶

State	Tier	Lifetime
Per-batch in-flight contributions	Memcache	Single job
Per-scope intermediate aggregates	Memcache	Single job
Deduplication bitmaps for the job	Memcache	Single job
Final per-scope reports	DB	Until next refresh
Last-refresh timestamps	DB	Permanent
Aggregate metrics for monitoring	DB	Permanent (or archived)

The discriminator: does anything outside this job need to read the data after the job finishes? If yes, DB. If no, Memcache.
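
The discriminator is simple enough to write down as a function. This is only a sketch of the rule as stated; the StateKind and Tier types are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MEMCACHE = "memcache"
    DB = "db"


@dataclass(frozen=True)
class StateKind:
    name: str
    # Does anything outside the job read this data after the job finishes?
    read_after_job: bool


def choose_tier(kind: StateKind) -> Tier:
    """Apply the discriminator: post-job readers imply the durable tier."""
    return Tier.DB if kind.read_after_job else Tier.MEMCACHE
```

For example, choose_tier(StateKind("Deduplication bitmaps for the job", False)) yields Tier.MEMCACHE, while choose_tier(StateKind("Final per-scope reports", True)) yields Tier.DB, matching the rows of the table above.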

Trade-offs¶

Property	Tiered (Memcache + DB)	DB-only	Cache-only
Intra-job read/write latency	Fast (Memcache)	Slow (DB round-trips)	Fast
Persistent report durability	Yes (DB)	Yes	No (eviction loss)
DB write pressure	Low (only Finalise commits)	High (every scan-step writes)	None
Cache eviction risk	Bounded (state lives for one job only)	None	High (any eviction loses report)
Operational complexity	Two systems to monitor	One	One (but unsuitable)
Cost per read of report	Low (DB index scan)	Low	Low if cached, otherwise lost

Implementation discipline¶

Memcache is a soft commitment. Eviction during a job is a possibility. Either tolerate the cost (job restart with idempotent scan-steps; see concepts/idempotent-thread-safe-scan-step) or size Memcache to comfortably fit the working set.
Memcache keys must be scoped to the job. Use (scope_id, job_id) as the key prefix so concurrent jobs on the same scope don't collide.
TTL the Memcache state past job completion. Even after Finalise, keep the intermediate state for some buffer (minutes to hours) to support inspection / debugging if Finalise itself fails.
Choose the DB schema for read-heavy report query patterns. See patterns/polymorphic-usage-tables-for-multi-tenant-scale for the multi-tenant DB-design decision Atlassian pairs with this tiering.
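
The first three points above (eviction tolerance, job-scoped keys, and a TTL buffer) can be sketched together. Everything named here is an assumption for illustration: the precompute key prefix, the TtlCache stand-in, and the IntermediateStateLost exception are hypothetical and not taken from the source. The key check reflects Memcached's documented limits of 250 bytes and no whitespace in keys.

```python
import time


class IntermediateStateLost(Exception):
    """Raised when cached job state is missing (evicted or expired)."""


def job_key(scope_id, job_id, suffix):
    """Build a key scoped to one job so concurrent jobs cannot collide."""
    key = f"precompute:{scope_id}:{job_id}:{suffix}"
    if len(key.encode("utf-8")) > 250 or any(ch.isspace() for ch in key):
        raise ValueError(f"not a valid Memcached key: {key!r}")
    return key


class TtlCache:
    """Hypothetical stand-in for a cache client with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like evicted ones
            return None
        return value


def load_intermediate(cache, scope_id, job_id, batch_ids):
    """Read every batch's state, failing loudly if any part has gone missing."""
    parts = {}
    for batch_id in batch_ids:
        value = cache.get(job_key(scope_id, job_id, f"batch:{batch_id}"))
        if value is None:
            # The caller can restart the job, relying on idempotent scan-steps.
            raise IntermediateStateLost(f"batch {batch_id} of job {job_id}")
        parts[batch_id] = value
    return parts
```

Choosing a TTL somewhat longer than the expected job duration gives the inspection buffer described above without any explicit cleanup step; the right value depends on the workload and is not specified by the source.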

Adjacent patterns¶

patterns/asynchronous-precomputed-report-batch-framework — the framework this storage architecture lives in.
patterns/polymorphic-usage-tables-for-multi-tenant-scale — the specific DB-design pattern for the persistent tier.
patterns/stale-while-revalidate-cache (when present) — adjacent: cache and origin store coexist with different freshness contracts.
Lambda architecture (analytical) — same general shape: speed layer (fast, ephemeral) + batch layer (durable, slower) with a serving layer that unifies reads. The pre-computation pattern is the workflow-batch-job-altitude cousin.

Adjacent at other altitudes¶

CPU cache + main memory — the same fast-and-small vs slower-and-larger boundary at the hardware altitude, although neither tier is durable, so the analogy covers speed rather than persistence.
Redis + Postgres — the most common SaaS instance of this pattern: Redis as session store/cache, Postgres as source of truth. Memcached fills the same role for Atlassian.
OS page cache + disk — kernel-level instance of the same idea.
Cloudflare KV (eventual) + R2 / D1 (durable) — edge-altitude instance.

Seen in¶

sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat (2026-05-14, Atlassian) — first canonical wiki home for this pattern in the pre-computation-framework context. The Atlassian Pre-computation Framework uses Memcached for short-lived multi-step intra-job state and the Jira relational DB (with polymorphic usage tables) for persistent pre-computed reports — the canonical two-tier split, motivated by the dataset's large/read-heavy/batch-refreshed profile.