PATTERN Cited by 1 source
Tiered state management: Memcache plus DB
Pattern
When a workflow or batch process produces both short-lived intermediate state (living only for the duration of one job) and long-lived persistent state (living across many job runs), use two distinct storage tiers:
- Distributed in-memory cache (Memcached / Redis) for the intra-job multi-step shared state — fast, ephemeral, evictable; no durability obligation.
- Relational database for the persistent pre-computed reports — durable, queryable, long-retention.
This page, sourced from the 2026-05-14 Atlassian post, is the wiki's first canonical home for this pattern in the pre-computation-framework context. The post, verbatim:
"Optimisation tools need to store pre-computed 'usage' data for many Jira entity types (fields, options, roles, schemes, etc.) at scale. This data is: Large (up to ~1M records per entity type, per tenant), Read-heavy (queried often in reports and UIs), Refreshed in batch (monthly/periodically, not in real time). Naively adding one dedicated table per entity doesn't scale ... So we used two layers: For short-lived, multi-step sharing inside a job are memcached. For persistent pre-computation, we store data in the Jira relational database using a small set of generic tables (polymorphic 'usage' tables) instead of one table per entity type."
(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)
Problem
A pre-computation framework processing tenant data in batches needs storage for two distinct kinds of state with opposing requirements:
- Intermediate state that scan-step workers share during a single job's lifecycle (Initialise → Scan-steps → Finalise). Examples: per-batch contributions to a running aggregate, deduplication bitmaps, in-flight scope partitions. Required: fast read/write, multi-worker visibility. Not required: durability past job completion.
- Persistent reports that downstream consumers (admin UIs, remediation tools) read after the job completes, potentially many times before the next refresh. Required: durability, indexable querying, long retention. Not required: extreme write throughput (refreshes are batch / periodic).
Conflating the two (putting intermediate state in the DB, or report state in the cache) has costs:
- Intermediate state in DB: high write throughput during scan-steps amplifies row-count and write-pressure on the DB unnecessarily; transient state pollutes the durable schema.
- Report state in cache: report durability is lost on cache eviction; cache size becomes proportional to report retention rather than working-set size.
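The second cost can be shown in a minimal sketch. It assumes a hypothetical `BoundedCache` built on `collections.OrderedDict` as a rough stand-in for Memcached's LRU-style eviction. Real Memcached eviction is more involved, so treat this as an approximation. A finished report stored in the cache is lost as soon as a later job's intermediate writes push it out:

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache approximating Memcached-style eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Evict least-recently-used entries once over capacity.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]


cache = BoundedCache(capacity=3)
# Anti-pattern: a finished report is kept only in the cache.
cache.set("report:fields", {"unused_fields": 1200})
# A later job's intermediate state pushes the report out.
for i in range(3):
    cache.set(f"job-2:batch:{i}", {"partial": i})

print(cache.get("report:fields"))  # None: the report is gone
```

The capacity is tied to the working set of running jobs, so any retained report competes with intermediate state for the same space.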
Solution
Split state into the two tiers along the durability boundary:
```
function scan_step(scope_id, batch) {
    // Intra-job state → Memcache (fast, ephemeral)
    for entity in batch {
        contribution = compute_contribution(entity, scope_id)
        memcache.upsert_merge(scope_id, contribution)
    }
}

function finalise(scope_id) {
    // Aggregate from Memcache, persist to DB
    intermediate = memcache.read_all(scope_id)
    report = build_report(intermediate)
    // Persistent state → DB (durable, queryable)
    db.upsert_polymorphic_usage(scope_id, report)
    // Optionally clear intermediate state
    memcache.expire(scope_id)
}
```
The Memcache tier holds state during the workflow; the DB tier holds state across workflows. The Finalise phase is the commit point that promotes (transformed) intermediate state to persistent state.
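The pseudocode above can be made runnable for experimentation. This is a minimal sketch under stated assumptions:

- An in-process dict with expiry times (`EphemeralCache`, hypothetical) stands in for Memcached.
- An in-memory SQLite database with one generic `usage_report` table stands in for the Jira relational DB. The schema is hypothetical and only loosely in the spirit of polymorphic usage tables.
- `compute_contribution` and the entity shape are illustrative.

```python
import sqlite3
import time


class EphemeralCache:
    """Hypothetical in-process stand-in for Memcached: key -> (value, expires_at)."""

    def __init__(self):
        self._data = {}

    def read_all(self, key):
        value, expires_at = self._data.get(key, ({}, 0.0))
        return value if time.monotonic() < expires_at else {}

    def upsert_merge(self, key, contribution, ttl_s=3600.0):
        # Merge a contribution (entity_id -> count) into the running aggregate.
        merged = dict(self.read_all(key))
        for entity_id, count in contribution.items():
            merged[entity_id] = merged.get(entity_id, 0) + count
        self._data[key] = (merged, time.monotonic() + ttl_s)

    def expire(self, key, ttl_s):
        # Shorten the TTL rather than deleting, leaving a debugging buffer.
        if key in self._data:
            value, _ = self._data[key]
            self._data[key] = (value, time.monotonic() + ttl_s)


def open_db():
    db = sqlite3.connect(":memory:")
    # One generic table for every entity type instead of one table per type.
    db.execute(
        "CREATE TABLE usage_report (scope_id TEXT, entity_type TEXT, entity_id TEXT,"
        " usage_count INTEGER, PRIMARY KEY (scope_id, entity_type, entity_id))"
    )
    return db


def compute_contribution(entity):
    # Hypothetical: each entity reports how often it is used.
    return {entity["id"]: entity["uses"]}


def scan_step(cache, scope_id, batch):
    # Intra-job state -> cache (fast, ephemeral).
    for entity in batch:
        cache.upsert_merge(scope_id, compute_contribution(entity))


def finalise(cache, db, scope_id, entity_type, buffer_ttl_s=600.0):
    # Aggregate from the cache, then persist to the DB in one transaction.
    intermediate = cache.read_all(scope_id)
    rows = [(scope_id, entity_type, eid, n) for eid, n in sorted(intermediate.items())]
    with db:  # commits on success, rolls back on exception
        db.executemany("INSERT OR REPLACE INTO usage_report VALUES (?, ?, ?, ?)", rows)
    cache.expire(scope_id, buffer_ttl_s)


cache = EphemeralCache()
db = open_db()
scope = "tenant-42:fields"
scan_step(cache, scope, [{"id": "cf_1", "uses": 3}, {"id": "cf_2", "uses": 0}])
scan_step(cache, scope, [{"id": "cf_1", "uses": 2}])
finalise(cache, db, scope, "field")
rows = db.execute(
    "SELECT entity_id, usage_count FROM usage_report ORDER BY entity_id"
).fetchall()
print(rows)  # [('cf_1', 5), ('cf_2', 0)]
```

`finalise` is the only function that writes to the DB, which matches its role as the commit point.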
What goes where
| State | Tier | Lifetime |
|---|---|---|
| Per-batch in-flight contributions | Memcache | Single job |
| Per-scope intermediate aggregates | Memcache | Single job |
| Deduplication bitmaps for the job | Memcache | Single job |
| Final per-scope reports | DB | Until next refresh |
| Last-refresh timestamps | DB | Permanent |
| Aggregate metrics for monitoring | DB | Permanent (or archived) |
The discriminator: does anything outside this job need to read the data after the job finishes? If yes, DB. If no, Memcache.
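The discriminator is simple enough to encode directly. This is a minimal sketch: `Tier`, `StateKind`, and `choose_tier` are hypothetical names, and the sample rows mirror the table above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MEMCACHE = "memcache"
    DB = "db"


@dataclass(frozen=True)
class StateKind:
    name: str
    read_after_job: bool  # Does anything outside this job read it once the job finishes?


def choose_tier(kind):
    # The single discriminator: post-job readers force durability.
    return Tier.DB if kind.read_after_job else Tier.MEMCACHE


for kind in [
    StateKind("per-batch in-flight contributions", read_after_job=False),
    StateKind("deduplication bitmap for the job", read_after_job=False),
    StateKind("final per-scope report", read_after_job=True),
    StateKind("last-refresh timestamp", read_after_job=True),
]:
    print(f"{kind.name}: {choose_tier(kind).value}")
```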
Trade-offs
| Property | Tiered (Memcache + DB) | DB-only | Cache-only |
|---|---|---|---|
| Intra-job read/write latency | Fast (Memcache) | Slow (DB round-trips) | Fast |
| Persistent report durability | Yes (DB) | Yes | No (eviction loss) |
| DB write pressure | Low (only Finalise commits) | High (every scan-step writes) | None |
| Cache eviction risk | Bounded (state lives for one job only) | None | High (any eviction loses report) |
| Operational complexity | Two systems to monitor | One | One (but unsuitable) |
| Cost per read of report | Low (DB index scan) | Low | Low if cached, otherwise lost |
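The "DB write pressure" row can be made concrete by counting DB writes under each option. The workload is hypothetical and illustrative, not from the source: 10 batches that each touch the same 1,000 entities. Both functions are hypothetical helpers.

```python
from collections import Counter


def db_writes_db_only(batches):
    # DB-only: every per-entity contribution is an upsert during scan-steps.
    return sum(len(batch) for batch in batches)


def db_writes_tiered(batches):
    # Tiered: contributions merge in the cache; Finalise writes one row per distinct entity.
    merged = Counter()
    for batch in batches:
        for entity_id, uses in batch:
            merged[entity_id] += uses
    return len(merged)


# Hypothetical workload: 10 batches, each touching the same 1,000 entities.
batches = [[(f"cf_{i}", 1) for i in range(1000)] for _ in range(10)]
print(db_writes_db_only(batches))  # 10000
print(db_writes_tiered(batches))   # 1000
```

The gap depends on how often an entity recurs across batches. If every entity appears in exactly one batch, both counts match. In that case the tiered option's remaining advantage is that all report rows land in a single Finalise commit instead of being spread across scan-steps.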
Implementation discipline
- Memcache is a soft commitment. Eviction during a job is a possibility. Either tolerate the cost (job restart with idempotent scan-steps; see concepts/idempotent-thread-safe-scan-step) or size Memcache to comfortably fit the working set.
- Memcache keys must be scoped to the job. Use `(scope_id, job_id)` as the key prefix so concurrent jobs on the same scope don't collide.
- TTL the Memcache state past job completion. Even after Finalise, keep the intermediate state for a buffer period (minutes to hours) to support inspection and debugging if Finalise itself fails.
- Choose the DB schema for read-heavy report query patterns. See patterns/polymorphic-usage-tables-for-multi-tenant-scale for the multi-tenant DB-design decision Atlassian pairs with this tiering.
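The cache-side rules (tolerate eviction, job-scoped keys, a TTL buffer after Finalise) can be combined in one sketch. It relies on two assumptions:

- A hypothetical `TtlCache` stands in for Memcached.
- Scan-steps are made idempotent in one particular way: each batch's result is written under its own key, so a re-run overwrites rather than double-counts. This is for illustration only, not necessarily how Atlassian's framework achieves idempotency.

After a simulated full eviction, Finalise recomputes the missing batches and still reaches the right total:

```python
import time


def cache_key(scope_id, job_id, part):
    # Job-scoped prefix: concurrent jobs on the same scope never share keys.
    return f"{scope_id}:{job_id}:{part}"


class TtlCache:
    """Hypothetical Memcached stand-in: each value expires after its own TTL."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_s):
        self._data[key] = (value, time.monotonic() + ttl_s)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or time.monotonic() >= entry[1]:
            return None  # missing or expired
        return entry[0]

    def evict_all(self):
        # Simulates the cache losing everything mid-job.
        self._data.clear()


def scan_step(cache, scope_id, job_id, batch_no, batch, ttl_s=3600.0):
    # Idempotent: the batch result is *set* under a per-batch key, so
    # re-running a batch overwrites its previous value instead of adding to it.
    total = sum(uses for _, uses in batch)
    cache.set(cache_key(scope_id, job_id, f"batch:{batch_no}"), total, ttl_s)


def finalise(cache, scope_id, job_id, batches, buffer_ttl_s=900.0):
    grand_total = 0
    for batch_no, batch in enumerate(batches):
        key = cache_key(scope_id, job_id, f"batch:{batch_no}")
        value = cache.get(key)
        if value is None:
            # Evicted: recompute; safe because scan_step is idempotent.
            scan_step(cache, scope_id, job_id, batch_no, batch)
            value = cache.get(key)
        # Keep intermediate state for a debugging buffer after Finalise.
        cache.set(key, value, buffer_ttl_s)
        grand_total += value
    return grand_total


cache = TtlCache()
batches = [[("cf_1", 3), ("cf_2", 1)], [("cf_1", 2)]]
for batch_no, batch in enumerate(batches):
    scan_step(cache, "tenant-42", "job-7", batch_no, batch)
cache.evict_all()  # eviction strikes before Finalise
result = finalise(cache, "tenant-42", "job-7", batches)
print(result)  # 6
```

Recomputation assumes the input batches can be re-read from their source. Sizing the cache to fit the working set avoids paying this cost in the common case.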
Adjacent patterns
- patterns/asynchronous-precomputed-report-batch-framework — the framework this storage architecture lives in.
- patterns/polymorphic-usage-tables-for-multi-tenant-scale — the specific DB-design pattern for the persistent tier.
- patterns/stale-while-revalidate-cache (when present) — adjacent: cache and origin store coexist with different freshness contracts.
- Lambda architecture (analytical) — same general shape: speed layer (fast, ephemeral) + batch layer (durable, slower) with a serving layer that unifies reads. The pre-computation pattern is the workflow-batch-job-altitude cousin.
Adjacent at other altitudes
- CPU cache + main memory: same boundary (fast/ephemeral vs slow/persistent) at the hardware altitude.
- Redis + Postgres: the most common SaaS instance of this pattern, with Redis as session store / cache and Postgres as source of truth. Memcached fills the same role for Atlassian.
- OS page cache + disk: kernel-level instance of the same idea.
- Cloudflare KV (eventually consistent) + R2 / D1 (durable): edge-altitude instance.
Seen in
- sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat (2026-05-14, Atlassian) — first canonical wiki home for this pattern in the pre-computation-framework context. The Atlassian Pre-computation Framework uses Memcached for short-lived multi-step intra-job state and the Jira relational DB (with polymorphic usage tables) for persistent pre-computed reports — the canonical two-tier split, motivated by the dataset's large/read-heavy/batch-refreshed profile.