PATTERN Cited by 1 source
Asynchronous pre-computed report batch framework¶
Pattern¶
A pre-computation framework is a platform-layer substrate that runs asynchronous, workflow-orchestrated batch jobs to scan domain data for a given scope, produces a durable report, and lets downstream consumers (admin UIs, remediation tools, dashboards) read that report cheaply on demand. The framework owns the orchestration concerns: the workflow, task workers, batching, retries, and observability. The domain-specific tools (the report's authors) own the domain concerns: what to scan, what to compute, and what counts as "unused". The two are decoupled by a stable contract that follows a three-phase shape:
- Initialise: bound the scope, clear or seed intermediate state, and set the boundaries for a new job.
- Scan-steps: stream domain entities in batches. Each batch is processed by a worker that satisfies the idempotent + thread-safe + order-agnostic contract, so the framework can parallelise and retry without coordination.
- Finalise: aggregate per-batch state into a durable report consumed by remediation flows and UIs.
(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)
The 2026-05-14 Atlassian post is the first canonical wiki home for this pattern. It frames the platform contract as follows (verbatim):
"It sits on top of our internal workflow orchestration engine, owns the workflow, task workers, batching, retries, and observability, and exposes a simple contract. Optimisation tools don't need to care about orchestrator details; they only plug in their logic."
Problem¶
Multi-tenant SaaS admin UIs frequently need to surface aggregated answers about per-tenant data ("which fields are unused in this space?", "how many active users in this org in the last 30 days?", "which schemes contain field X?") that are too expensive to compute synchronously on every request:
- Per-tenant data may be very large (Atlassian: ~1M records per entity type per tenant).
- Computation may require scanning multiple data sources (work items + screens + fields + schemes).
- Many admins may request the same report concurrently; re-deriving on each request wastes compute.
- The answer changes infrequently relative to read demand.
A naïve solution, "compute everything on the fly in admin UIs", quickly fails at multi-tenant SaaS scale: as the source puts it, such a tool "only delivers value if [it] works at the scale [of the platform]". The architectural shape that does work is to pre-compute asynchronously, store durably, and read cheaply.
Solution¶
Implement a platform-layer pre-computation framework with the three-phase contract:
1. Initialise¶
function initialise(scope_id) {
    // Clear or seed intermediate state for this job
    cache.clear(scope_id)
    state.bind(scope_id, started_at = now())
    // Bound work in time so a runaway scope can't hold the
    // workflow indefinitely
    set_deadline(scope_id, max_runtime)
}
The Initialise phase is the only phase that touches shared state in a coordinated way; from this point forward the framework is free to parallelise.
2. Scan-steps¶
The framework streams domain entities in batches. Each batch is processed by a worker that satisfies the idempotent + thread-safe + order-agnostic trio:
function scan_step(scope_id, batch) {
    for entity in batch {
        // Pure function of (entity, scope_id)
        contribution = compute_contribution(entity, scope_id)
        // Commutative aggregation into intermediate state
        cache.upsert_merge(scope_id, contribution)
    }
}
Because workers satisfy the trio, the orchestrator can:
- Parallelise scan-step workers across batches.
- Retry any batch on transient failure without duplication.
- Reorder batches under backpressure without affecting the final result.
This is the load-bearing simplification — the framework never has to coordinate scan-step execution. See concepts/idempotent-thread-safe-scan-step for the contract details.
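The retry and reorder guarantees above follow from the merge being commutative and idempotent. A minimal single-threaded sketch of that property, assuming a hypothetical entity shape (a work item id plus the field ids it uses) and a plain dict standing in for the cache; none of these names come from the source:

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical entity shape: (work_item_id, field ids used by that item).
Entity = Tuple[str, FrozenSet[str]]


def scan_step(state: Dict[str, Set[str]], batch: Iterable[Entity]) -> None:
    """Merge one batch into intermediate state.

    State maps field_id -> set of work item ids that use the field.
    Adding to a set is commutative and idempotent, so replaying or
    reordering batches cannot change the final state.
    """
    for work_item_id, field_ids in batch:
        for field_id in field_ids:
            state.setdefault(field_id, set()).add(work_item_id)


def run(batches: List[List[Entity]]) -> Dict[str, Set[str]]:
    """Apply every batch, in the given order, to a fresh state."""
    state: Dict[str, Set[str]] = {}
    for batch in batches:
        scan_step(state, batch)
    return state


batches: List[List[Entity]] = [
    [("ITEM-1", frozenset({"priority", "sprint"}))],
    [("ITEM-2", frozenset({"priority"}))],
    [("ITEM-3", frozenset({"due_date"}))],
]

baseline = run(batches)

# Retry: the second batch is delivered twice.
with_retry = run(batches + [batches[1]])

# Reorder: batches arrive in a different order.
shuffled = batches[:]
random.Random(0).shuffle(shuffled)
reordered = run(shuffled)

assert baseline == with_retry == reordered
```

A per-field counter incremented on every sighting would break this: a replayed batch would double-count. Thread safety is not exercised here; in a real deployment it would depend on the intermediate store offering an atomic merge.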
3. Finalise¶
function finalise(scope_id) {
    // Aggregate intermediate state into the durable report
    intermediate = cache.read_all(scope_id)
    report = build_report(intermediate)
    db.upsert_polymorphic_usage(scope_id, report)
    // Optionally clear intermediate state
    cache.expire(scope_id)
}
The Finalise phase is the commit point of the workflow: once the report lands in the durable store, downstream consumers see consistent results. The intermediate state can be discarded; only the report persists.
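One way to picture the commit point is a store where readers see either no report (or the previous one) or the complete new one, never a partial result. The sketch below is an in-memory illustration only; `ReportStore`, `UsageReport`, and the dict used for intermediate state are hypothetical stand-ins, not the framework's actual storage API:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class UsageReport:
    """Immutable report for one scope (hypothetical schema)."""
    scope_id: str
    used: FrozenSet[str]
    unused: FrozenSet[str]


class ReportStore:
    """Stand-in for the durable store: one whole report per scope."""

    def __init__(self) -> None:
        self._reports: Dict[str, UsageReport] = {}

    def publish(self, report: UsageReport) -> None:
        # Replace the whole report in one assignment; readers never
        # observe a half-written report.
        self._reports[report.scope_id] = report

    def read(self, scope_id: str) -> Optional[UsageReport]:
        return self._reports.get(scope_id)


def finalise(
    scope_id: str,
    all_fields: Set[str],
    intermediate: Dict[str, Set[str]],
    store: ReportStore,
) -> UsageReport:
    """Build the report from intermediate state, publish it, then discard
    the intermediate state for this scope."""
    used = frozenset(intermediate.get(scope_id, set())) & all_fields
    report = UsageReport(scope_id, used, frozenset(all_fields) - used)
    store.publish(report)             # commit point
    intermediate.pop(scope_id, None)  # intermediate state no longer needed
    return report


store = ReportStore()
intermediate = {"space-1": {"priority", "sprint"}}
all_fields = {"priority", "sprint", "due_date"}

before = store.read("space-1")  # nothing published yet
report = finalise("space-1", all_fields, intermediate, store)
after = store.read("space-1")   # the complete report
```

Publishing before discarding intermediate state means a crash between the two steps leaves stale cache entries behind rather than a missing report, which is usually the safer failure mode.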
Tool plugin contract¶
The framework exposes a stable interface that domain-specific tools implement. Per-tool concerns:
- Domain APIs to call (which Jira services to fetch from in scan-steps).
- What "unused" / "salient" means (the per-tool predicate).
- What recommendations to surface (the per-tool report schema).
The framework's concerns:
- Workflow orchestration (when each phase runs, which batches dispatch where, retry on failure).
- Batching (how many entities per batch, how many parallel workers).
- Retries (what counts as transient, how many attempts).
- Observability (per-batch metrics, per-job dashboards).
- Storage tier coordination (Memcache for intra-job state, DB for persistent reports).
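The split of concerns above can be sketched as a small interface plus a runner. This is an illustrative shape only: `OptimisationTool`, `TransientError`, `run_job`, and the batch and retry defaults are assumptions made for the example, and a thread pool stands in for the workflow orchestration engine.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Protocol, Sequence


class OptimisationTool(Protocol):
    """What a tool author implements (hypothetical interface)."""
    def initialise(self, scope_id: str) -> None: ...
    def scan_step(self, scope_id: str, batch: Sequence[Any]) -> None: ...
    def finalise(self, scope_id: str) -> Dict[str, Any]: ...


class TransientError(Exception):
    """Hypothetical marker for failures the framework may retry."""


def run_job(tool: OptimisationTool, scope_id: str, entities: List[Any],
            batch_size: int = 2, max_workers: int = 4,
            max_attempts: int = 3) -> Dict[str, Any]:
    """Framework side: batching, parallelism, and retries."""
    def attempt(batch: Sequence[Any]) -> None:
        for n in range(1, max_attempts + 1):
            try:
                tool.scan_step(scope_id, batch)
                return
            except TransientError:
                if n == max_attempts:
                    raise

    batches = [entities[i:i + batch_size]
               for i in range(0, len(entities), batch_size)]
    tool.initialise(scope_id)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every scan-step and re-raises any failure
        # before finalise is allowed to run.
        list(pool.map(attempt, batches))
    return tool.finalise(scope_id)


class UnusedFieldsTool:
    """Tool side: domain logic only. Fails once to exercise the retry."""

    def __init__(self, all_fields: Sequence[str]) -> None:
        self.all_fields = set(all_fields)
        self._lock = threading.Lock()
        self._used: set = set()
        self._failed_once = False

    def initialise(self, scope_id: str) -> None:
        with self._lock:
            self._used.clear()

    def scan_step(self, scope_id: str, batch: Sequence[Any]) -> None:
        with self._lock:
            if not self._failed_once:
                self._failed_once = True
                raise TransientError("simulated transient failure")
            for fields_on_item in batch:
                self._used |= set(fields_on_item)

    def finalise(self, scope_id: str) -> Dict[str, Any]:
        with self._lock:
            unused = sorted(self.all_fields - self._used)
        return {"scope_id": scope_id, "unused": unused}


tool = UnusedFieldsTool(["priority", "sprint", "due_date", "legacy_flag"])
items = [["priority"], ["sprint", "priority"], ["due_date"]]
report = run_job(tool, "space-1", items)
```

The tool never sees batch sizes, worker counts, or retry policy, so those can change on the framework side without touching tool code, provided the tool keeps `scan_step` idempotent, thread-safe, and order-agnostic.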
Trade-offs¶
| Property | Pre-computation framework |
|---|---|
| Read latency | Fast (read durable report) |
| Write/refresh latency | Slow (async batch job; staleness window) |
| Compute cost | Amortised across many reads |
| Operational complexity | High (workflow + retries + observability) |
| Tool author burden | Low (plug in domain logic) |
| Staleness | Acknowledged (see patterns/prioritised-refresh-by-utilisation-threshold) |
The hidden cost is the staleness window. Reports become stale as the underlying data evolves. Mitigations: pair with the prioritised-refresh strategy and surface staleness state in the UI.
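Surfacing staleness in the UI can be as simple as storing a generation timestamp with the report and deriving an age on read. A minimal sketch, assuming a hypothetical `generated_at` field and an arbitrary 24-hour threshold chosen purely for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReportFreshness:
    generated_at: datetime
    age: timedelta
    is_stale: bool


def describe_freshness(
    generated_at: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # illustrative threshold
) -> ReportFreshness:
    """Compute how old a report is and whether the UI should flag it."""
    age = now - generated_at
    return ReportFreshness(generated_at, age, age > max_age)


generated = datetime(2026, 5, 14, 0, 0, tzinfo=timezone.utc)
fresh = describe_freshness(generated, generated + timedelta(hours=2))
stale = describe_freshness(generated, generated + timedelta(hours=30))
```

Passing `now` explicitly keeps the function deterministic and easy to test; a UI could render `age` as "generated 2 hours ago" and use `is_stale` to prompt a refresh.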
Adjacent patterns¶
- patterns/tiered-state-management-memcache-plus-db — the storage architecture under the framework. Intra-job state in Memcache; durable reports in the relational DB.
- patterns/polymorphic-usage-tables-for-multi-tenant-scale — the multi-tenant DB-design decision for the persistent layer.
- patterns/prioritised-refresh-by-utilisation-threshold — the refresh-policy lever that controls compute budget.
- concepts/control-plane-data-plane-separation — the framework / tool separation maps directly: framework is the control plane (orchestration, batching, retries), tools are the data plane (per-domain compute).
Adjacent systems¶
- MapReduce / Spark batch jobs — same general shape (init + parallel map + reduce), framework-driven, but typically batch-oriented over arbitrary data, not multi-tenant SaaS reporting.
- Temporal workflows — same durability + retry semantics at the per-workflow level; the pre-computation framework can be implemented on top of Temporal-shaped engines.
- Skipper — Airbnb's embedded workflow engine; structurally adjacent shape, with the durability commitment at the framework layer rather than around individual jobs.
Seen in¶
- sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat (2026-05-14, Atlassian) — first canonical wiki home. The Atlassian Pre-computation Framework is the canonical instance, computing per-space entity-usage reports across Jira Cloud's multi-tenant fleet. The Initialise → Scan-steps → Finalise contract is named verbatim; the load-bearing scan-step trio (idempotent + thread-safe + order-agnostic) is documented; the platform / tool separation is articulated as the architectural commitment that lets the platform improve independently.
Related¶
- systems/atlassian-precomputation-framework
- systems/atlassian-jira-optimisation-tools
- concepts/idempotent-thread-safe-scan-step
- concepts/control-plane-data-plane-separation
- patterns/tiered-state-management-memcache-plus-db
- patterns/polymorphic-usage-tables-for-multi-tenant-scale
- patterns/prioritised-refresh-by-utilisation-threshold