PATTERN Cited by 1 source

Asynchronous pre-computed report batch framework

Pattern

A pre-computation framework is a platform-layer substrate that runs asynchronous, workflow-orchestrated batch jobs to scan domain data for a given scope, produces a durable report, and lets downstream consumers (admin UIs, remediation tools, dashboards) read that report cheaply on demand. The framework owns the "how": the workflow, task workers, batching, retries, and observability. The domain-specific tools (the report's authors) own the "what": what to scan, what to compute, and what counts as "unused". The two are decoupled by a stable contract that follows a three-phase shape:

Initialise — bound the scope, clear or seed intermediate state, set the boundaries for a new job.
Scan-steps — stream domain entities in batches; each batch processed by a worker that satisfies the idempotent + thread-safe + order-agnostic contract; framework parallelises and retries without coordination.
Finalise — aggregate per-batch state into a durable report consumed by remediation flows and UIs.

(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)

The 2026-05-14 Atlassian post is the first canonical wiki home for this pattern. Verbatim framing of the platform contract:

"It sits on top of our internal workflow orchestration engine, owns the workflow, task workers, batching, retries, and observability, and exposes a simple contract. Optimisation tools don't need to care about orchestrator details; they only plug in their logic."
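As a rough illustration of what "plugging in logic" could look like, the three phases can be written as a small Python interface. This is a sketch only: `PrecomputeTool`, `UnusedFieldsTool`, and `run_job` are hypothetical names, the in-memory dictionary stands in for real intermediate state, and the sequential driver ignores parallelism and retries. The source does not publish the actual interface.

```python
from typing import Iterable, Protocol


class PrecomputeTool(Protocol):
    """Hypothetical three-phase contract a domain tool would implement."""

    def initialise(self, scope_id: str) -> None: ...
    def scan_step(self, scope_id: str, batch: list[dict]) -> None: ...
    def finalise(self, scope_id: str) -> dict: ...


class UnusedFieldsTool:
    """Toy tool: reports fields that are defined but never used by any item."""

    def __init__(self, defined_fields: set[str]) -> None:
        self.defined_fields = defined_fields
        # Stand-in for intermediate state (assumption: a real tool would use a cache).
        self.used: dict[str, set[str]] = {}

    def initialise(self, scope_id: str) -> None:
        self.used[scope_id] = set()

    def scan_step(self, scope_id: str, batch: list[dict]) -> None:
        for item in batch:
            self.used[scope_id] |= set(item["fields"])

    def finalise(self, scope_id: str) -> dict:
        unused = sorted(self.defined_fields - self.used[scope_id])
        return {"scope_id": scope_id, "unused_fields": unused}


def run_job(tool: PrecomputeTool, scope_id: str, batches: Iterable[list[dict]]) -> dict:
    """Simplified sequential driver: Initialise, then Scan-steps, then Finalise."""
    tool.initialise(scope_id)
    for batch in batches:
        tool.scan_step(scope_id, batch)
    return tool.finalise(scope_id)


report = run_job(
    UnusedFieldsTool({"summary", "priority", "due_date"}),
    "space-1",
    [[{"fields": ["summary"]}], [{"fields": ["priority", "summary"]}]],
)
```

The tool never sees how batches are scheduled; swapping the sequential loop in `run_job` for a workflow engine would not change `UnusedFieldsTool`.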

Problem

Multi-tenant SaaS admin UIs frequently need to surface aggregated answers about per-tenant data ("which fields are unused in this space?", "how many active users in this org in the last 30 days?", "which schemes contain field X?") that are too expensive to compute synchronously on every request:

Per-tenant data may be very large (Atlassian: ~1M records per entity type per tenant).
Computation may require scanning multiple data sources (work items + screens + fields + schemes).
Many admins may request the same report concurrently; re-deriving on each request wastes compute.
The answer changes infrequently relative to read demand.

A naïve solution, "compute everything on the fly in admin UIs", fails quickly at multi-tenant SaaS scale; in the source's words, a tool "only delivers value if [it] works at the scale [of the platform]". The architectural shape that does work is: pre-compute asynchronously, store durably, read cheaply.

Solution

Implement a platform-layer pre-computation framework with the three-phase contract:

1. Initialise

function initialise(scope_id) {
  // Clear or seed intermediate state for this job
  cache.clear(scope_id)
  state.bind(scope_id, started_at = now())
  // Bound work in time so a runaway scope can't hold the
  // workflow indefinitely
  set_deadline(scope_id, max_runtime)
}

The Initialise phase runs once, before any scan-step is dispatched, so it is the one place where shared state is reset in a coordinated way; from this point until Finalise the framework is free to parallelise.

2. Scan-steps

The framework streams domain entities in batches. Each batch is processed by a worker that satisfies the idempotent + thread-safe + order-agnostic trio:

function scan_step(scope_id, batch) {
  for entity in batch {
    // Pure function of (entity, scope_id)
    contribution = compute_contribution(entity, scope_id)
    // Commutative, idempotent merge into intermediate state,
    // so a retried batch does not double-count
    cache.upsert_merge(scope_id, contribution)
  }
}

Because workers satisfy the trio, the orchestrator can:

Parallelise scan-step workers across batches.
Retry any batch on transient failure without double-counting its contributions.
Reorder batches under backpressure without affecting the final result.

This is the load-bearing simplification — the framework never has to coordinate scan-step execution. See concepts/idempotent-thread-safe-scan-step for the contract details.
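The effect of the trio can be checked with a small Python sketch. It assumes the per-batch contribution is merged by set union, which is commutative and idempotent; `IntermediateState` is a hypothetical in-memory stand-in for the intra-job cache, and the duplicated batches simulate retries. None of this is the source's implementation.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class IntermediateState:
    """Hypothetical in-memory stand-in for the intra-job cache."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._used_fields: set[str] = set()

    def merge(self, contribution: set[str]) -> None:
        # Set union is commutative and idempotent; the lock makes it thread-safe.
        with self._lock:
            self._used_fields |= contribution

    def snapshot(self) -> frozenset[str]:
        with self._lock:
            return frozenset(self._used_fields)


def scan_step(state: IntermediateState, batch: list[dict]) -> None:
    for entity in batch:
        state.merge(set(entity["fields"]))


def run(batches: list[list[dict]], seed: int) -> frozenset[str]:
    rng = random.Random(seed)
    # Re-deliver two batches (simulated retries), then shuffle the order.
    schedule = batches + rng.sample(batches, k=2)
    rng.shuffle(schedule)
    state = IntermediateState()
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(lambda b: scan_step(state, b), schedule))
    return state.snapshot()


batches = [
    [{"id": 1, "fields": ["summary", "priority"]}],
    [{"id": 2, "fields": ["summary", "due_date"]}],
    [{"id": 3, "fields": ["labels"]}],
]
# Every ordering and retry pattern yields the same intermediate state.
results = {run(batches, seed) for seed in range(5)}
```

A plain counter incremented once per entity would still be commutative but not idempotent: a retried batch would be counted twice. Merges that need counts typically key each contribution by entity ID so that re-applying it overwrites rather than adds.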

3. Finalise

function finalise(scope_id) {
  // Aggregate intermediate state into the durable report
  intermediate = cache.read_all(scope_id)
  report = build_report(intermediate)
  db.upsert_polymorphic_usage(scope_id, report)
  // Optionally clear intermediate state
  cache.expire(scope_id)
}

The Finalise phase is the commit point of the workflow: once the report lands in the durable store, downstream consumers see consistent results. The intermediate state can be discarded; only the report persists.
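To make the commit-point idea concrete, here is a minimal Python sketch that uses an in-memory SQLite database as the durable store. The `reports` table, its columns, and the shape of `intermediate` are assumptions made for illustration; the source's actual schema is not shown here.

```python
import json
import sqlite3


def finalise(conn: sqlite3.Connection, scope_id: str, intermediate: dict) -> dict:
    """Build the report and write it in a single transaction."""
    report = {"unused_fields": sorted(intermediate["defined"] - intermediate["used"])}
    # `with conn` commits on success and rolls back if an exception is raised,
    # so readers see either the previous report or the new one, never a partial write.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO reports (scope_id, body) VALUES (?, ?)",
            (scope_id, json.dumps(report)),
        )
    return report


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (scope_id TEXT PRIMARY KEY, body TEXT NOT NULL)")

defined = {"summary", "priority", "due_date"}
finalise(conn, "space-1", {"defined": defined, "used": {"summary"}})
# A later refresh for the same scope replaces the earlier report.
finalise(conn, "space-1", {"defined": defined, "used": {"summary", "priority"}})

rows = conn.execute("SELECT scope_id, body FROM reports").fetchall()
```

Keying the table by `scope_id` means a refresh overwrites the previous report rather than accumulating rows, so rerunning Finalise for the same scope is itself safe.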

Tool plugin contract

The framework exposes a stable interface that domain-specific tools implement. Per-tool concerns:

Domain APIs to call (which Jira services to fetch from in scan-steps).
What "unused" / "salient" means (the per-tool predicate).
What recommendations to surface (the per-tool report schema).

The framework's concerns:

Workflow orchestration (when each phase runs, which batches dispatch where, retry on failure).
Batching (how many entities per batch, how many parallel workers).
Retries (what counts as transient, how many attempts).
Observability (per-batch metrics, per-job dashboards).
Storage tier coordination (Memcache for intra-job state, DB for persistent reports).
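The framework-side concerns of batching and retries can be sketched in a few lines of Python. Everything named here (`TransientError`, `batched`, `run_with_retries`, the batch size, the attempt limit) is hypothetical; a real deployment would delegate this to the workflow orchestration engine rather than a local loop.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def batched(entities: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Stream entities in fixed-size batches without materialising the whole scope."""
    it = iter(entities)
    while chunk := list(islice(it, size)):
        yield chunk


def run_with_retries(
    step: Callable[[list[dict]], None], batch: list[dict], max_attempts: int = 3
) -> int:
    """Run one scan-step, retrying transient failures; returns the attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            step(batch)
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


seen: set[int] = set()
failed_once: set[int] = set()


def flaky_step(batch: list[dict]) -> None:
    key = batch[0]["id"]
    seen.add(key)  # partial progress before the simulated failure
    if key not in failed_once:
        failed_once.add(key)
        raise TransientError("simulated timeout")
    seen.update(entity["id"] for entity in batch)


entities = ({"id": i} for i in range(5))
attempts = [run_with_retries(flaky_step, b) for b in batched(entities, size=2)]
```

Because `flaky_step` merges into a set, the partial progress made before each failure is harmless when the batch is retried, which is what lets the retry wrapper stay ignorant of domain logic.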

Trade-offs

Property	Pre-computation framework
Read latency	Fast (read durable report)
Write/refresh latency	Slow (async batch job; staleness window)
Compute cost	Amortised across many reads
Operational complexity	High (workflow + retries + observability)
Tool author burden	Low (plug in domain logic)
Staleness	Acknowledged (see patterns/prioritised-refresh-by-utilisation-threshold)

The hidden cost is the staleness window. Reports become stale as the underlying data evolves. Mitigations: pair with the prioritised-refresh strategy and surface staleness state in the UI.
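One simple way to surface staleness is to store the report's generation time and derive a label at read time. The Python sketch below assumes a hypothetical 24-hour freshness threshold and label wording; neither comes from the source.

```python
from datetime import datetime, timedelta, timezone


def staleness_label(
    generated_at: datetime,
    now: datetime,
    fresh_for: timedelta = timedelta(hours=24),  # assumed threshold
) -> str:
    """Return a short UI label describing how old a pre-computed report is."""
    age = now - generated_at
    if age <= fresh_for:
        return "fresh"
    return f"stale (generated {age.days} day(s) ago)"


now = datetime(2026, 5, 14, 12, 0, tzinfo=timezone.utc)
recent = staleness_label(now - timedelta(hours=2), now)
old = staleness_label(now - timedelta(days=3, hours=2), now)
```

Passing `now` explicitly keeps the function deterministic and easy to test; a caller would normally supply `datetime.now(timezone.utc)`.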

Adjacent patterns

patterns/tiered-state-management-memcache-plus-db — the storage architecture under the framework. Intra-job state in Memcache; durable reports in the relational DB.
patterns/polymorphic-usage-tables-for-multi-tenant-scale — the multi-tenant DB-design decision for the persistent layer.
patterns/prioritised-refresh-by-utilisation-threshold — the refresh-policy lever that controls compute budget.
concepts/control-plane-data-plane-separation — the framework / tool separation maps directly: framework is the control plane (orchestration, batching, retries), tools are the data plane (per-domain compute).

Adjacent systems

MapReduce / Spark batch jobs — same general shape (init + parallel map + reduce), framework-driven, but typically batch-oriented over arbitrary data, not multi-tenant SaaS reporting.
Temporal workflows — same durability + retry semantics at the per-workflow level; the pre-computation framework can be implemented on top of Temporal-shaped engines.
Skipper — Airbnb's embedded workflow engine; structurally adjacent shape, with the durability commitment at the framework layer rather than around individual jobs.

Seen in

sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat (2026-05-14, Atlassian) — first canonical wiki home. The Atlassian Pre-computation Framework is the canonical instance, computing per-space entity-usage reports across Jira Cloud's multi-tenant fleet. The Initialise → Scan-steps → Finalise contract is named verbatim; the load-bearing scan-step trio (idempotent + thread-safe + order-agnostic) is documented; the platform / tool separation is articulated as the architectural commitment that lets the platform improve independently.