

Asynchronous pre-computed report batch framework

Pattern

A pre-computation framework is a platform-layer substrate that runs asynchronous, workflow-orchestrated batch jobs. Each job scans domain data for a given scope and produces a durable report, which downstream consumers (admin UIs, remediation tools, dashboards) then read cheaply on demand. The framework owns the execution concerns (workflow, task workers, batching, retries, observability); the domain-specific tools that author the reports own the domain concerns (what to scan, what to compute, and what counts as "unused"). The two are decoupled by a stable contract with a three-phase shape:

  1. Initialise: bound the scope, clear or seed intermediate state, and set the boundaries for a new job.
  2. Scan-steps: stream domain entities in batches; each batch is processed by a worker that satisfies the idempotent + thread-safe + order-agnostic contract, so the framework can parallelise and retry without coordination.
  3. Finalise: aggregate per-batch state into a durable report consumed by remediation flows and UIs.

(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)

The 2026-05-14 Atlassian post is the first canonical wiki home for this pattern. Verbatim framing of the platform contract:

"It sits on top of our internal workflow orchestration engine, owns the workflow, task workers, batching, retries, and observability, and exposes a simple contract. Optimisation tools don't need to care about orchestrator details; they only plug in their logic."

Problem

Multi-tenant SaaS admin UIs frequently need to answer aggregate questions about per-tenant data ("which fields are unused in this space?", "how many active users in this org last 30 days?", "which schemes contain field X?") whose answers are too expensive to compute synchronously on every request:

  • Per-tenant data may be very large (Atlassian: ~1M records per entity type per tenant).
  • Computation may require scanning multiple data sources (work items + screens + fields + schemes).
  • Many admins may request the same report concurrently; re-deriving on each request wastes compute.
  • The answer changes infrequently relative to read demand.

A naïve solution, "compute everything on the fly in admin UIs", quickly fails at multi-tenant SaaS scale, and such tooling "only delivers value if [it] works at the scale [of the platform]". The architectural shape that does work is: pre-compute asynchronously, store durably, read cheaply.

Solution

Implement a platform-layer pre-computation framework with the three-phase contract:

1. Initialise

function initialise(scope_id) {
  // Clear or seed intermediate state for this job
  cache.clear(scope_id)
  state.bind(scope_id, started_at = now())
  // Bound work in time so a runaway scope can't hold the
  // workflow indefinitely
  set_deadline(scope_id, max_runtime)
}

The Initialise phase is the only phase that touches shared state in a coordinated way; from this point forward the framework is free to parallelise.
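A minimal Python sketch of the Initialise phase follows, assuming plain in-memory dictionaries stand in for the intermediate cache and the job-state store (the source names Memcache for intra-job state but gives no API). `JobState`, `deadline_exceeded` and `max_runtime_s` are hypothetical names used only for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical per-scope job record; field names are illustrative."""
    scope_id: str
    started_at: float
    deadline: float


# In-memory stand-ins (assumptions) for the intermediate cache and job-state store.
cache: dict[str, dict] = {}
jobs: dict[str, JobState] = {}


def initialise(scope_id: str, max_runtime_s: float) -> JobState:
    # Clear any leftover intermediate state from a previous run of this scope.
    cache[scope_id] = {}
    now = time.monotonic()
    # Bind the job to its scope and bound its runtime so a runaway scope
    # cannot hold the workflow indefinitely.
    job = JobState(scope_id=scope_id, started_at=now, deadline=now + max_runtime_s)
    jobs[scope_id] = job
    return job


def deadline_exceeded(scope_id: str) -> bool:
    # Scan-step workers or the orchestrator could poll this to abandon the job.
    return time.monotonic() > jobs[scope_id].deadline
```

This is the only point where the sketch writes shared state unconditionally; everything after it only merges into the per-scope entry.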

2. Scan-steps

The framework streams domain entities in batches. Each batch is processed by a worker that satisfies the idempotent + thread-safe + order-agnostic trio:

function scan_step(scope_id, batch) {
  for entity in batch {
    // Pure function of (entity, scope_id)
    contribution = compute_contribution(entity, scope_id)
    // Commutative, idempotent merge (e.g. set union) into
    // intermediate state, so a retried batch cannot double-count
    cache.upsert_merge(scope_id, contribution)
  }
}
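A runnable version of the scan-step, assuming an "unused fields" style tool where each entity is a hypothetical `(entity_id, used_field_ids)` pair and the intermediate state is the set of field ids seen in use. A lock provides thread safety; set union provides the commutative and idempotent merge the trio relies on.

```python
import threading
from collections.abc import Iterable

# Hypothetical entity shape: (entity_id, ids of the fields the entity uses).
Entity = tuple[str, frozenset[str]]

_lock = threading.Lock()
# Intermediate state: scope_id -> set of field ids seen in use so far.
intermediate: dict[str, set[str]] = {}


def compute_contribution(entity: Entity, scope_id: str) -> frozenset[str]:
    # Pure function of (entity, scope_id): reads no shared state.
    _entity_id, used_fields = entity
    return used_fields


def scan_step(scope_id: str, batch: Iterable[Entity]) -> None:
    for entity in batch:
        contribution = compute_contribution(entity, scope_id)
        # Set union is commutative, associative and idempotent, so batches may
        # run concurrently, in any order, or more than once with the same result.
        with _lock:
            intermediate.setdefault(scope_id, set()).update(contribution)
```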

Because workers satisfy the trio, the orchestrator can:

  • Parallelise scan-step workers across batches.
  • Retry any batch on transient failure without duplication.
  • Reorder batches under backpressure without affecting the final result.

This is the load-bearing simplification — the framework never has to coordinate scan-step execution. See concepts/idempotent-thread-safe-scan-step for the contract details.
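The three orchestrator freedoms listed above can be checked with a small simulation: batches are dispatched to a thread pool in shuffled order, and half of them are replayed to mimic retries. This is an illustrative test harness under those assumptions, not the framework's scheduler.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


def run_scan(batches: list[list[tuple[str, frozenset[str]]]], seed: int) -> set[str]:
    """Run scan-steps in shuffled order on a thread pool, replaying some
    batches to simulate retries, and return the merged intermediate state."""
    used: set[str] = set()
    lock = threading.Lock()

    def scan_step(batch):
        for _entity_id, fields in batch:
            with lock:
                used.update(fields)  # idempotent, commutative merge

    rng = random.Random(seed)
    schedule = list(batches)
    # Simulate retries: dispatch a random half of the batches a second time.
    schedule += rng.sample(batches, k=len(batches) // 2)
    # Simulate reordering under backpressure.
    rng.shuffle(schedule)
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(scan_step, schedule))  # list() surfaces worker exceptions
    return used
```

Every seed yields the same set. Replacing the set union with a counter increment would break this: replayed batches would be counted twice.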

3. Finalise

function finalise(scope_id) {
  // Aggregate intermediate state into the durable report
  intermediate = cache.read_all(scope_id)
  report = build_report(intermediate)
  db.upsert_polymorphic_usage(scope_id, report)
  // Optionally clear intermediate state
  cache.expire(scope_id)
}

The Finalise phase is the commit point of the workflow: once the report lands in the durable store, downstream consumers see consistent results. The intermediate state can be discarded; only the report persists.
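A Python sketch of Finalise, assuming dictionaries stand in for the intermediate cache and the durable report store; `reports`, `build_report` and the `generated_at` field are hypothetical and do not reflect the actual `upsert_polymorphic_usage` schema.

```python
import time

# Stand-ins (assumptions): intermediate cache and durable report store.
cache: dict[str, set[str]] = {}
reports: dict[str, dict] = {}


def build_report(all_fields: set[str], used_fields: set[str]) -> dict:
    # Hypothetical per-tool report: fields never seen in use during the scan.
    return {"unused_fields": sorted(all_fields - used_fields)}


def finalise(scope_id: str, all_fields: set[str]) -> dict:
    used = cache.get(scope_id, set())
    report = build_report(all_fields, used)
    # Timestamp the report so readers can judge its staleness later.
    report["generated_at"] = time.time()
    # Commit point: one upsert replaces this scope's previous report wholesale.
    reports[scope_id] = report
    # Intermediate state is no longer needed once the report is stored.
    cache.pop(scope_id, None)
    return report
```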

Tool plugin contract

The framework exposes a stable interface that domain-specific tools implement. Per-tool concerns:

  • Domain APIs to call (which Jira services to fetch from in scan-steps).
  • What "unused" / "salient" means (the per-tool predicate).
  • What recommendations to surface (the per-tool report schema).
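The three per-tool concerns above map naturally onto a small plugin interface. The sketch below uses `typing.Protocol`; the method names and the in-memory `UnusedFieldsTool` (which reads a fixture instead of real Jira services) are hypothetical, chosen only to show where each concern lives.

```python
from collections.abc import Iterable
from typing import Any, Protocol


class OptimisationTool(Protocol):
    """Hypothetical plugin interface; method names are illustrative."""

    def fetch_entities(self, scope_id: str) -> Iterable[Any]:
        """Domain APIs: stream the entities to scan for this scope."""
        ...

    def contribute(self, entity: Any) -> set[str]:
        """Per-tool predicate input: what this entity marks as in use."""
        ...

    def build_report(self, scope_id: str, used: set[str]) -> dict:
        """Per-tool report schema: the recommendations to surface."""
        ...


class UnusedFieldsTool:
    """Example tool over an in-memory fixture instead of real Jira services."""

    def __init__(self, fields: set[str], work_items: dict[str, set[str]]):
        self.fields = fields
        self.work_items = work_items  # work item id -> field ids it uses

    def fetch_entities(self, scope_id: str) -> Iterable[tuple[str, set[str]]]:
        return self.work_items.items()

    def contribute(self, entity: tuple[str, set[str]]) -> set[str]:
        _item_id, used_fields = entity
        return set(used_fields)

    def build_report(self, scope_id: str, used: set[str]) -> dict:
        return {"scope": scope_id, "unused_fields": sorted(self.fields - used)}
```

Nothing in the tool mentions workers, batches or retries; those stay on the framework side of the contract.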

The framework's concerns:

  • Workflow orchestration (when each phase runs, which batches dispatch where, retry on failure).
  • Batching (how many entities per batch, how many parallel workers).
  • Retries (what counts as transient, how many attempts).
  • Observability (per-batch metrics, per-job dashboards).
  • Storage tier coordination (Memcache for intra-job state, DB for persistent reports).
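A compact driver can illustrate the framework side: it batches the tool's entities, fans batches out to a worker pool, retries transient failures per batch, and hands the merged state to the tool's report builder. It is a single-process sketch; `run_job`, `TransientError` and the default sizes are assumptions, and a real implementation would sit on a workflow orchestration engine with durable state and metrics.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class TransientError(Exception):
    """Hypothetical marker for failures the framework may retry."""


def batched(iterable, size):
    # Like itertools.batched (Python 3.12+), but yields lists.
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


def run_job(tool, scope_id, batch_size=100, workers=4, max_attempts=3):
    used: set[str] = set()  # Initialise: fresh intermediate state per job
    lock = threading.Lock()

    def scan_step(batch):
        for attempt in range(1, max_attempts + 1):
            try:
                # Compute the whole batch locally before merging.
                contribution: set[str] = set()
                for entity in batch:
                    contribution |= tool.contribute(entity)
                with lock:
                    used.update(contribution)
                return
            except TransientError:
                if attempt == max_attempts:
                    raise  # give up; the orchestrator would fail the job

    # Scan-steps: batches dispatched to a worker pool with per-batch retries.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(scan_step, batched(tool.fetch_entities(scope_id), batch_size)))
    # Finalise: the tool turns the aggregated state into its report.
    return tool.build_report(scope_id, used)
```

Retrying a whole batch is safe only because the merge is idempotent; the driver never tracks which entities of a failed batch were already processed.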

Trade-offs

Property | Pre-computation framework
--- | ---
Read latency | Fast (read durable report)
Write/refresh latency | Slow (async batch job; staleness window)
Compute cost | Amortised across many reads
Operational complexity | High (workflow + retries + observability)
Tool author burden | Low (plug in domain logic)
Staleness | Acknowledged (see patterns/prioritised-refresh-by-utilisation-threshold)
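The "amortised" compute cost can be made concrete with a simplified model, assuming (hypothetically) that one job costs $C_{\text{scan}}$, reading a stored report costs $C_{\text{read}}$, an on-the-fly answer costs a full scan, and a report serves $R$ reads before it is refreshed. Operational overhead and staleness are ignored.

```latex
\text{cost per read}_{\text{pre-computed}} = \frac{C_{\text{scan}}}{R} + C_{\text{read}},
\qquad
\text{cost per read}_{\text{on the fly}} = C_{\text{scan}}

\frac{C_{\text{scan}}}{R} + C_{\text{read}} < C_{\text{scan}}
\iff
R > \frac{C_{\text{scan}}}{C_{\text{scan}} - C_{\text{read}}}
\qquad (C_{\text{read}} < C_{\text{scan}})
```

When $C_{\text{read}} \ll C_{\text{scan}}$ the break-even is just above one read per refresh, so under this model any report read repeatedly between refreshes pays for itself.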

The hidden cost is the staleness window. Reports become stale as the underlying data evolves. Mitigations: pair with the prioritised-refresh strategy and surface staleness state in the UI.
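Surfacing staleness in the UI can be as simple as returning the report's age alongside it. The sketch assumes reports carry a hypothetical `generated_at` epoch timestamp and that `max_age_s` is a per-tool threshold; choosing which stale scopes to refresh first is the job of the prioritised-refresh pattern and is not shown.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportView:
    report: Optional[dict]
    age_s: Optional[float]
    stale: bool


def read_report(reports: dict[str, dict], scope_id: str, max_age_s: float,
                now: Optional[float] = None) -> ReportView:
    # Read path: a single lookup, no scanning of domain data.
    now = time.time() if now is None else now
    report = reports.get(scope_id)
    if report is None:
        # No report yet: the UI can show "not computed" and request a job.
        return ReportView(report=None, age_s=None, stale=True)
    age = now - report["generated_at"]
    return ReportView(report=report, age_s=age, stale=age > max_age_s)
```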

Adjacent patterns

Adjacent systems

  • MapReduce / Spark batch jobs — same general shape (init + parallel map + reduce), framework-driven, but typically batch-oriented over arbitrary data, not multi-tenant SaaS reporting.
  • Temporal workflows — same durability + retry semantics at the per-workflow level; the pre-computation framework can be implemented on top of Temporal-shaped engines.
  • Skipper — Airbnb's embedded workflow engine; structurally adjacent shape, with the durability commitment at the framework layer rather than around individual jobs.

Seen in
