CONCEPT Cited by 1 source
Idempotent thread-safe order-agnostic scan-step
Definition
An idempotent thread-safe order-agnostic scan-step is the load-bearing worker contract that batch-processing frameworks require of their per-batch workers, so that the framework can parallelise and retry batches without coordination. The trio of properties is:
- Idempotent — running the same batch twice produces the same final state. Retries are safe.
- Thread-safe — multiple batches can run concurrently without corrupting shared state. Parallelism is safe.
- Order-agnostic — batches can arrive in any order without affecting the final report. Reordering for backpressure or retry is safe.
The 2026-05-14 Atlassian post is the wiki's first canonical source that names the trio as an explicit framework contract. Verbatim:
"Scan-steps – the framework streams Jira entities (work items, screens, fields, and so on) in batches. Tool implementations must be thread-safe, idempotent, and order-agnostic, so the framework is free to parallelise and retry without coordination."
(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)
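One way to make the trio concrete is to test a candidate scan-step against it. The sketch below is a hypothetical harness, not part of the Atlassian framework. The names `satisfies_trio`, `upsert_step` and `append_step` are illustrative. Batches are assumed to be lists of `(entity_id, value)` pairs, and shared state is an in-memory dict guarded by a lock. The harness compares a reference run (each batch once, in order, on one thread) against runs where every batch is duplicated, shuffled and executed on a thread pool. A pass is evidence, not proof, because a data race may not surface in a handful of trials.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical shapes: a batch is a list of (entity_id, value) pairs; shared state is a dict.
Batch = List[Tuple[str, int]]
Step = Callable[[Batch, Dict, Lock], None]


def run_reference(step: Step, batches: Sequence[Batch]) -> Dict:
    """Each batch exactly once, in submission order, on one thread."""
    state: Dict = {}
    lock = Lock()
    for batch in batches:
        step(batch, state, lock)
    return state


def run_adversarial(step: Step, batches: Sequence[Batch], seed: int) -> Dict:
    """Each batch twice (retries), shuffled (reordering), on a thread pool (parallelism)."""
    schedule = list(batches) * 2
    random.Random(seed).shuffle(schedule)
    state: Dict = {}
    lock = Lock()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(lambda batch: step(batch, state, lock), schedule))
    return state


def satisfies_trio(step: Step, batches: Sequence[Batch], trials: int = 5) -> bool:
    reference = run_reference(step, batches)
    return all(run_adversarial(step, batches, seed) == reference for seed in range(trials))


def upsert_step(batch: Batch, state: Dict, lock: Lock) -> None:
    # Upsert-by-key with a deterministic value: safe to retry, reorder and parallelise.
    with lock:
        for entity_id, value in batch:
            state[entity_id] = value


def append_step(batch: Batch, state: Dict, lock: Lock) -> None:
    # Appending to a shared log: thread-safe, but neither idempotent nor order-agnostic.
    with lock:
        state.setdefault("log", []).extend(entity_id for entity_id, _ in batch)


batches = [[("a", 1), ("b", 2)], [("c", 3)]]
print(satisfies_trio(upsert_step, batches))  # True
print(satisfies_trio(append_step, batches))  # False
```

`upsert_step` passes here only because the two batches write disjoint keys. If two batches wrote different values for the same key, the final state would depend on which batch ran last, and the harness could flag it.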
What each property buys the framework
| Property | What it enables | What violations cost |
|---|---|---|
| Thread-safe | Free parallel execution of scan-steps across batches | Locks, serial throughput, data races |
| Idempotent | Free retry on transient failure without coordination | Duplicate-detection state, retry locks |
| Order-agnostic | Free batch interleaving / reordering for backpressure | Sequence-tracking state, replay buffers |
When workers satisfy the trio, the orchestrator can treat each batch as an independent retryable unit and coordinate nothing beyond batch dispatch. When workers don't, the orchestrator must add coordination — at-most-once dispatch, order-preserving queues, distributed locks — each of which adds latency and reduces throughput.
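The orchestrator side of that bargain can be sketched as follows. `dispatch`, `TransientError` and the fault injection in `flaky_upsert` are hypothetical. The only coordination is handing batches to a thread pool and retrying a failed batch from its start. On its first attempt, each example batch deliberately crashes after a partial write. The retry is still harmless because every write is an upsert.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, List, Sequence, Set, Tuple

Batch = List[Tuple[str, int]]


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (timeout, throttling, worker loss)."""


def dispatch(step: Callable[[Batch], None], batches: Sequence[Batch],
             max_attempts: int = 3, workers: int = 4) -> None:
    """Run batches in any order on any thread, retrying blindly on transient failure.

    No sequence numbers, dedup tables or distributed locks live here;
    correctness rests entirely on `step` satisfying the trio.
    """
    def run_with_retry(batch: Batch) -> None:
        for attempt in range(1, max_attempts + 1):
            try:
                step(batch)
                return
            except TransientError:
                if attempt == max_attempts:
                    raise

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_with_retry, batches))


store: Dict[str, int] = {}
store_lock = Lock()
crashed: Set[str] = set()  # fault injection: the first attempt of each batch dies mid-batch


def flaky_upsert(batch: Batch) -> None:
    for i, (entity_id, value) in enumerate(batch):
        with store_lock:
            store[entity_id] = value  # upsert-by-key: rewriting on retry is harmless
            crash = i == 0 and entity_id not in crashed
            if crash:
                crashed.add(entity_id)
        if crash:
            raise TransientError(f"crashed after writing {entity_id}")


dispatch(flaky_upsert, [[("a", 1), ("b", 2)], [("c", 3), ("d", 4)]])
print(sorted(store.items()))  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```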
How the three properties interact
The trio is not redundant. Each addresses a distinct class of failure:
- Thread-safe alone isn't enough — without idempotence, a worker that crashes mid-batch and is retried can double-count or partially-update state.
- Idempotent alone isn't enough — without thread-safety, two concurrent batches can interleave updates such that the combined effect isn't equivalent to running each once.
- Idempotent + thread-safe together aren't enough — without order-agnosticism, a worker that depends on earlier-batch state requires sequence-preserving delivery, defeating parallelism.
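A worker satisfying all three is essentially a pure function over its batch input, modulo side-effects that stay correct under retry and reordering. Examples are upsert-by-key with a deterministic value, or recording a batch's counter contribution under its batch ID so that a retry overwrites the contribution rather than adding it again. A plain counter increment commutes, but it is not idempotent.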
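The gap between commutative and idempotent is easiest to see with a counter. This sketch is hypothetical and single-threaded, so thread-safety is left out to isolate the other two properties. Each batch carries a `batch_id` that is assumed to be stable across retries, and batch `b1` is delivered twice after `b2`. The naive increment over-counts. Upserting each batch's contribution under its ID and summing at read time does not.

```python
from typing import Dict, List, Tuple

# Hypothetical batch shape: (batch_id, items); batch_id is assumed stable across retries.
Batch = Tuple[str, List[str]]


def naive_increment(totals: Dict[str, int], batch: Batch) -> None:
    # Commutative but NOT idempotent: a retried batch is added twice.
    _, items = batch
    totals["count"] = totals.get("count", 0) + len(items)


def keyed_contribution(contributions: Dict[str, int], batch: Batch) -> None:
    # Idempotent and order-agnostic: the contribution is upserted under the
    # batch's own id, so a retry overwrites instead of re-adding.
    batch_id, items = batch
    contributions[batch_id] = len(items)


def total(contributions: Dict[str, int]) -> int:
    # The final count is derived at read time.
    return sum(contributions.values())


batches = [("b1", ["x", "y"]), ("b2", ["z"])]
schedule = [batches[1], batches[0], batches[0]]  # reordered, and b1 retried

naive: Dict[str, int] = {}
keyed: Dict[str, int] = {}
for b in schedule:
    naive_increment(naive, b)
    keyed_contribution(keyed, b)
print(naive["count"], total(keyed))  # 5 3
```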
Adjacent contracts at other altitudes
The same trio (or close approximations) appears in:
- MapReduce mappers — pure-function mappers operating on independent input splits; the idempotent + thread-safe + order-agnostic discipline is implicit in the framework contract.
- Stream-processing operators (Flink, Kafka Streams) — operators that satisfy the trio enable exactly-once semantics with at-least-once delivery, by letting the framework retry without coordination.
- Idempotent HTTP handlers — methods defined as idempotent in RFC 7231 (e.g. PUT, DELETE) are the network-protocol-altitude analogue.
- Workflow activities in Temporal / Cadence — activity workers are explicitly required to be retry-safe; thread-safe and order-agnostic apply when activities run concurrently.
- CRDT operations — convergent (state-based) replicated data types satisfy the trio at the data-structure altitude. Their merge is commutative and associative (order-agnostic), idempotent (retry-safe), and needs no coordination between concurrent replicas (thread-safe). The scan-step contract is the batch-processing-shaped cousin of CRDT operations.
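A grow-only counter (G-Counter) is a standard state-based CRDT, and it shows the same three properties at the data-structure level. The sketch below is a textbook G-Counter in plain Python, not code from any framework named here. Each replica is assumed to increment only its own entry, and merge takes the per-replica maximum. The asserts in the block check commutativity, associativity and idempotence on a small example.

```python
from typing import Dict

GCounter = Dict[str, int]  # replica id -> that replica's monotonically growing count


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Join of two G-Counter states: per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(c: GCounter) -> int:
    return sum(c.values())


x = {"r1": 3, "r2": 1}
y = {"r2": 4}
z = {"r1": 2, "r3": 5}

assert merge(x, y) == merge(y, x)                       # commutative: order-agnostic
assert merge(merge(x, y), z) == merge(x, merge(y, z))   # associative: grouping-agnostic
assert merge(merge(x, y), y) == merge(x, y)             # idempotent: re-delivery is harmless
print(value(merge(merge(x, y), z)))  # 12
```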
Implementation discipline
To satisfy the trio, scan-step authors typically:
- Avoid worker-local mutable state across batches — any state worth keeping is written to durable storage with deterministic upsert semantics.
- Use upsert-by-key, not unconditional insert — duplicate batches don't produce duplicate rows.
- Avoid non-deterministic computations — random numbers, timestamps captured during processing, external API calls without idempotency keys all break retry-determinism.
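- Use commutative aggregations — operations whose result doesn't depend on the order they're applied, such as set-add, max(), min() and counter += batch_size. Set-add, max() and min() are also idempotent. A plain counter increment is not, so a retried batch is counted twice unless its contribution is recorded under a stable batch ID.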
- Surface the per-batch effect as a deterministic function of `(batch_input, scope_id)` — the same `(input, scope)` pair always produces the same effect, regardless of how many times or in what order it runs.
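Putting the discipline together, a scan-step for a hypothetical field-usage report might look like the sketch below. `UsageStore`, `scan_step` and the `(item_id, field_ids)` input shape are illustrative assumptions, not the Pre-computation Framework's API. The in-memory store stands in for durable storage. Every write is an upsert of a `(scope_id, field_id, item_id)` key. The step keeps no state between batches and reads no clock. The report is derived at read time by counting distinct keys, so retries and reordering cannot inflate it.

```python
from collections import Counter
from threading import Lock
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input shape: (work item id, ids of the fields it references).
WorkItem = Tuple[str, Tuple[str, ...]]


class UsageStore:
    """In-memory stand-in for durable storage keyed by (scope_id, field_id, item_id)."""

    def __init__(self) -> None:
        self._rows: Set[Tuple[str, str, str]] = set()
        self._lock = Lock()

    def upsert(self, keys: Iterable[Tuple[str, str, str]]) -> None:
        # Set-add: writing the same key again is a no-op, in any order, from any thread.
        with self._lock:
            self._rows.update(keys)

    def usage_counts(self, scope_id: str) -> Dict[str, int]:
        # Derived at read time: number of distinct work items referencing each field.
        with self._lock:
            return dict(Counter(field for scope, field, _ in self._rows if scope == scope_id))


def scan_step(batch: Iterable[WorkItem], scope_id: str, store: UsageStore) -> None:
    # The effect is a deterministic function of (batch, scope_id): no clocks,
    # no randomness, no worker-local state carried between batches.
    keys = {(scope_id, field_id, item_id)
            for item_id, field_ids in batch
            for field_id in field_ids}
    store.upsert(keys)


store = UsageStore()
b1 = [("ITEM-1", ("priority", "labels")), ("ITEM-2", ("priority",))]
b2 = [("ITEM-3", ("labels",))]
for batch in (b2, b1, b1):  # reordered, and b1 retried
    scan_step(batch, "site-1", store)
print(sorted(store.usage_counts("site-1").items()))  # [('labels', 2), ('priority', 2)]
```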
Trade-offs
The discipline isn't free:
- Some computations don't fit naturally — anything inherently sequential (e.g. "the i-th batch's output is the (i-1)-th batch's input") can't be made order-agnostic without extra state.
- Idempotence requires deduplication keys — for inserts, this means stable IDs (often `(scope_id, entity_id)` composites); generating these can be non-trivial.
- Order-agnosticism forbids cumulative state machines — a scan-step that updates a state machine in response to events can't easily be made order-agnostic.
When the workload doesn't naturally fit the trio, the framework loses parallelism (must run scan-steps serially to preserve order), retry-safety (must dedupe at the framework layer), or both.
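For the deduplication-key trade-off, one common approach is to derive a name-based UUID from the composite key with Python's `uuid.uuid5`. The source does not prescribe this; it is an assumption here. With a name-based UUID, every retry of the same `(scope_id, entity_id)` pair produces the same row ID, whereas `uuid.uuid4()` would mint a fresh random ID on each attempt. The namespace URL and the `row_id` and `insert_row` helpers are hypothetical. The pair is JSON-encoded so that IDs containing separator characters cannot collide.

```python
import json
import uuid

# Hypothetical namespace for this report's row IDs; any fixed UUID works.
SCAN_ROWS_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/scan-step-rows")


def row_id(scope_id: str, entity_id: str) -> uuid.UUID:
    # JSON-encoding the pair keeps ("a:b", "c") and ("a", "b:c") distinct,
    # which naive concatenation with ":" would not.
    return uuid.uuid5(SCAN_ROWS_NS, json.dumps([scope_id, entity_id]))


rows: dict = {}


def insert_row(scope_id: str, entity_id: str, payload: dict) -> None:
    # Keyed write: a retried insert lands on the same row instead of adding a duplicate.
    rows[row_id(scope_id, entity_id)] = payload


insert_row("site-1", "FIELD-10", {"used_by": 2})
insert_row("site-1", "FIELD-10", {"used_by": 2})  # retry of the same batch
print(len(rows))  # 1
```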
Seen in
- sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat (2026-05-14, Atlassian) — first canonical wiki home for the explicit trio as a documented framework contract. Names all three properties verbatim, with the rationale that the orchestrator becomes "free to parallelise and retry without coordination." Canonical use case: streaming Jira entities (work items, screens, fields) in batches for usage-report computation in the Pre-computation Framework.
Related
- systems/atlassian-precomputation-framework
- concepts/idempotent-operations — the idempotence property in isolation.
- patterns/asynchronous-precomputed-report-batch-framework
- systems/temporal — durable workflow engine where activity workers operate under similar discipline.