CONCEPT Cited by 1 source

Idempotent thread-safe order-agnostic scan-step

Definition

An idempotent thread-safe order-agnostic scan-step is the contract a batch-processing framework imposes on its per-batch workers so that it can parallelise and retry batches without coordination. The three properties are:

  • Idempotent: running the same batch twice leaves the same final state as running it once. Retries are safe.
  • Thread-safe: multiple batches can run concurrently without corrupting shared state. Parallelism is safe.
  • Order-agnostic: batches can arrive in any order without affecting the final report. Reordering for backpressure or retry is safe.

The 2026-05-14 Atlassian post is the first canonical wiki home for the trio as an explicit framework contract. Verbatim:

"Scan-steps – the framework streams Jira entities (work items, screens, fields, and so on) in batches. Tool implementations must be thread-safe, idempotent, and order-agnostic, so the framework is free to parallelise and retry without coordination."

(Source: sources/2026-05-14-atlassian-optimisation-tools-for-jira-reducing-configuration-bloat)

What each property buys the framework

Property | What it enables | What violations cost
Thread-safe | Free parallel execution of scan-steps across batches | Locks, serial throughput, data races
Idempotent | Free retry on transient failure without coordination | Duplicate-detection state, retry locks
Order-agnostic | Free batch interleaving / reordering for backpressure | Sequence-tracking state, replay buffers

When workers satisfy the trio, the orchestrator can treat each batch as an independent retryable unit and coordinate nothing beyond batch dispatch. When workers don't, the orchestrator must add coordination — at-most-once dispatch, order-preserving queues, distributed locks — each of which adds latency and reduces throughput.
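As a rough illustration of "coordinate nothing beyond batch dispatch", the sketch below shows an orchestrator that shuffles batches, runs them on a thread pool, and retries failures blindly. It is not the Atlassian framework's code; `run_batches`, `scan_step`, `max_attempts` and `workers` are hypothetical names chosen for this example.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_batches(batches, scan_step, max_attempts=3, workers=4):
    """Run scan_step over every batch in parallel, retrying failures blindly.

    There are no locks, no ordering guarantees and no duplicate detection
    here, so this is only correct if scan_step is idempotent, thread-safe
    and order-agnostic. Returns the total number of attempts made.
    """

    def attempt(batch):
        for n in range(1, max_attempts + 1):
            try:
                scan_step(batch)
                return n
            except Exception:
                if n == max_attempts:
                    raise  # give up after the last allowed attempt

    shuffled = list(batches)
    random.shuffle(shuffled)  # arrival order is deliberately not preserved

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(attempt, batch) for batch in shuffled]
        return sum(f.result() for f in futures)
```

Everything a non-conforming worker would force on the orchestrator (ordered queues, dedup tables, distributed locks) is simply absent from this loop.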

How the three properties interact

The trio is not redundant. Each addresses a distinct class of failure:

  • Thread-safe alone isn't enough: without idempotence, a worker that crashes mid-batch and is retried can double-count or partially update state.
  • Idempotent alone isn't enough: without thread-safety, two concurrent batches can interleave updates such that the combined effect isn't equivalent to running each once.
  • Idempotent and thread-safe together aren't enough: without order-agnosticism, a worker that depends on earlier-batch state requires sequence-preserving delivery, which defeats parallelism.

A worker satisfying all three is essentially a pure function over its batch input, apart from side effects that are themselves idempotent and commutative (e.g. an upsert-by-key with a deterministic value, or a counter contribution recorded once per batch key, so that a retry overwrites the contribution rather than adding it again).
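The difference between a merely commutative side effect and one that is also idempotent can be seen in a small sketch. Both classes below are hypothetical; the delivery sequence simulates an out-of-order arrival in which batch 2 is delivered twice.

```python
import threading


class NaiveCounter:
    """Thread-safe and order-agnostic, but NOT idempotent."""

    def __init__(self):
        self.total = 0
        self._lock = threading.Lock()

    def scan(self, batch_id, items):
        with self._lock:
            self.total += len(items)  # a retried batch is counted again


class KeyedCounter:
    """Stores each batch's contribution under its batch_id."""

    def __init__(self):
        self._by_batch = {}
        self._lock = threading.Lock()

    def scan(self, batch_id, items):
        with self._lock:
            # Upsert: re-running the batch overwrites with the same value.
            self._by_batch[batch_id] = len(items)

    @property
    def total(self):
        with self._lock:
            return sum(self._by_batch.values())


# Batch 2 arrives first and is later retried.
deliveries = [(2, ["c"]), (1, ["a", "b"]), (2, ["c"])]

naive, keyed = NaiveCounter(), KeyedCounter()
for batch_id, items in deliveries:
    naive.scan(batch_id, items)
    keyed.scan(batch_id, items)

print(naive.total, keyed.total)  # prints "4 3"; only 3 items exist
```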

Adjacent contracts at other altitudes

The same trio (or close approximations) appears in:

  • MapReduce mappers: pure-function mappers operating on independent input splits; the idempotent + thread-safe + order-agnostic discipline is implicit in the framework contract.
  • Stream-processing operators (Flink, Kafka Streams): operators that satisfy the trio let the framework retry without coordination, which yields effectively exactly-once results on top of at-least-once delivery.
  • Idempotent HTTP handlers: the idempotent methods of RFC 7231 (since superseded by RFC 9110), namely PUT, DELETE and the safe methods such as GET, are the network-protocol-altitude analogue.
  • Workflow activities in Temporal / Cadence: activity workers are expected to be retry-safe; thread-safe and order-agnostic apply when activities run concurrently.
  • CRDT operations: conflict-free replicated data types satisfy the trio at the data-structure altitude. In the state-based (convergent) variety, the merge operation is commutative (order-agnostic), idempotent (retry-safe), and safe to apply from concurrent replicas (the analogue of thread-safe). The scan-step contract is the batch-processing-shaped cousin of CRDT operations.
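To make the CRDT comparison concrete, here is a minimal sketch of a state-based grow-only counter (G-Counter) merge, written for illustration only. The function names and the replica IDs `r1`, `r2`, `r3` are made up for this example.

```python
def merge(a: dict, b: dict) -> dict:
    """Merge two G-Counter states by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(counter: dict) -> int:
    """The counter's value is the sum of every replica's contribution."""
    return sum(counter.values())


x = {"r1": 3, "r2": 1}
y = {"r2": 4, "r3": 2}

xy = merge(x, y)
print(value(xy))                # prints 9 (3 + 4 + 2)
print(merge(y, x) == xy)        # prints True: order of arrival is irrelevant
print(merge(xy, y) == xy)       # prints True: re-applying y changes nothing
```

A scan-step that writes its result with a merge of this shape inherits the same retry and reordering tolerance.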

Implementation discipline

To satisfy the trio, scan-step authors typically:

  • Avoid worker-local mutable state across batches: any state worth keeping is written to durable storage with deterministic upsert semantics.
  • Use upsert-by-key, not unconditional insert: duplicate batches don't produce duplicate rows.
  • Avoid non-deterministic computations: random numbers, timestamps captured during processing, and external API calls without idempotency keys all break retry-determinism.
  • Use commutative aggregations: set-add, max(), min(), and counters keyed per batch give the same result in whatever order they are applied. A bare counter += batch_size is commutative but not idempotent, so it needs a per-batch key to survive retries.
  • Surface per-batch effect as a deterministic function of (batch_input, scope_id): the same (input, scope) pair always produces the same effect, regardless of how many times or in what order it runs.
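The checklist above can be sketched with Python's standard `sqlite3` module. The table name `field_usage`, its columns, and the batch shape are assumptions made for this example, not the schema of any real tool. Thread-safety is delegated to the database here; in a real deployment each worker would presumably open its own connection.

```python
import sqlite3


def open_report(path=":memory:"):
    """Create the (hypothetical) report table keyed by (scope_id, entity_id)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS field_usage (
               scope_id    TEXT NOT NULL,
               entity_id   TEXT NOT NULL,
               field_count INTEGER NOT NULL,
               PRIMARY KEY (scope_id, entity_id)
           )"""
    )
    return db


def scan_step(db, scope_id, batch):
    """Record how many distinct fields each entity in the batch uses.

    batch is an iterable of (entity_id, fields) pairs. Each row is a
    deterministic function of (batch input, scope_id): no clocks, no
    randomness, no state carried over from earlier batches.
    """
    rows = [(scope_id, entity_id, len(set(fields))) for entity_id, fields in batch]
    with db:  # one transaction per batch: commit on success, roll back on error
        # Upsert-by-key: replaying the batch rewrites identical rows.
        db.executemany("INSERT OR REPLACE INTO field_usage VALUES (?, ?, ?)", rows)
```

Replaying a batch, or delivering batches in a different order, leaves the table in the same final state because every write is keyed and every value is derived from the input alone.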

Trade-offs

The discipline isn't free:

  • Some computations don't fit naturally — anything inherently sequential (e.g. "the i-th batch's output is the (i-1)-th batch's input") can't be made order-agnostic without extra state.
  • Idempotence requires deduplication keys — for inserts, this means stable IDs (often (scope_id, entity_id) composites); generating these can be non-trivial.
  • Order-agnosticism forbids cumulative state machines — a scan-step that updates a state-machine in response to events can't easily be made order-agnostic.
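On the deduplication-key point, one common way to get a stable ID is to derive it from the composite key rather than generate it at write time. The sketch below uses name-based UUIDs (version 5) from the standard library; the namespace string and the `row_id` helper are hypothetical.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID would do.
SCAN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "scan-steps.example")


def row_id(scope_id: str, entity_id: str) -> uuid.UUID:
    """Derive a stable row ID from the (scope_id, entity_id) composite.

    The scope length is prefixed so that ("a:b", "c") and ("a", "b:c")
    do not collapse into the same name.
    """
    return uuid.uuid5(SCAN_NAMESPACE, f"{len(scope_id)}:{scope_id}:{entity_id}")


# The same pair always maps to the same ID, so a retried insert hits the same key.
print(row_id("scan-42", "ISSUE-1") == row_id("scan-42", "ISSUE-1"))  # prints True

# A random ID (uuid4) differs on every call, so a retry would create a new row.
print(uuid.uuid4() == uuid.uuid4())  # prints False
```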

When the workload doesn't naturally fit the trio, the framework loses parallelism (must run scan-steps serially to preserve order), retry-safety (must dedupe at the framework layer), or both.

Seen in
