
CONCEPT Cited by 1 source

Idempotent build action

Definition

A build action is idempotent if, given the same declared inputs, it always produces the same output. It's a necessary condition for sound caching: if running the same action twice can produce two different outputs, caching the result of the first run and serving it for the second is incorrect.
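To make the caching argument concrete, here is a minimal sketch of an action cache keyed by a hash of the declared inputs. The names ActionCache, stable_action, and drifting_action are hypothetical and not taken from any real build system; the counter only stands in for a source of run-to-run variation such as a timestamp.

```python
import hashlib
import itertools
from typing import Callable, Dict


class ActionCache:
    """Toy cache keyed by the action's name plus a hash of its declared inputs."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def run(self, action: Callable[[bytes], bytes], inputs: bytes) -> bytes:
        key = action.__name__ + ":" + hashlib.sha256(inputs).hexdigest()
        if key not in self._store:
            # Cache miss: execute the action once and remember its output.
            self._store[key] = action(inputs)
        return self._store[key]


def stable_action(inputs: bytes) -> bytes:
    # Idempotent: the output depends only on the declared inputs.
    return inputs.upper()


_run_counter = itertools.count()


def drifting_action(inputs: bytes) -> bytes:
    # Not idempotent: a per-run counter (standing in for a timestamp) leaks into the output.
    return inputs.upper() + b"#" + str(next(_run_counter)).encode()


cache = ActionCache()
stable_cached = cache.run(stable_action, b"src")
stable_fresh = stable_action(b"src")
drifting_cached = cache.run(drifting_action, b"src")
drifting_fresh = drifting_action(b"src")

print(stable_cached == stable_fresh)      # True: the cached result matches a fresh run
print(drifting_cached == drifting_fresh)  # False: the cache serves a result a fresh run would not produce
```

For the idempotent action the cache is invisible: a hit returns exactly what a fresh run would. For the drifting action, a hit and a fresh run disagree, which is the unsoundness described above.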

Idempotence sits alongside hermeticity (concepts/hermetic-build) as the two preconditions every modern build system quietly assumes:

  • Hermetic: the action has only the inputs it declares (no hidden reads from disk, env, or network).
  • Idempotent: given those declared inputs, the output is the same every run (no time-dependent output, no randomness, no writes to global state).

Why idempotence is the harder precondition in practice

Hermeticity can be enforced with sandboxing (Bazel does this). Idempotence cannot be enforced by the build system — it's a property of the action's commands themselves. Canonical violations:

  • Timestamps in the output (e.g. a compiler that embeds the build time). Same inputs, different outputs across runs.
  • Non-deterministic ordering (hash-map iteration order, goroutine scheduling) changes the binary's byte-for-byte layout.
  • Network fetches inside the build step (e.g. downloading a dependency) whose resolution can change over time.
  • State mutation in the working directory that the action doesn't clean up, polluting subsequent runs.
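The last violation in the list, leftover state in the working directory, can be illustrated with a small sketch. The action leaky_action and its file names are hypothetical; the point is only that a file left behind by one run becomes an undeclared input to the next.

```python
import hashlib
import tempfile
from pathlib import Path


def leaky_action(workdir: Path, source: str) -> bytes:
    """Hypothetical action: bundles every *.txt file in workdir into one output."""
    (workdir / "main.txt").write_text(source)
    bundle = b"".join(p.read_bytes() for p in sorted(workdir.glob("*.txt")))
    # Bug: this scratch file is never cleaned up, so the next run's glob picks it up.
    (workdir / "scratch.txt").write_text("temp data")
    return bundle


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    first = digest(leaky_action(workdir, "hello"))
    second = digest(leaky_action(workdir, "hello"))

print(first == second)  # False: same declared input, different output on the second run
```

Running each invocation in a fresh directory would make the two digests match, which is the effect sandboxing aims for.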

Slack's pre-refactor Quip/Canvas build had this problem explicitly:

Build execution units were huge, not all were idempotent, and hermeticity was a challenge because many build steps mutated the working directory.

— Slack, Build better software to build software better

The consequence: even if Bazel's sandboxing could have isolated inputs, the actions themselves produced different outputs each run, so cache entries were unreliable.

The contract with Bazel

Slack's articulation of the build-author's responsibility:

We have to make Bazel a promise: our build steps' cmds are in fact idempotent and hermetic.

The constraints of Starlark help defend this promise (no filesystem I/O, no mutable globals, no recursion), but the idempotence of the shell commands invoked from the rule's cmd ultimately remains the action author's responsibility.

Defensive techniques

  • Fix the clock. Reproducible-build efforts (like reproducible-builds.org) standardize the SOURCE_DATE_EPOCH environment variable: tools that honor it embed that fixed timestamp instead of the current time.
  • Sort everything. Any iteration whose order leaks into output should go through explicit sort.
  • Pin dependencies. Fetch-during-build is acceptable only if the fetched content is keyed to a hash, not a version tag that can move.
  • Clean the working directory. Or, in Bazel's case, use sandboxing to prevent any mutation from surviving the action.
  • Test idempotence. Build twice in clean sandboxes, diff the outputs byte-for-byte; any diff is a bug. Slack wrote a Rust diff tool for exactly this during their build refactor — see patterns/diff-artifact-validator-for-build-refactor.
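Several of these techniques can be combined in one short sketch: a packaging step that sorts its entries and takes its timestamp from SOURCE_DATE_EPOCH, plus a check that builds twice and compares digests. The function names build_archive and is_idempotent are hypothetical, and falling back to 0 when the variable is unset is an assumption of this sketch. The check runs in-process, so it is a simplification of building twice in clean sandboxes, and it is not Slack's Rust tool.

```python
import hashlib
import io
import os
import tarfile
from typing import Callable, Dict


def build_archive(files: Dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar with sorted entries and a fixed mtime."""
    # Assumption of this sketch: fall back to 0 when SOURCE_DATE_EPOCH is unset.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # sort so dict insertion order cannot leak into the output
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = epoch  # fixed clock instead of the current time
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def is_idempotent(build: Callable[[Dict[str, bytes]], bytes],
                  files: Dict[str, bytes], runs: int = 2) -> bool:
    """Run the build several times and report whether all output digests agree."""
    digests = {hashlib.sha256(build(files)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


sources = {"b.txt": b"second", "a.txt": b"first"}
print(is_idempotent(build_archive, sources))  # True
```

The archive is left uncompressed on purpose: a gzip wrapper carries its own timestamp field in the header, which would need to be pinned separately.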

Seen in
