
CONCEPT Cited by 1 source

Content-addressed caching (of build/test actions)

Definition

Content-addressed caching stores action outputs keyed by a hash of the action's inputs (source contents, tool versions, environment, dependencies, flags). Two runs with the same inputs produce the same key; the second run is a cache hit and skips execution entirely.
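A minimal sketch of the idea, assuming an in-memory dict as the store and SHA-256 over canonical JSON as the key function. `ActionCache`, `action_key`, and `fake_compile` are hypothetical names for illustration; this is not how any particular build tool stores its cache.

```python
import hashlib
import json
from typing import Callable, Dict

def action_key(inputs: Dict[str, str]) -> str:
    """Hash every declared input into one stable key (SHA-256 over canonical JSON)."""
    # sort_keys makes the key independent of the order the inputs were listed in.
    canonical = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

class ActionCache:
    """In-memory stand-in for a content-addressed action cache (hypothetical)."""

    def __init__(self) -> None:
        self._outputs: Dict[str, str] = {}
        self.executions = 0  # how many times an action really ran

    def run(self, inputs: Dict[str, str], execute: Callable[[Dict[str, str]], str]) -> str:
        key = action_key(inputs)
        if key in self._outputs:
            return self._outputs[key]  # hit: skip execution entirely
        self.executions += 1
        output = execute(inputs)       # miss: run the action...
        self._outputs[key] = output    # ...and store its output under the key
        return output

# Usage: the "compiler" is a hypothetical stand-in for a real tool.
def fake_compile(inputs: Dict[str, str]) -> str:
    return "obj(" + inputs["src/main.c"] + ")"

cache = ActionCache()
inputs = {"src/main.c": "int main(void) { return 0; }", "cc": "gcc-13.2", "flags": "-O2"}
first = cache.run(inputs, fake_compile)
second = cache.run(dict(reversed(list(inputs.items()))), fake_compile)  # same inputs, other order
print(cache.executions)  # prints 1
```

Changing any single input (say, `flags`) yields a new key, so the next call runs the action again.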

This is the payoff of concepts/hermetic-build: once every input is declared, the input hash fully defines the output, so the output can be cached, shared, and even downloaded instead of rebuilt.

Bazel's flavor

Bazel's remote cache is the canonical industrial example. Every build action has:

  • Input digest — hashes of source files + tool binaries + dependency outputs (transitively).
  • Action digest — input digest + command line + environment.
  • Output digest — written at end of action.
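The three digests can be sketched as plain hash functions. This is a simplified model, assuming JSON serialization and SHA-256; the real remote-execution protocol hashes serialized protobuf messages rather than JSON, and the paths, flags, and helper names below are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def input_digest(files: Dict[str, bytes]) -> str:
    """Hash each input file, then hash the sorted (path, hash) list.

    Dependency outputs are passed in as ordinary files, so a change upstream
    that alters those outputs changes this digest too (the transitive part).
    """
    entries = sorted((path, sha256(content)) for path, content in files.items())
    return sha256(json.dumps(entries).encode("utf-8"))

def action_digest(input_dig: str, argv: List[str], env: Dict[str, str]) -> str:
    """Action digest = input digest + command line + environment."""
    payload = {"inputs": input_dig, "argv": argv, "env": sorted(env.items())}
    return sha256(json.dumps(payload).encode("utf-8"))

def output_digests(outputs: Dict[str, bytes]) -> Dict[str, str]:
    """Recorded once the action finishes: output path -> content hash."""
    return {path: sha256(content) for path, content in outputs.items()}

# Usage with hypothetical paths and flags.
files = {"src/app.c": b"int main(void) { return 0; }", "bazel-out/lib/util.o": b"\x7fELF..."}
argv = ["gcc", "-c", "src/app.c", "-o", "app.o"]
env = {"PATH": "/usr/bin"}
key = action_digest(input_digest(files), argv, env)
key_o2 = action_digest(input_digest(files), argv + ["-O2"], env)  # new flag -> new key
```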

Before executing an action, Bazel looks up its action digest in one or more cache layers (local disk cache, a remote cache such as bazel-remote, an RBE backend's action cache and CAS). Hit → download the outputs (or, under BwoB, keep only their digests and continue; see patterns/build-without-the-bytes). Miss → run the action and store its outputs.
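The lookup flow can be sketched as a loop over layers, assuming each layer holds an action cache (action digest -> output digest) and a CAS (output digest -> bytes). `CacheLayer`, `execute_action`, and the `bwob` flag are hypothetical names; writing to every layer on a miss is a simplification, not a statement of what Bazel uploads where.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass
class CacheLayer:
    """One cache layer, e.g. local disk or a remote cache (hypothetical model)."""
    name: str
    ac: Dict[str, str] = field(default_factory=dict)     # action digest -> output digest
    cas: Dict[str, bytes] = field(default_factory=dict)  # output digest -> bytes

@dataclass
class Result:
    output_digest: str
    data: Optional[bytes]     # None when BwoB skipped the download
    hit_layer: Optional[str]  # None when the action had to run

def execute_action(action_digest: str, layers: List[CacheLayer],
                   run: Callable[[], bytes], bwob: bool = False) -> Result:
    # Check each layer in order, cheapest first.
    for layer in layers:
        out = layer.ac.get(action_digest)
        if out is not None:
            # Hit: download the bytes, or under BwoB keep only the digest.
            data = None if bwob else layer.cas[out]
            return Result(out, data, layer.name)
    # Miss everywhere: run the action, then store the result in every layer.
    data = run()
    out = sha256(data)
    for layer in layers:
        layer.ac[action_digest] = out
        layer.cas[out] = data
    return Result(out, data, None)

# Usage: the action digest and the action body are hypothetical.
runs: List[int] = []

def compile_util() -> bytes:
    runs.append(1)
    return b"util.o contents"

disk, remote = CacheLayer("disk"), CacheLayer("remote")
cold = execute_action("digest-of-compile-util", [disk, remote], compile_util)
warm = execute_action("digest-of-compile-util", [disk, remote], compile_util)
lazy = execute_action("digest-of-compile-util", [disk, remote], compile_util, bwob=True)
```

Under BwoB the hit still returns the output digest, so later actions can be keyed on it even though no bytes arrived locally.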

At Canva's scale

Canva's cache was bazel-remote installed on every CI instance, backed by an S3 bucket as shared global storage:

gRPC communication between the Bazel server and the cache happens locally, but it's supported by global shared cache storage.

Lookups therefore ran at local speed, while the shared S3 storage gave every CI instance the same global hit rate.
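A rough model of this layout is a bounded local LRU in front of one shared store. This is a sketch under stated assumptions: `SharedStore` stands in for the S3 bucket, `LocalCache` for the per-instance bazel-remote, and write-through on `put` is a simplification; bazel-remote's own upload and eviction behavior may differ.

```python
from collections import OrderedDict
from typing import Dict, Optional

class SharedStore:
    """Stand-in for the shared S3 bucket (hypothetical in-memory model)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.reads = 0  # counts round trips to shared storage

    def get(self, key: str) -> Optional[bytes]:
        self.reads += 1
        return self.blobs.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.blobs[key] = value

class LocalCache:
    """Per-instance cache: bounded local LRU in front of the shared store."""

    def __init__(self, backend: SharedStore, capacity: int) -> None:
        self.backend = backend
        self.capacity = capacity
        self.local: "OrderedDict[str, bytes]" = OrderedDict()

    def _remember(self, key: str, value: bytes) -> None:
        self.local[key] = value
        self.local.move_to_end(key)
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str) -> Optional[bytes]:
        if key in self.local:
            self.local.move_to_end(key)
            return self.local[key]      # fast path: served locally
        value = self.backend.get(key)   # slow path: fall back to shared storage
        if value is not None:
            self._remember(key, value)
        return value

    def put(self, key: str, value: bytes) -> None:
        self._remember(key, value)
        self.backend.put(key, value)    # write-through so other instances can hit

# Usage: two CI instances sharing one bucket.
bucket = SharedStore()
ci_a, ci_b = LocalCache(bucket, capacity=2), LocalCache(bucket, capacity=2)
ci_a.put("digest-1", b"lib.o")      # instance A builds and uploads
from_bucket = ci_b.get("digest-1")  # B misses locally, hits the shared bucket
from_local = ci_b.get("digest-1")   # now served from B's local copy
```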

The eviction hazard

Content-addressed caches backed by bounded storage (S3 + GC, LRU, etc.) can evict an action's output after another action has already started depending on it. BwoB amplifies this: if the artifact was never downloaded locally and a later step needs it, the missing blob causes a build failure instead of a plain cache miss. Canva's mitigation was to retry on cache-check failure, which works because evictions are rare. (GitHub issue #10880)
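The hazard and the retry can be simulated in a few lines. This sketch assumes the cache check treats an action-cache entry whose blob is gone as a miss, which is what lets a retry re-run the action instead of failing the same way again. `Cache`, `BlobEvictedError`, `with_retries`, and the simulated eviction are all hypothetical; this is not Canva's or Bazel's actual retry logic.

```python
import hashlib
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")

class BlobEvictedError(Exception):
    """A digest recorded earlier can no longer be fetched from the CAS."""

class Cache:
    """Hypothetical action cache (AC) + content-addressed store (CAS)."""

    def __init__(self) -> None:
        self.ac: Dict[str, str] = {}
        self.cas: Dict[str, bytes] = {}

    def check(self, action: str) -> Optional[str]:
        # An AC entry whose blob has been evicted counts as a miss.
        digest = self.ac.get(action)
        return digest if digest in self.cas else None

    def store(self, action: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.ac[action], self.cas[digest] = digest, data
        return digest

    def fetch(self, digest: str) -> bytes:
        if digest not in self.cas:
            raise BlobEvictedError(digest)
        return self.cas[digest]

def with_retries(build: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-run the whole build if a blob vanished mid-build."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    last: Optional[BlobEvictedError] = None
    for _ in range(max_attempts):
        try:
            return build()
        except BlobEvictedError as err:
            last = err
    assert last is not None
    raise last

# Usage: an earlier CI run populated the cache; GC evicts the blob once.
cache = Cache()
cache.store("compile lib.c", b"lib.o")
evict_once = [True]
attempts: List[int] = []

def build() -> bytes:
    attempts.append(1)
    digest = cache.check("compile lib.c")
    if digest is None:                 # miss: run the action and upload
        digest = cache.store("compile lib.c", b"lib.o")
    # BwoB: nothing downloaded yet, only `digest` is kept.
    if evict_once[0]:                  # simulate GC evicting the blob mid-build
        evict_once[0] = False
        del cache.cas[digest]
    return cache.fetch(digest) + b" -> app"  # the link step finally needs the bytes

result = with_retries(build)
```

The first attempt fails on the fetch; the second sees the missing blob at check time, re-runs the action, and succeeds. Retrying the whole build is wasteful only when evictions are frequent.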

Consequences for pipeline design

  • Re-running CI on the same commit is near-free if the cache is warm.
  • Incremental changes run only what's downstream of the change. A single-file PR mostly hits the cache.
  • Cache-key hygiene matters. A varying input that leaks into the key (e.g. the current wall-clock time, $USER) silently invalidates huge cache subtrees, and an undeclared one (e.g. a transitively-linked system library) can let stale outputs be served as hits. This is why hermeticity enforcement is strict.
  • RBE builds on the same CAS. Remote execution workers put outputs in the same content-addressed store that CI clients look up.
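The first three consequences can be reproduced on a toy dependency graph. This is a sketch assuming each action's key covers its source, its dependencies' output digests, and the environment, and that actions are deterministic; the graph, target names, and `build` helper are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Set

# Hypothetical three-target graph: app links lib_a and lib_b.
DEPS: Dict[str, List[str]] = {"lib_a": [], "lib_b": [], "app": ["lib_a", "lib_b"]}
ORDER = ["lib_a", "lib_b", "app"]  # a topological order of DEPS

def digest(obj: object) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()

def build(sources: Dict[str, str], env: Dict[str, str], cache: Dict[str, str]) -> Set[str]:
    """Build every target against `cache`; return the targets that actually ran."""
    outputs: Dict[str, str] = {}
    executed: Set[str] = set()
    for target in ORDER:
        key = digest({
            "src": sources[target],
            "deps": [outputs[d] for d in DEPS[target]],  # upstream output digests
            "env": env,                                   # the whole env is part of the key
        })
        if key not in cache:
            executed.add(target)
            # Pretend the action is deterministic: its output digest is a function of its key.
            cache[key] = digest(["output", key])
        outputs[target] = cache[key]
    return executed

cache: Dict[str, str] = {}
src = {"lib_a": "a v1", "lib_b": "b v1", "app": "main v1"}
env = {"PATH": "/usr/bin"}
cold = build(src, env, cache)                        # everything runs
rerun = build(src, env, cache)                       # same commit, warm cache: nothing runs
edit = build({**src, "lib_a": "a v2"}, env, cache)   # lib_a and its dependent app run
leaky = build(src, {**env, "USER": "alice"}, cache)  # $USER leaked into every key: all run
```

Note that `lib_b` stays a hit after the `lib_a` edit, while a single leaked environment variable misses every target at once.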
