CONCEPT Cited by 1 source
Content-addressed caching (of build/test actions)¶
Definition¶
Content-addressed caching stores action outputs keyed by a hash of the action's inputs (source contents, tool versions, environment, dependencies, flags). Two runs with the same inputs produce the same key; the second run is a cache hit and skips execution entirely.
This is the payoff of concepts/hermetic-build: once every input is declared, the input hash fully determines the output, so the output can be cached, shared, or downloaded instead of rebuilt.
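The definition can be sketched in a few lines of Python. This is a minimal illustration, not any build tool's real key format: the function name `action_key`, the class `ActionCache`, and the JSON canonicalization are all assumptions made for the example.

```python
import hashlib
import json
from typing import Callable


def action_key(sources: dict[str, bytes], tool_version: str,
               env: dict[str, str], flags: list[str]) -> str:
    """Hash every declared input into one hex digest (hypothetical key format)."""
    payload = {
        # Hash file contents, not paths or timestamps.
        "sources": {path: hashlib.sha256(data).hexdigest()
                    for path, data in sources.items()},
        "tool_version": tool_version,
        "env": env,
        "flags": flags,  # order is kept: it can change what the tool does
    }
    # sort_keys makes the encoding independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class ActionCache:
    """In-memory stand-in for a content-addressed cache."""

    def __init__(self) -> None:
        self.outputs: dict[str, bytes] = {}
        self.executions = 0

    def run(self, key: str, execute: Callable[[], bytes]) -> bytes:
        if key in self.outputs:      # hit: skip execution entirely
            return self.outputs[key]
        self.executions += 1         # miss: run the action, then store
        output = execute()
        self.outputs[key] = output
        return output
```

Running the same action twice with identical inputs calls `execute` once; changing any input, for example one flag, produces a different key and therefore a miss.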
Bazel's flavor¶
Bazel's remote cache is the canonical industrial example. Every build action has:
- Input digest — hashes of source files + tool binaries + dependency outputs (transitively).
- Action digest — input digest + command line + environment.
- Output digest — hash of each output file, recorded when the action finishes.
Before executing an action, Bazel looks up the action digest in one or
more cache layers (local disk, a remote cache such as bazel-remote, the RBE CAS).
Hit → download the output (or, under Build without the Bytes (BwoB), record
only the output's digest and continue; see patterns/build-without-the-bytes).
Miss → run the action and store its output.
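The hit/miss flow across layers might look like the following sketch. It is not Bazel's implementation: `CacheLayer` and `run_action` are hypothetical names, and copying a hit from a slower layer into the faster ones is a design choice made for this example.

```python
from typing import Callable, Optional


class CacheLayer:
    """One cache tier (e.g. local disk or a remote store), modeled in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, bytes] = {}

    def get(self, digest: str) -> Optional[bytes]:
        return self.store.get(digest)

    def put(self, digest: str, output: bytes) -> None:
        self.store[digest] = output


def run_action(action_digest: str, layers: list[CacheLayer],
               execute: Callable[[], bytes]) -> tuple[bytes, str]:
    """Check layers fastest-first; execute only if every layer misses."""
    for i, layer in enumerate(layers):
        output = layer.get(action_digest)
        if output is not None:
            # Hit: backfill the faster layers so the next lookup is local.
            for faster in layers[:i]:
                faster.put(action_digest, output)
            return output, layer.name
    # Miss everywhere: run the action and store the result in every layer.
    output = execute()
    for layer in layers:
        layer.put(action_digest, output)
    return output, "executed"
```

The second element of the return value names where the output came from, which makes the hit path easy to observe in a test or a log line.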
At Canva's scale¶
Canva's cache was bazel-remote installed on every CI instance, backed by an
S3 bucket as shared global storage:
gRPC communication between the Bazel server and the cache happens locally, but it's supported by global shared cache storage.
This made the cache a network-I/O-friendly singleton: local lookup speed with global hit rate.
The eviction hazard¶
Content-addressed caches backed by bounded storage (S3 + GC, LRU, etc.) can evict an action's output after another action has started depending on it. BwoB amplifies this: if the artifact was never downloaded locally and a later step needs it, the cache miss becomes a build failure rather than a slowdown. Canva's mitigation: retry on cache-check failure. This works because evictions are rare. (GitHub issue #10880)
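The hazard and a retry-style mitigation can be modeled as below. This is a toy sketch under stated assumptions, not Canva's or Bazel's code: `BlobEvicted`, `BoundedCache`, `Build`, `link_step`, and `run_with_retry` are hypothetical, and "retry" here means dropping the stale digest metadata and re-running the step.

```python
import hashlib
from typing import Callable


class BlobEvicted(Exception):
    """A digest recorded earlier is no longer present in the cache."""


class BoundedCache:
    """Content-addressed store whose entries can disappear at any time."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        return digest

    def evict(self, digest: str) -> None:
        self.blobs.pop(digest, None)

    def fetch(self, digest: str) -> bytes:
        if digest not in self.blobs:
            raise BlobEvicted(digest)
        return self.blobs[digest]


class Build:
    """BwoB-style client: remembers output digests, not output bytes."""

    def __init__(self, cache: BoundedCache) -> None:
        self.cache = cache
        self.recorded: dict[str, str] = {}  # output name -> digest

    def produce(self, name: str, data: bytes) -> None:
        self.recorded[name] = self.cache.put(data)

    def consume(self, name: str) -> bytes:
        return self.cache.fetch(self.recorded[name])


def link_step(build: Build) -> bytes:
    """Needs lib.o; rebuilds it only if no digest is on record."""
    if "lib.o" not in build.recorded:
        build.produce("lib.o", b"object code")
    return build.consume("lib.o") + b"+linked"


def run_with_retry(build: Build, step: Callable[[Build], bytes],
                   max_attempts: int = 2) -> bytes:
    for attempt in range(1, max_attempts + 1):
        try:
            return step(build)
        except BlobEvicted:
            if attempt == max_attempts:
                raise
            build.recorded.clear()  # forget stale digests, then try again
    raise AssertionError("unreachable")
```

The bounded attempt count matters: if evictions were frequent rather than rare, retrying would only hide a capacity problem.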
Consequences for pipeline design¶
- Re-running CI on the same commit is near-free if the cache is warm.
- Incremental changes run only what's downstream of the change. A single-file PR mostly hits cache.
- Cache-key hygiene matters. A non-hermetic input (e.g. the current wall-clock time, $USER, a transitively-linked system library) silently invalidates huge cache subtrees, which is why hermeticity enforcement is strict.
- RBE builds on the same CAS. Remote execution workers put outputs in the same content-addressed store CI clients look up.
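Two of these consequences, downstream-only invalidation and the cost of a leaked input, can be checked on a toy dependency graph. The sketch below assumes a made-up key scheme (`digest`, `graph_keys`, and `changed` are hypothetical helpers) in which each target's key covers its own source, its dependencies' keys, and an `env` string standing in for something like `$USER`.

```python
import hashlib


def digest(*parts: str) -> str:
    """Hash a sequence of strings, separated so boundaries are unambiguous."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def graph_keys(sources: dict[str, str], deps: dict[str, list[str]],
               env: str = "") -> dict[str, str]:
    """Compute one cache key per target, folding in dependency keys."""
    keys: dict[str, str] = {}

    def key_of(target: str) -> str:
        if target not in keys:
            dep_keys = [key_of(d) for d in deps[target]]
            keys[target] = digest(target, sources[target], env, *dep_keys)
        return keys[target]

    for target in deps:
        key_of(target)
    return keys


def changed(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Targets whose key differs, i.e. the ones that would miss the cache."""
    return {t for t in before if before[t] != after[t]}


# app depends on lib and util; lib depends on base.
DEPS = {"app": ["lib", "util"], "lib": ["base"], "util": [], "base": []}
SOURCES = {"app": "main()", "lib": "lib v1", "util": "util v1", "base": "base v1"}

baseline = graph_keys(SOURCES, DEPS)

# Editing one file invalidates only that target and what sits downstream.
edited = graph_keys({**SOURCES, "util": "util v2"}, DEPS)
print(sorted(changed(baseline, edited)))

# A leaked environment value invalidates every key in the graph.
alice = graph_keys(SOURCES, DEPS, env="alice")
bob = graph_keys(SOURCES, DEPS, env="bob")
print(sorted(changed(alice, bob)))
```

The first comparison touches only `util` and `app`; the second touches all four targets even though no source changed, which is the silent invalidation the bullet above warns about.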
Related¶
- concepts/hermetic-build — the precondition.
- patterns/build-without-the-bytes — skip downloading cache artifacts unless a downstream action needs them.
- concepts/remote-build-execution — distributed execution sharing the same CAS.
- systems/bazel, systems/bazel-remote.
Seen in¶
- sources/2024-12-16-canva-faster-ci-builds — bazel-remote + S3 as shared CI cache; retry-on-eviction as a BwoB-compatibility fix.