Skip to content

CONCEPT Cited by 2 sources

Build graph

Definition

A build graph is the DAG of declared build/test actions and their dependencies. Nodes are actions (compile-this, link-that, run-this-test); edges are "X needs Y's output as input." Modern build systems (Bazel, Buck, Pants, Nix) operate on the build graph explicitly: they load it from declaration files, analyze it, and schedule actions against workers.

Why "the graph" matters

Having the graph as a first-class, analyzable object enables:

  • Incremental builds. Only re-run actions whose inputs (transitively) changed — computed as a graph diff (patterns/build-without-the-bytes complements this).
  • Parallelism. Actions with no edges between them run concurrently, bounded only by worker count. Canva's critical-path analysis is literally "longest path in the graph."
  • Caching. Each node is keyed by its input-subtree digest; a global cache turns the graph into "only compute this node if nobody has ever computed it with these exact inputs" (concepts/content-addressed-caching).
  • Remote execution. Node granularity = unit of distribution.

Scale at Canva

From the retrospective:

Our build graph has more than 10^5 nodes and some expensive build actions (5–10 minutes), which run often… It didn't help that we had a build graph with over 900K nodes.

The graph-load step is itself an action:

Loading and analyzing the build graph to generate the necessary actions from the /WORKSPACE, **/BUILD.bazel, and *.bzl files, plus the filesystem and cache state, might take minutes, especially when you include many targets.

This is why Canva's experiments included bazel build //... (OOMed because the graph was too big to load into JVM), bazel build //... - //web/... (worked), and bounded-scope pipeline generation. The graph's size is itself a first-order constraint.

Diff-against-graph pattern

bazel-diff (Tinder/bazel-diff) computes a per-target input-hash manifest and compares two commits' manifests to produce a "targets that need to run" list. Canva's pipeline v3 pre-computes this manifest on dedicated instances as soon as a commit is pushed and uploads to S3; CI jobs download the manifest at runtime. (See patterns/static-pipeline-generation.)

Seen in

  • sources/2024-12-16-canva-faster-ci-builds — 900K-node graph load time as a first-order cost; bazel-diff as graph-diff-for-incremental-CI.
  • sources/2025-11-06-slack-build-better-software-to-build-software-better — a graph that violates its DAG/hermeticity/granularity preconditions delivers zero cache-hit rate even under Bazel. Slack's pre-refactor Quip/Canvas build was "so interconnected that our cache hit rate was zero. Imagine that every cached 'function' we tried to call had 100 parameters, 2-3 of which always changed." The fix was to sever a Python↔TypeScript dependency edge that was making every Python change invalidate the frontend bundle cache (~35 min/build cost) — a graph- topology fix, not a tool fix.
Last updated · 542 distilled / 1,571 read