PATTERN Cited by 1 source

Diff-artifact validator for build refactor

Context

Migrating a large, tangled build system to a new one (e.g. moving a multi-language pipeline onto Bazel) is a high-risk, long-duration project. The core problem is that build code usually has no tests — its "correctness spec" is effectively "whatever the current build produces on a given commit".

Without a mechanical oracle for correctness, every incremental change during the migration risks subtly breaking the build, and the error only surfaces later — in CI, in prod, or as a debugging session for an unlucky engineer.

Problem

From Slack's account of their Quip/Canvas Bazel migration:

The complexity of the original build code made it challenging to define "correct" behavior. Our build code mostly did not have tests. The only criterion for what was correct was what the existing build system produced under a specific configuration.

— Slack, Build better software to build software better

The team needed a way to know, at each step of the rewrite, whether the new build was producing artifacts equivalent to the old. Manual inspection of megabytes of binary output is hopeless; ad-hoc spot-checking misses regressions; integration tests take too long to run during a tight edit-compile-diff loop.

Solution

Build a byte-diff validator that compares an artifact produced by the existing (old) build system against the same artifact produced by the new system, and reports mismatches. Use it as the iteration oracle during the migration.

From Slack:

To reassure ourselves, and to instill confidence in our engineers, we built a tool in Rust to compare an artifact produced by the existing process with one produced by our new code. We used the differences to guide us to points where our new logic wasn't quite right, and iterated, and iterated more.

The validator's shape:

Run the old build on a fixed input set (commit hash + build config), capture all output artifacts.
Run the new build on the same input, capture all outputs.
Diff: byte-for-byte, or (for formats with benign variation like timestamps or hash-map order) with format-aware normalisation.
Report: highlight mismatches by file, by action, by phase.
Iterate: fix the new build, re-run the validator, and repeat until the diff is empty.
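The diff and report steps can be sketched in a few dozen lines. This is a minimal Python illustration of the logic, not Slack's Rust tool: it assumes both builds have already written their outputs into two separate directories, compares SHA-256 digests as a stand-in for byte equality, and reports per file only (grouping by action or phase is left out). The names `digest_tree`, `diff_trees` and `main` are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file under `root` (relative POSIX path) to the SHA-256 of its bytes."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(old_root: Path, new_root: Path) -> dict[str, list[str]]:
    """Compare two output trees; every list is sorted so the report order is deterministic."""
    old, new = digest_tree(old_root), digest_tree(new_root)
    return {
        # Produced by the old build but not the new one.
        "missing": sorted(old.keys() - new.keys()),
        # Produced by the new build but not the old one.
        "extra": sorted(new.keys() - old.keys()),
        # Present in both trees with different bytes.
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


def main(argv: list[str]) -> int:
    """Hypothetical CLI entry point: `validate OLD_DIR NEW_DIR`; returns 1 if anything differs."""
    report = diff_trees(Path(argv[1]), Path(argv[2]))
    for kind, paths in report.items():
        for rel in paths:
            print(f"{kind}\t{rel}")
    return 1 if any(report.values()) else 0
```

Returning a non-zero exit code whenever anything differs lets the same function gate a CI job, and sorting every list keeps the report itself free of ordering noise.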

Why a Rust tool

Slack chose Rust, which is sensible for a few reasons:

Speed: on a monorepo with thousands of output artifacts, byte-diffing can itself be slow; Rust's performance is well matched to that workload.
Determinism: the validator should not add non-determinism of its own (e.g. its report ordering must not depend on hash-map iteration order).
Portability: a single binary runs on any CI worker or local dev machine.

But any reasonably fast systems language, such as Go, C++, or C, would also work; the language choice is secondary to the validator's existence.

Format-aware normalisation

Pure byte-for-byte diffing reports false mismatches when an artifact contains benign variation that isn't a bug:

Embedded timestamps in binaries, JARs, and images: these are a reproducibility bug in their own right (see concepts/idempotent-build-action), but during a migration, normalise them before diffing.
Hash-ordered dict iteration in serialised outputs — sort before comparing.
Minifier whitespace differences if the two builds use different minifier versions — parse and compare the AST.

The validator should either (a) pre-process artifacts to eliminate benign variation, or (b) use format-aware diffing (AST diff for JS, ELF diff for binaries, DWARF-aware diff for debug info).
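Option (a) can be as simple as a per-format pre-processing step applied to both sides before comparison. The sketch below is an illustration under stated assumptions, not a description of Slack's tool: it covers only two of the cases listed above, canonicalising JSON (sorted keys, fixed separators) and rebuilding ZIP/JAR archives with entries sorted by name and every timestamp pinned to 1980-01-01, the earliest date the ZIP format can store. The helper names and the extension-based dispatch are hypothetical; AST, ELF, and DWARF-aware comparison need dedicated parsers and are not shown.

```python
import io
import json
import zipfile

# Earliest timestamp a ZIP entry can store; used to pin every entry's mtime.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def normalise_json(data: bytes) -> bytes:
    """Re-serialise JSON with sorted keys and fixed separators so key order cannot differ."""
    return json.dumps(json.loads(data), sort_keys=True, separators=(",", ":")).encode("utf-8")


def normalise_zip(data: bytes) -> bytes:
    """Rebuild a ZIP/JAR with entries sorted by name and timestamps pinned.

    Entry contents and permission bits (external_attr) are kept; comments and
    extra fields are dropped, and entries are re-stored uncompressed.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.external_attr = src.getinfo(name).external_attr
            dst.writestr(info, src.read(name))
    return out.getvalue()


def normalise(path: str, data: bytes) -> bytes:
    """Hypothetical dispatch by file extension; unknown formats are compared byte-for-byte."""
    if path.endswith(".json"):
        return normalise_json(data)
    if path.endswith((".zip", ".jar")):
        return normalise_zip(data)
    return data
```

Normalisation trades sensitivity for signal: anything it discards (here entry order, timestamps, comments, and extra fields) can no longer surface as a regression, so each rule should be tied to a known source of benign variation.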

Outcomes

Slack describes the validator as central to building confidence in the migration:

We used the differences to guide us to points where our new logic wasn't quite right, and iterated, and iterated more.

The validator gave the team:

A mechanical correctness oracle when no test suite existed.
A progress metric (how many artifacts still diff?).
An engineer-assurance signal: stakeholders could see that the new build reproduced the old build's artifacts without reading any Starlark code.
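The progress metric falls out of the diff report directly. A minimal sketch, assuming a per-artifact status map (an assumed shape, not something the source specifies) and using the top-level output directory as a rough, hypothetical proxy for the "by action, by phase" grouping:

```python
from __future__ import annotations

from collections import Counter


def progress(statuses: dict[str, str]) -> tuple[int, int, dict[str, int]]:
    """Summarise a diff report.

    `statuses` maps artifact path -> "match", "missing", "extra" or "changed".
    Returns (still_differing, total, mismatches per top-level directory).
    """
    mismatched = [path for path, status in statuses.items() if status != "match"]
    # Files at the root of the output tree are grouped under ".".
    by_dir = Counter(path.split("/", 1)[0] if "/" in path else "." for path in mismatched)
    return len(mismatched), len(statuses), dict(sorted(by_dir.items()))


def format_progress(statuses: dict[str, str]) -> str:
    """One-line summary suitable for a CI log or a migration dashboard."""
    bad, total, by_dir = progress(statuses)
    detail = ", ".join(f"{d}: {n}" for d, n in by_dir.items())
    return f"{bad}/{total} artifacts still differ" + (f" ({detail})" if detail else "")
```

Tracking the first number over time gives the migration a burn-down curve; the per-directory breakdown points at which part of the new build to fix next.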

Prerequisites

Both build systems (old and new) can be run side-by-side on the same inputs, producing separate output trees.
The output artifacts are deterministic enough that a diff is meaningful (i.e. the old build is reasonably idempotent — see concepts/idempotent-build-action).
Sufficient CI / local compute to run both builds during development.
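One way to check the determinism prerequisite before trusting any old-vs-new diff is to diff the old build against itself. The sketch below (helper names hypothetical) runs a caller-supplied `build` function twice into fresh directories and lists the outputs that differ between runs. In practice `build` would invoke the real build with its output directory redirected; that wiring is assumed here, not shown.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def _digests(root: Path) -> dict[str, str]:
    """SHA-256 of every file under `root`, keyed by relative POSIX path."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def nondeterministic_outputs(build: Callable[[Path], None]) -> list[str]:
    """Run `build` twice into fresh directories and return the sorted paths whose
    bytes differ between runs or that exist after only one run."""
    with tempfile.TemporaryDirectory() as first_dir, tempfile.TemporaryDirectory() as second_dir:
        build(Path(first_dir))
        build(Path(second_dir))
        first, second = _digests(Path(first_dir)), _digests(Path(second_dir))
    return sorted(p for p in first.keys() | second.keys() if first.get(p) != second.get(p))
```

Every file this reports either needs a normalisation rule or must be excluded from the old-vs-new comparison; otherwise it will show up as a spurious mismatch on every run.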

Variations

Continuous diff: run the validator in CI on every commit so regressions are caught as soon as they're introduced.
Subset diff: for large monorepos, diff only the affected artifacts (via build-graph diff) to iterate faster.
Differential fuzzing: extend beyond one fixed input set to randomly generated inputs, à la patterns/mysql-compatible-differential-fuzzing. For a build system this is less useful than a fixed commit set, because build correctness is per-repo, not per-input-distribution.
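Subset diffing reduces to a reverse-dependency walk over the build graph: starting from the changed inputs, collect every output that transitively depends on them. The sketch below assumes the graph is available as a plain dict mapping each node to its direct inputs (a hypothetical representation). In a Bazel workspace the same reverse closure could instead come from `bazel query` and its `rdeps(universe, x)` function.

```python
from __future__ import annotations

from collections import defaultdict, deque


def affected_outputs(deps: dict[str, list[str]], changed: set[str], outputs: set[str]) -> set[str]:
    """Return the outputs that transitively depend on any node in `changed`.

    `deps` maps each node (source file, intermediate, or artifact) to its direct inputs.
    """
    # Invert the edges so we can walk from an input to everything built from it.
    rdeps: dict[str, list[str]] = defaultdict(list)
    for node, inputs in deps.items():
        for inp in inputs:
            rdeps[inp].append(node)

    # Breadth-first search over reverse dependencies.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dependent in rdeps[node]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen & outputs
```

Only the returned artifacts then need to be rebuilt and diffed by both systems, which shortens each iteration of the fix-and-rerun loop.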

Related

patterns/parallel-rewrite-with-differential-testing — same structural idea at the application-code rewrite altitude.
patterns/fuzz-ast-vs-vm-oracle — SQL-execution-engine variant of the same principle: compare two implementations against each other instead of against a spec.

concepts/build-graph — identifies which artifacts to diff.
concepts/idempotent-build-action — precondition for the diff to be meaningful; non-idempotent builds produce spurious diffs.
systems/bazel — the target build system in Slack's case.

Seen in

sources/2025-11-06-slack-build-better-software-to-build-software-better: Slack builds a Rust byte-diff tool to validate their new Starlark-based Bazel build against the incumbent Python-orchestrated build, using the diff output to iteratively fix the new build until artifacts match. Canonical articulation of the pattern for a multi-month build-system migration.