Pipeline step consolidation
When "more steps" becomes a tax instead of a speedup, group related hermetic build/test actions into fewer, larger CI steps so the per-step overhead (VM warm-up, cache hydration, image pull) amortises over more work. Canva's BE/ML Pipeline v2 went from 45 → 16 steps and cut average build time 49 → 35 min with ~50 % fewer build minutes (Source: sources/2024-12-16-canva-faster-ci-builds).
Intent
Horizontal CI scale-out has a sweet spot: too few steps and you lose parallelism; too many and per-step fixed costs dominate. At Canva's scale (>1000 check-merge builds/day, ~3000 jobs per build, each job = 1 EC2 instance), the per-step VM warm-up tax was eating the parallelism gain.
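The sweet spot can be seen in a toy cost model. This is a sketch under assumptions that are not from the source: every target costs the same, each step gets its own VM whose cores Bazel keeps busy, all steps start at once (no queueing for agents), and the per-step fixed cost is a single constant. `plan`, `StepPlan` and all input numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StepPlan:
    steps: int
    wall_min: float  # wall-clock minutes of the slowest step
    vm_min: float    # VM-minutes billed across all steps


def plan(targets: int, steps: int, target_min: float,
         overhead_min: float, cores: int) -> StepPlan:
    """Toy model: spread equal-cost targets as evenly as possible over `steps` VMs.

    Hypothetical assumptions: each step pays `overhead_min` once (VM boot,
    image pull, JVM warm-up, checkout), then runs its targets `cores` at a
    time, and all steps start together.
    """
    if not 1 <= steps <= targets:
        raise ValueError("need 1 <= steps <= targets")
    base, extra = divmod(targets, steps)  # `extra` steps get one more target

    def step_min(n: int) -> float:
        waves = math.ceil(n / cores)  # rounds of inner (Bazel) parallelism
        return overhead_min + waves * target_min

    durations = [step_min(base + 1)] * extra + [step_min(base)] * (steps - extra)
    return StepPlan(steps, max(durations), sum(durations))


if __name__ == "__main__":
    for n in (1, 4, 12, 96):
        p = plan(targets=96, steps=n, target_min=4, overhead_min=3, cores=8)
        print(f"{n:>3} steps: wall {p.wall_min:>3} min, VM {p.vm_min:>4} min")
```

With these made-up inputs, 12 steps (8 targets per 8-core VM) match the 96-step wall time of 7 min at one eighth of the VM-minutes (84 vs 672), while 1 or 4 steps lose parallelism (51 and 15 min).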
The pattern: once hermetic caching (concepts/hermetic-build + concepts/content-addressed-caching) means re-running a step is cheap on cache hits, the right step granularity shifts dramatically — from "one step per target" toward "one step per meaningful grouping", because Bazel inside the step can do its own parallelism and caching.
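Why a cache hit makes re-running a step cheap can be sketched with a toy content-addressed action cache. This is an illustration, not Bazel's remote-cache protocol: the key is a SHA-256 over a canonical JSON manifest of the command line and the digest of each declared input, and `action_key`, `ActionCache` and `run` are hypothetical names. The hit is only trustworthy if the action is hermetic, because anything it reads outside `inputs` is invisible to the key.

```python
import hashlib
import json
from typing import Callable, Dict, List


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def action_key(argv: List[str], inputs: Dict[str, bytes]) -> str:
    """Key an action by its command line plus the content digest of every declared input."""
    manifest = {"argv": argv,
                "inputs": {path: digest(data) for path, data in inputs.items()}}
    # sort_keys makes the key independent of dict insertion order
    return digest(json.dumps(manifest, sort_keys=True).encode())


class ActionCache:
    """In-memory stand-in for a (remote) content-addressed action cache."""

    def __init__(self) -> None:
        self._outputs: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def run(self, argv: List[str], inputs: Dict[str, bytes],
            execute: Callable[[Dict[str, bytes]], bytes]) -> bytes:
        key = action_key(argv, inputs)
        if key in self._outputs:      # hit: reuse the stored output, skip execution
            self.hits += 1
            return self._outputs[key]
        self.misses += 1              # miss: run the action and remember its output
        output = execute(inputs)
        self._outputs[key] = output
        return output


if __name__ == "__main__":
    cache = ActionCache()

    def compile_c(ins: Dict[str, bytes]) -> bytes:
        return b"obj:" + ins["a.c"]

    cache.run(["cc", "-c", "a.c"], {"a.c": b"int x;"}, compile_c)  # miss
    cache.run(["cc", "-c", "a.c"], {"a.c": b"int x;"}, compile_c)  # hit
    cache.run(["cc", "-c", "a.c"], {"a.c": b"int y;"}, compile_c)  # miss: input changed
    print(cache.hits, cache.misses)  # 1 2
```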
Mechanism
- Identify cache-friendly groupings. If two actions are always run together and share most inputs, one step with two Bazel targets is almost always cheaper than two steps.
- Preserve the critical-path distinctions. Don't merge steps that should report independently for critical-path diagnosis. Canva kept BE/ML, FE, and other pipelines separate for this reason.
- Amortise per-step fixed costs:
  - VM cold-start
  - Docker image pull
  - Bazel JVM warm-up (minutes with a 900K-node graph)
  - Remote cache handshake
  - Git checkout
- Trust the inner parallelism. Bazel parallelises across targets within a step automatically; consolidation doesn't lose parallel execution, just parallel provisioning.
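The first two rules above can be turned into a grouping heuristic. A minimal sketch, assuming each target is described by a hypothetical `Target(label, pipeline, inputs)` record: targets never cross pipeline boundaries (so BE/ML and FE keep reporting separately), and within a pipeline a target joins an existing step only if its inputs overlap that step's inputs by at least a Jaccard threshold and the step is below a size cap. `group_targets`, the 0.5 threshold and the cap of 8 are illustrative, not values from the source.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, FrozenSet, List, Set


@dataclass(frozen=True)
class Target:
    label: str              # Bazel label, e.g. "//be/api:unit"
    pipeline: str           # pipeline that must report this target separately
    inputs: FrozenSet[str]  # packages/files the target depends on


@dataclass
class Step:
    pipeline: str
    labels: List[str] = field(default_factory=list)
    inputs: Set[str] = field(default_factory=set)


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_targets(targets: List[Target], min_overlap: float = 0.5,
                  max_targets: int = 8) -> List[Step]:
    """Greedily pack targets into steps without crossing pipeline boundaries."""
    steps: List[Step] = []
    for t in targets:
        candidates = [s for s in steps
                      if s.pipeline == t.pipeline          # keep pipelines separate
                      and len(s.labels) < max_targets      # cap per-step size
                      and jaccard(s.inputs, t.inputs) >= min_overlap]
        if candidates:
            best = max(candidates, key=lambda s: jaccard(s.inputs, t.inputs))
            best.labels.append(t.label)
            best.inputs |= t.inputs
        else:
            steps.append(Step(t.pipeline, [t.label], set(t.inputs)))
    return steps


if __name__ == "__main__":
    targets = [
        Target("//be/api:unit", "be-ml", frozenset({"be/api", "be/common"})),
        Target("//be/api:integ", "be-ml", frozenset({"be/api", "be/common", "be/db"})),
        Target("//fe/app:test", "fe", frozenset({"fe/app", "be/common"})),
        Target("//ml/train:test", "be-ml", frozenset({"ml/train", "ml/common"})),
    ]
    for s in group_targets(targets):
        print(s.pipeline, s.labels)
```

The two `//be/api` targets share most inputs and land in one step; the FE target stays apart because it belongs to another pipeline, and the ML target because it shares nothing. The heuristic is greedy and order-dependent, and it ignores target runtimes, so a real tool would also balance step durations.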
Canva's measurements
BE/ML Pipeline v2 (Apr 2023):
- Steps: 45 → 16 (-64 %)
- Average build time: 49 min → 35 min (-29 %)
- Build minutes: ~50 % fewer
FE integration tests (bazelified + consolidated, Sep 2023):
- Previously: ~100 jobs/commit, each its own VM.
- After: ~8 jobs/commit (grouped).
- Effect: ~1.3 M jobs/month removed — eliminating their warm-up and scheduling tax.
FE accessibility tests (bazelified + consolidated):
- Previously: 67,145 jobs / 13,947 commits.
- After: ~1 job/commit.
- Expected ~80 % time cut + ~$100K/yr savings.
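The headline ratios can be recomputed from the published figures above; this is plain arithmetic on those numbers and measures nothing new. `pct_change` is just a helper.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means a reduction."""
    return (after - before) / before * 100


a11y_jobs_per_commit = 67_145 / 13_947  # FE accessibility tests, before consolidation

print(f"BE/ML steps       {pct_change(45, 16):+.1f} %")  # -64.4 %
print(f"BE/ML build time  {pct_change(49, 35):+.1f} %")  # -28.6 %
print(f"FE integ jobs     {pct_change(100, 8):+.1f} %")  # -92.0 %
print(f"a11y jobs/commit  {a11y_jobs_per_commit:.2f} -> ~1, "
      f"{pct_change(a11y_jobs_per_commit, 1):+.1f} %")   # 4.81 -> ~1, -79.2 %
```

The ~79 % fall in a11y jobs is a job count; it is a different quantity from the expected ~80 % time cut.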
Preconditions
- Hermetic actions inside the grouped step. Without hermeticity, grouping amplifies flakes and state leaks — the original reason Canva had one-VM-per-step.
- Working remote cache. The benefit depends on cache hits within the step being common.
- Inner build system with native parallelism. Bazel is the canonical case. Without it, consolidation loses parallel execution.
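Before grouping actions, one cheap smoke test for the hermeticity precondition is to run an action twice in fresh output directories and compare what it wrote. A sketch under assumptions: the action is modeled as a hypothetical Python callable that writes into a given directory (in practice it would be a build or test command), and `looks_deterministic` and `tree_digest` are illustrative names. Differing digests flag non-determinism; identical digests do not prove hermeticity, since an undeclared input can be stable across both runs.

```python
import hashlib
import os
import tempfile
from typing import Callable, Dict


def tree_digest(root: str) -> Dict[str, str]:
    """Map every file under `root` (by relative path) to the SHA-256 of its bytes."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests


def looks_deterministic(action: Callable[[str], None], runs: int = 2) -> bool:
    """Run `action` in fresh temporary output dirs; True if every run wrote identical files."""
    results = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as out_dir:
            action(out_dir)
            results.append(tree_digest(out_dir))
    return all(r == results[0] for r in results)


if __name__ == "__main__":
    def stable(out_dir: str) -> None:
        with open(os.path.join(out_dir, "report.txt"), "w") as f:
            f.write("ok\n")

    def leaky(out_dir: str) -> None:
        with open(os.path.join(out_dir, "report.bin"), "wb") as f:
            f.write(os.urandom(16))  # stands in for a timestamp or other hidden input

    print(looks_deterministic(stable), looks_deterministic(leaky))
```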
Trade-offs
- Observability granularity drops. Per-step logs, metrics and UX show "this whole group passed / failed" instead of one signal per target. Engineers have to learn that the finer-grained, per-target detail now sits inside the step.
- Reporting UX changes. Canva called out this risk before rollout: the change broke some downstream dependencies (observability tools owned by other teams) that had relied on per-step signals.
- Blast radius is larger per step. A crash in a grouped step takes all its work down together.
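One way to soften the observability loss is to have the grouped step re-emit a status per target from its own output. A minimal sketch, assuming a simplified, hypothetical line format `<label> <STATUS> ...` (for example `//be/api:unit PASSED in 2.1s`); a sturdier version would read Bazel's Build Event Protocol output (for example via `--build_event_json_file`) instead of scraping logs. `summarise` and `RESULT_LINE` are illustrative names.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical simplified result line: "<bazel label> <STATUS> ...".
RESULT_LINE = re.compile(r"^(//\S+)\s+(PASSED|FAILED|FLAKY)\b")


def summarise(lines: Iterable[str]) -> Tuple[Dict[str, str], Counter]:
    """Recover one status per target from a grouped step's log."""
    per_target: Dict[str, str] = {}
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match:
            per_target[match.group(1)] = match.group(2)
    return per_target, Counter(per_target.values())


if __name__ == "__main__":
    log = [
        "INFO: Build completed, 1 test FAILED",
        "//be/api:unit      PASSED in 2.1s",
        "//be/db:test       FAILED in 9.0s",
    ]
    per_target, counts = summarise(log)
    failed = sorted(label for label, status in per_target.items() if status == "FAILED")
    print(failed, dict(counts))  # ['//be/db:test'] {'PASSED': 1, 'FAILED': 1}
```

Feeding `failed` into annotations or metrics keeps a per-target signal for downstream tools even though the CI system only sees one step.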
Composes with
- patterns/build-without-the-bytes — BwoB reduces the bytes shipped into each step; consolidation amortises remaining per-step overhead.
- patterns/static-pipeline-generation — fewer steps makes static generation cheaper and the YAML simpler.
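Emitting one CI step per group is what keeps statically generated config small. A sketch, assuming a generic YAML shape with `steps`, `label` and `command` keys (hypothetical, not any specific CI vendor's schema) and one `bazel test` invocation per group; `render_pipeline` is an illustrative name, and it builds the text directly to stay dependency-free (no quoting or escaping of unusual names).

```python
from typing import Dict, List


def render_pipeline(groups: Dict[str, List[str]]) -> str:
    """Render one CI step per group; Bazel parallelises across the targets inside each step."""
    lines = ["steps:"]
    for name in sorted(groups):  # stable ordering keeps generated diffs small
        targets = " ".join(sorted(groups[name]))
        lines.append(f'  - label: "{name}"')
        lines.append(f'    command: "bazel test {targets}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_pipeline({
        "be-ml": ["//be/api:unit", "//be/api:integ"],
        "fe": ["//fe/app:test"],
    }), end="")
```

Two groups produce two steps, however many targets each one holds.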
Related
- concepts/critical-path — what consolidation aims to reduce.
- concepts/hermetic-build — the precondition.
- concepts/content-addressed-caching — the economic engine.
- patterns/build-without-the-bytes — complementary.
- patterns/static-pipeline-generation — complementary.
Seen in
- sources/2024-12-16-canva-faster-ci-builds — BE/ML v2 (45→16 steps, 49→35 min, ~50 % build-minutes cut); FE integ test bazelification (~100→8 jobs/commit, ~1.3 M jobs/month removed); FE a11y bazelification (67K→~14K jobs).