SYSTEM Cited by 2 sources
Buildkite¶
Buildkite is a commercial hosted CI orchestrator: the control plane, UI, and job scheduler are hosted by Buildkite, while agents (the workers that actually run steps) are customer-owned, typically EC2 instances or Kubernetes pods inside the customer's VPC. This split is distinctive: the customer controls the execution environment, the hardware shape, and the networking path to internal caches and artifact stores, while Buildkite handles pipeline definition, scheduling, and result reporting.
Pipeline model¶
- A pipeline is declared in YAML (typically generated).
- A build is a pipeline run, triggered by a git push, PR, merge, or upload.
- A step is one unit of work in a pipeline. Steps can be commands, scripts, blocks (human approval), triggers (another pipeline), or wait/group markers.
- Steps run on customer-owned agents. Agents are long-running processes that pull jobs by tag/queue matching.
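The step types listed above can be sketched as a small pipeline definition. This is a minimal illustration, not a pipeline from any of the cited sources: the labels, commands, queue name, and the `deploy-pipeline` slug are all hypothetical. It emits JSON, which Buildkite accepts in place of YAML.

```python
import json


def command_step(label, command, queue):
    """A command step, routed to agents whose queue tag matches `queue`."""
    return {"label": label, "command": command, "agents": {"queue": queue}}


def build_pipeline():
    """Assemble a pipeline containing each kind of step described above."""
    return {
        "steps": [
            # Command steps (hypothetical commands and queue name).
            command_step("build", "make build", "default"),
            command_step("test", "make test", "default"),
            # Wait marker: later steps start only after earlier ones finish.
            "wait",
            # Block step: pauses the build until a human unblocks it.
            {"block": "Approve deploy"},
            # Trigger step: starts a build on another pipeline (hypothetical slug).
            {"trigger": "deploy-pipeline"},
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

The `agents` mapping is what ties a step to a pool of customer-owned agents: only agents started with a matching queue tag will pick the job up.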
Dynamic vs static pipelines¶
Buildkite supports pipeline uploads: a step can emit new pipeline YAML at runtime (buildkite-agent pipeline upload). This is commonly used to generate the pipeline afresh for each commit, but it also means that any expensive analysis done during generation sits on the critical path of every build.
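A runtime generator of this kind might look like the sketch below. It is an assumption-laden illustration: the directory-to-command mapping and the script name `generate_pipeline.py` are hypothetical, and real generators usually do far more analysis than a path-prefix lookup, which is exactly where the critical-path cost comes from.

```python
import json
import sys

# Hypothetical mapping from top-level directory to the command that tests it.
TEST_COMMANDS = {
    "web": "make test-web",
    "api": "make test-api",
}


def steps_for_changes(changed_files):
    """Return one command step per affected top-level directory."""
    affected = sorted({path.split("/", 1)[0] for path in changed_files})
    return [
        {"label": f"test {name}", "command": TEST_COMMANDS[name]}
        for name in affected
        if name in TEST_COMMANDS  # directories without a mapping add no step
    ]


def main(argv):
    # Changed file paths are passed as command-line arguments.
    pipeline = {"steps": steps_for_changes(argv)}
    json.dump(pipeline, sys.stdout, indent=2)


if __name__ == "__main__":
    main(sys.argv[1:])
```

A generator step could then run something like `python generate_pipeline.py $(git diff --name-only origin/main...HEAD) | buildkite-agent pipeline upload`, since `buildkite-agent pipeline upload` reads the pipeline from standard input when no file is given. Every later step waits on this one, so its runtime is paid on each commit.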
Canva's pipeline-v3 work is a concrete case study in moving away from runtime pipeline generation: patterns/static-pipeline-generation pre-computes the pipeline and publishes an input-hash manifest out of band, so jobs avoid the >10-min per-commit generation tax.
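The general idea of an input-hash manifest can be sketched as follows. The source does not describe Canva's manifest format or how jobs consume it, so everything here (the function names, the `{step: hash}` layout, the comparison against a previous manifest) is an assumption made for illustration only.

```python
import hashlib


def input_hash(files):
    """Hash a step's inputs, given as {path: bytes}, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")  # separator so path and content cannot run together
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()


def build_manifest(step_inputs):
    """Map each step name to the hash of its inputs."""
    return {step: input_hash(files) for step, files in step_inputs.items()}


def steps_to_run(previous, current):
    """Steps whose input hash is new or differs from the previous manifest."""
    return sorted(
        step for step, digest in current.items() if previous.get(step) != digest
    )
```

Under these assumptions the pipeline itself stays fixed, and each job does a cheap hash lookup to decide whether it has work to do, rather than waiting for a per-commit generation step.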
Use at Canva¶
In the Canva retrospective, Buildkite appears as one of the external dependencies:
It has many downstream dependencies: … Some dependencies are outside Canva, such as AWS, Buildkite, GitHub, and internet mirrors (NPM, Maven, PyPI, and so on) …
Buildkite's pipeline YAML is also the target format of Canva's Starlark-based pipeline generator:
In this new generator, we declare the pipeline configuration in Starlark (Bazel's configuration language), which we convert to YAML, as Buildkite expects.
Canva's agents run on EC2 worker pools Canva manages (i4i.8xlarge, c6id.12xlarge) — Buildkite's split model is why changing instance shape (patterns/instance-shape-right-sizing) and warm-up (patterns/snapshot-based-warmup) are fully customer-side concerns.
Related¶
- systems/bazel — Canva's Starlark generator targets Buildkite YAML; Bazel provides the inner parallelism inside each Buildkite step after patterns/pipeline-step-consolidation.
- systems/aws-ec2 — Canva's agent compute.
- concepts/critical-path — the metric Buildkite pipeline shape directly affects.
- patterns/static-pipeline-generation — Canva's move away from per-commit Buildkite pipeline uploads.
- patterns/pipeline-step-consolidation — shape of the steps inside the Buildkite pipeline.
Seen in¶
- — PlanetScale (2022-01-18) on a different Buildkite use case: Rails test-suite parallelism rather than Canva's build orchestration. Canonical wiki datum for the customer-owned-agent economics that the split model enables: PlanetScale runs the Rails test suite on 64-core agents via parallelize(workers: 64), dropping wall-clock from ~12 min serial to 3-4 min and then to ~1 min after eliminating factory-explosion. "Our infrastructure team set us up with some 64 core machines on Buildkite." The split model is what makes the 64-core shape a customer-side provisioning decision rather than a vendor SKU tier, which directly enables worker-count scaling past what a hosted-runner vendor would offer.
- sources/2024-12-16-canva-faster-ci-builds — Buildkite named as the external CI orchestrator; Starlark-generated YAML is the target format; Canva-managed EC2 agent pools.
- sources/2025-01-07-slack-automated-accessibility-testing-at-slack — Slack uses Buildkite as the scheduled-regression substrate for its Axe accessibility test suite: a daily Buildkite pipeline run (triggered outside PR gating) with A11Y_ENABLE=true pipes violation output into a Slack alert channel, which in turn triggers a Jira auto-ticket workflow. Canonical wiki datum for Buildkite as the scheduled half of a tri-mode opt-in execution pattern (on-demand local + scheduled Buildkite + opt-in CI gate). The customer-owned-agent shape matters here too: Slack can run the full a11y regression on its own agent fleet nightly without paying per-run hosted-runner costs.