Buildkite

Buildkite is a commercial hosted CI orchestrator. The control plane, UI, and job scheduler are hosted by Buildkite; the agents (the workers that actually run steps) are customer-owned, typically EC2 instances or Kubernetes pods inside the customer's VPC. This split is distinctive: the customer controls the execution environment, the hardware shape, and the networking path to internal caches and artifact stores, while Buildkite handles pipeline definition, scheduling, and result reporting.

Pipeline model

  • A pipeline is declared in YAML (typically generated).
  • A build is a pipeline run, triggered by a git push, PR, merge, or upload.
  • A step is one unit of work in a pipeline. Steps can be commands, scripts, blocks (human approval), triggers (another pipeline), or wait/group markers.
  • Steps run on customer-owned agents. Agents are long-running processes that pull jobs by tag/queue matching.
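The model above can be sketched in a few lines of Python. This is an illustrative sketch, not Buildkite's implementation: the labels, commands, queue names, and trigger slug are made up, and agent_matches is a simplified stand-in for Buildkite's agent targeting, which has more rules than shown here. The step keys (command, wait, block, trigger, agents) are the ones the pipeline format uses, and the pipeline is serialized as JSON, which buildkite-agent pipeline upload accepts alongside YAML.

```python
import json

# Hypothetical pipeline: labels, commands, queue name, and trigger slug are illustrative.
pipeline = {
    "steps": [
        {"label": "build", "command": "make build", "agents": {"queue": "builders"}},
        {"wait": None},                  # barrier: later steps wait for earlier ones
        {"block": "Approve deploy"},     # human approval gate
        {"trigger": "deploy-pipeline"},  # starts a build of another pipeline
    ]
}


def step_kind(step):
    """Classify a step by the key that defines its type."""
    for kind in ("command", "wait", "block", "trigger", "group"):
        if kind in step:
            return kind
    return "unknown"


def agent_matches(agent_tags, step):
    """Simplified model: every key/value the step asks for must be in the agent's tags."""
    wanted = step.get("agents", {})
    return all(agent_tags.get(key) == value for key, value in wanted.items())


print(json.dumps(pipeline, indent=2))
```

In this simplified model, an agent tagged queue=builders would pick up the build step, while an agent on a different queue would not.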

Dynamic vs static pipelines

Buildkite supports pipeline uploads: a step can emit new pipeline YAML at runtime (buildkite-agent pipeline upload). This lets the pipeline be generated per commit, but it also puts whatever analysis the generator performs on the build's critical path.
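A minimal sketch of a runtime generator, assuming a hypothetical repository layout in which each top-level directory has its own make test target. The function name, labels, and commands are illustrative and not taken from any real pipeline.

```python
import json


def generate_pipeline(changed_files):
    """Emit one test step per top-level directory touched by the commit.

    Files at the repository root (no "/" in the path) are ignored in this sketch.
    """
    dirs = sorted({path.split("/", 1)[0] for path in changed_files if "/" in path})
    steps = [{"label": f"test {d}", "command": f"make -C {d} test"} for d in dirs]
    return {"steps": steps}


# Example input: in a real step this list might come from `git diff --name-only`.
example = generate_pipeline(["web/app.ts", "api/main.go", "README.md"])
print(json.dumps(example, indent=2))
```

A step would print this JSON and pipe it into buildkite-agent pipeline upload, which reads the pipeline from standard input. Everything the generator does (here a trivial path split, in practice possibly a dependency-graph query) runs before any generated step can start, which is how generation cost ends up on the critical path. A real generator would also need to decide what to upload when no directory changed.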

Canva's pipeline-v3 work is a concrete case study in moving away from runtime pipeline generation: patterns/static-pipeline-generation pre-computes the pipeline and publishes an input-hash manifest out of band, so jobs avoid the >10-min per-commit generation tax.
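One way to picture an input-hash manifest is sketched below. This is not Canva's implementation, whose details are not given here: the function names, the manifest shape (step name mapped to a digest), and the comparison rule are all assumptions made for illustration.

```python
import hashlib


def input_hash(files):
    """Stable SHA-256 digest over a {path: content_bytes} mapping.

    Paths are visited in sorted order so the digest does not depend on dict ordering.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


def build_manifest(step_inputs):
    """Map each step name to the digest of its inputs (hypothetical manifest shape)."""
    return {step: input_hash(files) for step, files in step_inputs.items()}


def steps_to_run(previous, current):
    """Steps whose input digest is new or differs from the previously published manifest."""
    return sorted(step for step, digest in current.items() if previous.get(step) != digest)
```

Under these assumptions, the manifest is computed and published separately from the build, and a job only needs a cheap dictionary comparison to see whether its inputs changed, rather than regenerating the pipeline on every commit.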

Use at Canva

In the Canva retrospective, Buildkite is listed as one of the external dependencies:

It has many downstream dependencies: … Some dependencies are outside Canva, such as AWS, Buildkite, GitHub, and internet mirrors (NPM, Maven, PyPI, and so on) …

Buildkite's YAML format is also the output target of Canva's Starlark-based pipeline generator:

In this new generator, we declare the pipeline configuration in Starlark (Bazel's configuration language), which we convert to YAML, as Buildkite expects.

Canva's agents run on EC2 worker pools that Canva manages (i4i.8xlarge, c6id.12xlarge). Because of Buildkite's split model, changing the instance shape (patterns/instance-shape-right-sizing) and warming up agents (patterns/snapshot-based-warmup) are entirely customer-side concerns.

Seen in

  • PlanetScale (2022-01-18), on a different Buildkite use case: Rails test-suite parallelism rather than Canva's build orchestration. Canonical wiki datum for the customer-owned-agent economics that the split model enables: PlanetScale runs the Rails test suite on 64-core agents via parallelize(workers: 64), dropping wall-clock time from ~12 min serial to 3-4 min, and then to ~1 min after eliminating factory-explosion. "Our infrastructure team set us up with some 64 core machines on Buildkite." The split model is what makes the 64-core shape a customer-side provisioning decision rather than a vendor SKU tier, which directly enables worker-count scaling past what a hosted-runner vendor would offer.
  • sources/2024-12-16-canva-faster-ci-builds: Buildkite named as the external CI orchestrator; Starlark-generated YAML is the target format; Canva-managed EC2 agent pools.
  • sources/2025-01-07-slack-automated-accessibility-testing-at-slack: Slack uses Buildkite as the scheduled-regression substrate for its Axe accessibility test suite: a daily Buildkite pipeline run (triggered outside PR gating) with A11Y_ENABLE=true pipes violation output into a Slack alert channel, which in turn triggers a Jira auto-ticket workflow. Canonical wiki datum for Buildkite as the scheduled half of a tri-mode opt-in execution pattern (on-demand local + scheduled Buildkite + opt-in CI gate). The customer-owned-agent shape matters here too: Slack can run the full a11y regression on its own agent fleet nightly without paying per-run hosted-runner costs.
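The opt-in gating in the Slack entry can be sketched as an environment-variable check. Only the A11Y_ENABLE=true setting comes from the source; the function names and the "functional" / "accessibility" check labels are hypothetical, and Slack's actual test harness is not shown here.

```python
import os


def a11y_enabled(env=None):
    """True only when A11Y_ENABLE is explicitly set to "true" (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("A11Y_ENABLE", "").lower() == "true"


def collect_checks(env=None):
    """Always run the baseline checks; add accessibility checks only when opted in."""
    checks = ["functional"]
    if a11y_enabled(env):
        checks.append("accessibility")
    return checks
```

The same flag can then serve all three modes: a developer sets it locally, the scheduled Buildkite pipeline sets it for the daily run, and a PR pipeline leaves it unset unless a team opts in.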