PATTERN Cited by 1 source

CI pipeline as customer-authored durable workflow

Pattern

Treat the CI/CD pipeline as a customer artefact, not a platform artefact. Each repo ships its own class, CIPipeline extends WorkflowEntrypoint, in .cloudflare/ci.ts (or the platform-specific equivalent). The platform's webhook dispatcher loads that repo's pipeline class on demand as a sandboxed dynamic worker and hands execution to a durable-workflow engine. In other words, the platform ingests the webhook, works out which repo it came from, and runs a durable function that happens to live in the customer's repo. "The platform doesn't know what's in the pipeline. It doesn't need to." (Source: Cloudflare Dynamic Workflows.)
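The dispatch flow described above can be sketched roughly as follows. This is a minimal in-memory model under stated assumptions, not Cloudflare's implementation: WebhookPayload, Pipeline, PipelineLoader, WorkflowEngine and dispatchWebhook are all hypothetical names introduced for illustration, and the loader simply stands in for "load this repo's pipeline as a sandboxed dynamic worker".

```typescript
// Hypothetical sketch of the dispatch flow; none of these names are Cloudflare APIs.

interface WebhookPayload {
  repo: string;
  sha: string;
  branch: string;
  pr?: number;
}

// A customer pipeline as the platform sees it: an opaque async function.
type Pipeline = (payload: WebhookPayload) => Promise<string>;

// Assumption: stands in for loading the repo's pipeline class into a sandbox.
type PipelineLoader = (repo: string, sha: string) => Promise<Pipeline>;

// Assumption: stands in for the durable-workflow engine.
interface WorkflowEngine {
  start(runId: string, pipeline: Pipeline, payload: WebhookPayload): Promise<string>;
}

async function dispatchWebhook(
  payload: WebhookPayload,
  load: PipelineLoader,
  engine: WorkflowEngine,
): Promise<string> {
  // 1. Work out which repo and commit the event belongs to.
  const { repo, sha } = payload;
  // 2. Load that repo's pipeline on demand; the platform never inspects its body.
  const pipeline = await load(repo, sha);
  // 3. Hand execution to the engine under a run ID derived from repo + commit.
  return engine.start(`${repo}@${sha}`, pipeline, payload);
}
```

The point of the shape is that dispatchWebhook contains no CI logic at all: swapping one repo's pipeline for another changes nothing on the platform side.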

The canonical minimal form

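// .cloudflare/ci.ts  (lives in the customer's repo)
// runInSandbox and summarise are helper functions assumed to be defined
// elsewhere in the repo; they are not shown in this snippet.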
import { WorkflowEntrypoint } from 'cloudflare:workers';

export class CIPipeline extends WorkflowEntrypoint {
  async run(event, step) {
    const { repo, sha, branch, pr } = event.payload;

    // Fork an isolated copy of the repo at this commit.
    const workspace = await step.do('checkout', () =>
      this.env.ARTIFACTS.fork(repo, { sha })
    );

    await step.do('install', () =>
      runInSandbox(workspace, ['pnpm', 'install']));

    const [lint, test, build] = await Promise.all([
      step.do('lint',  () => runInSandbox(workspace, ['pnpm', 'lint'])),
      step.do('test',  () => runInSandbox(workspace, ['pnpm', 'test'])),
      step.do('build', () => runInSandbox(workspace, ['pnpm', 'build'])),
    ]);

    if (pr) {
      await step.do('comment', () =>
        this.env.GITHUB.commentOnPR(repo, pr, summarise({ lint, test, build })));
    }

    if (branch === 'main') {
      await step.waitForEvent('approval', {
        type: 'deploy-approval',
        timeout: '24 hours'
      });
      await step.do('deploy', () =>
        runInSandbox(workspace, ['pnpm', 'deploy']));
    }
  }
}

The pipeline is a real WorkflowEntrypoint; every step.do(...) is independently retryable; step.waitForEvent(...) hibernates the pipeline waiting for human approval (no VM held open); state and progress survive deploys, evictions, and crashes.
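To make the "completed steps survive a crash" behaviour concrete, here is a toy model of step replay. It is not the Cloudflare Workflows engine: StepLog, ToyStep, makeStep and runToCompletion are hypothetical names, the Map is an assumed stand-in for durable storage, and the real engine's retry and persistence behaviour is richer than this.

```typescript
// Toy model of durable steps: NOT the Cloudflare Workflows engine.
// Assumption: a Map stands in for durable storage of step results.
type StepLog = Map<string, unknown>;

interface ToyStep {
  do<T>(name: string, fn: () => Promise<T>): Promise<T>;
}

function makeStep(log: StepLog): ToyStep {
  return {
    async do<T>(name: string, fn: () => Promise<T>): Promise<T> {
      // A step that already completed is replayed from the log, not re-executed.
      if (log.has(name)) return log.get(name) as T;
      const result = await fn(); // may throw; nothing is recorded on failure
      log.set(name, result);     // record only after success
      return result;
    },
  };
}

// Re-invoke the whole pipeline function until it completes or attempts run out.
async function runToCompletion<T>(
  pipeline: (step: ToyStep) => Promise<T>,
  log: StepLog,
  maxAttempts: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await pipeline(makeStep(log));
    } catch (err) {
      lastError = err; // completed steps stay in the log for the next attempt
    }
  }
  throw lastError;
}
```

In this model, if 'install' succeeds and 'test' throws, the next attempt reads 'install' back from the log and only re-executes 'test'.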

When to apply

  • Each repo has its own pipeline shape (steps, matrix, integrations) rather than a platform-fixed DAG.
  • The pipeline is long-running — multi-minute builds, multi-stage deploys, approval gates that pause for hours.
  • You want to avoid per-repo VM provisioning cost; the pipeline's idle time should cost nothing.
  • You want failure survival without re-running the whole pipeline from scratch.
  • The platform has a capability-based sandbox substrate it can dispatch into (Dynamic Workers, Firecracker µVMs, etc.).

Stack

Cloudflare's realisation assembles four primitives:

| Stage | Primitive | Role |
| --- | --- | --- |
| Workspace (checkout) | Artifacts + ArtifactFS | Git-native versioned filesystem on Cloudflare's globally distributed network. Lazy hydration → multi-GB repo ready in single-digit seconds. fork() gives each CI run its own isolated copy, no git clone tax. |
| Lightweight steps | Dynamic Workers | Each lint / format / typecheck / bundle step runs in a sandboxed V8 isolate that boots in milliseconds, on the same machine as the repo data. No VM provisioning, no image pull, no cold start. |
| Durable orchestration | Dynamic Workflows | Holds the whole run together. Steps are retryable and durable. The run hibernates for free while waiting on approvals. State and progress survive deploys, evictions, and crashes. |
| Heavy corners | Sandboxes | docker build, integration suites that need Postgres, Rust compiles that want 8 cores. Snapshot-restore to R2 means even these warm-start in a couple of seconds. |

Traditional CI vs. this pattern

Cloudflare contrasts a mid-sized JS repo's CI run on a traditional VM-based platform vs. on this stack:

| Phase | Traditional VM-based CI | Dynamic-workflow-based CI |
| --- | --- | --- |
| Allocate compute | 15-30 s VM boot | single-digit-ms isolate boot |
| Base image | ~10 s pull | no image pull |
| Source checkout | ~10 s git clone | single-digit-s edge fork |
| Dependency install | 30-60 s npm ci | lifted into durable step, hibernates if idle |
| Test execution | actual work | actual work |
| Teardown | full VM tear-down | isolate evicted when idle |
| Idle cost | paid for whole VM time | ≈ 0 |
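As a rough worked sum of the traditional column, the snippet below adds up the quoted ranges for the phases that run before any test executes. The Range type and add helper are illustrative only, and the "~10 s" figures are treated as exactly 10 s, which is an assumption.

```typescript
// Fixed per-run overhead before any test executes, using the ranges quoted above.
interface Range { min: number; max: number } // seconds

const add = (...rs: Range[]): Range => ({
  min: rs.reduce((s, r) => s + r.min, 0),
  max: rs.reduce((s, r) => s + r.max, 0),
});

const vmBoot: Range = { min: 15, max: 30 };
const imagePull: Range = { min: 10, max: 10 }; // "~10 s", taken as exactly 10
const gitClone: Range = { min: 10, max: 10 };  // "~10 s", taken as exactly 10
const npmCi: Range = { min: 30, max: 60 };

const beforeInstall = add(vmBoot, imagePull, gitClone); // { min: 35, max: 50 }
const beforeTests = add(beforeInstall, npmCi);          // { min: 65, max: 110 }
```

On these figures a traditional run spends about 35-50 s before dependency install even starts. The equivalent phases on the isolate-based stack are quoted only as "single-digit" milliseconds and seconds, so the most that can be said is that they sum to under roughly 10 s.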

"The repo doesn't move — the compute comes to it." Qualitative framing; no p50/p99 disclosure in the source post.

Composes with

Trade-offs

Upsides:

  • Repos iterate on their pipeline at their own cadence; they never have to wait for the platform to ship a new step type.
  • Idle repos cost ≈ 0; active repos share the isolate fleet.
  • Failure survival: a crashed worker mid-build doesn't lose completed steps. The workflow resumes at the failed step.
  • Approval gates via step.waitForEvent('approval') don't burn VM hours.
  • Repos get their own pipeline without the platform having to understand what's in it.
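The approval-gate upside rests on the run being stored state rather than a live process while it waits. A toy model of that idea follows; it is not how Workflows implements step.waitForEvent. RunState, store, hibernate and deliverEvent are hypothetical names, and the Map is an assumed stand-in for durable storage.

```typescript
// Toy model of an approval gate: NOT the Workflows implementation.
type RunState =
  | { status: "waiting"; eventType: string; deadlineMs: number }
  | { status: "resumed"; payload: unknown }
  | { status: "timed-out" };

// Assumption: a Map stands in for durable storage keyed by run ID.
const store = new Map<string, RunState>();

// Record what the run is waiting for; no timer or process is kept alive.
function hibernate(runId: string, eventType: string, timeoutMs: number, nowMs: number): void {
  store.set(runId, { status: "waiting", eventType, deadlineMs: nowMs + timeoutMs });
}

// Called when an event arrives; decides whether the run wakes up.
function deliverEvent(runId: string, eventType: string, payload: unknown, nowMs: number): RunState {
  const state = store.get(runId);
  if (!state || state.status !== "waiting") {
    throw new Error(`run ${runId} is not waiting`);
  }
  let next: RunState = state;
  if (nowMs > state.deadlineMs) {
    next = { status: "timed-out" };         // deadline passed before the event arrived
  } else if (state.eventType === eventType) {
    next = { status: "resumed", payload };  // matching event wakes the run
  }                                         // non-matching events leave it waiting
  store.set(runId, next);
  return next;
}
```

Between hibernate and deliverEvent the only thing that exists is one stored record, which is the sense in which a paused run costs approximately nothing.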

Downsides / open questions:

  • Debuggability: a tenant debugging a stuck pipeline needs a trace that spans platform webhook → dispatcher → step.do() boundary → their runInSandbox call → pnpm/docker/cargo. Traditional debugging tools assume a single process.
  • Pipeline determinism during redeploy: if the customer pushes a new .cloudflare/ci.ts while a previous pipeline is still paused on waitForEvent, does the resumed run use the old or the new pipeline code? Cloudflare has not yet disclosed this.
  • Platform-level gates are harder: organization-wide deploy-approval policies that must apply to all repos now need to be enforced at the dispatcher layer, not in the pipeline itself (which the customer owns).
  • Observability cardinality: per-repo pipeline logs, metrics, traces all key on repo ID — the platform's observability substrate must scale to per-repo cardinality.
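One way a platform might enforce an organization-wide deploy policy outside customer code is to wrap the deploy capability before handing it to the pipeline, in keeping with the capability-based sandbox assumption. This is a hypothetical sketch, not a documented Cloudflare mechanism: DeployRequest, OrgPolicy, Deploy and guardDeploy are illustrative names.

```typescript
// Hypothetical: org-wide deploy policy enforced by the platform, outside customer code.
interface DeployRequest {
  repo: string;
  branch: string;
  approvers: string[];
}

interface OrgPolicy {
  protectedBranches: string[];
  minApprovals: number;
}

type Deploy = (req: DeployRequest) => Promise<string>;

// Wrap the deploy capability before passing it into the customer's pipeline,
// so the pipeline only ever holds the guarded version.
function guardDeploy(deploy: Deploy, policy: OrgPolicy): Deploy {
  return async (req) => {
    const isProtected = policy.protectedBranches.includes(req.branch);
    const distinctApprovers = new Set(req.approvers).size; // duplicates count once
    if (isProtected && distinctApprovers < policy.minApprovals) {
      throw new Error(
        `deploy to ${req.branch} needs ${policy.minApprovals} approvals, got ${distinctApprovers}`,
      );
    }
    return deploy(req);
  };
}
```

Because the check lives in the wrapper, a customer who deletes the approval step from their own pipeline still cannot deploy a protected branch without the required approvals.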

Contrasts with

  • GitHub Actions / Bitbucket Pipelines model: these platforms do support per-repo pipeline YAML, but the pipeline body is a static DAG that the platform interprets and then dispatches to VMs. In this pattern the pipeline is a real durable function the customer authored in TypeScript; the platform doesn't interpret it, it just runs it.
  • Bitbucket merge-queue pipeline: Atlassian's dedicated merge-queue pipeline is statically bound per repo (configured in bitbucket-pipelines.yml). Dynamic-workflow CI could implement the same semantics as one of several per-repo pipeline variants the customer ships in their own code.
  • Container-per-PR CI (e.g. Fireworks-style Firecracker µVMs per workload): these give isolation, but at µVM-sized cost and boot time (~125 ms). Isolate-per-step lowers the cost floor to single-digit-ms boot and a few MB of memory.

Seen in
