PATTERN

CI as agent quality gate

Intent

The AI agent is inside the CI loop: when the CI pipeline runs on its PR, the agent reads the pipeline output, identifies failures, and addresses them autonomously before requesting human review. CI is not a gate humans enforce — it's a feedback channel the agent iterates against.

Canonical articulation — Atlassian Fireworks, 2026-04-24:

"CI pipeline as quality gate: Every PR runs lint, vet, tests, and Helm validation. The agent reads pipeline output and addresses failures before requesting review." (Source: sources/2026-04-24-atlassian-rovo-dev-driven-development)

Shape

   Agent pushes PR
   [CI pipeline] ─── lint
                 ├── vet
                 ├── tests
                 └── Helm validation
   Pipeline output ──► read by agent
          │                │
          │ fail            │ pass
          ▼                ▼
   Agent patches code  Request human review
          └─── loop ───┘

The pipeline is unchanged from a traditional CI setup; what changes is the consumer of the pipeline output — the agent, not a human.
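The loop in the diagram can be sketched in a few lines. This is a hypothetical sketch, not Rovo Dev's implementation: `run_pipeline`, `patch_code`, and `request_review` stand in for whatever CI and agent integrations a given setup provides.

```python
# Hypothetical agent-side CI loop; run_pipeline, patch_code, and
# request_review are stand-ins for real CI/agent integrations.
MAX_ITERATIONS = 5  # bound the loop so a stuck agent escalates to a human

def ci_loop(pr, run_pipeline, patch_code, request_review):
    for _ in range(MAX_ITERATIONS):
        result = run_pipeline(pr)      # lint, vet, tests, Helm validation
        if result["status"] == "pass":
            return request_review(pr)  # humans enter only after CI passes
        # A CI failure is an iteration input, not a human-intervention trigger.
        pr = patch_code(pr, result["failures"])
    raise RuntimeError("CI still failing after max iterations; escalate to a human")
```

The bounded iteration count is the one design choice worth calling out: without it, a confused agent can loop forever against a failure it cannot fix.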

Why this matters

Without the agent reading pipeline output, you get one of two degenerate patterns:

  • Human-in-the-loop for every failure. Every pipeline failure bounces back to a human, who must diagnose it, feed the diagnosis to the agent, re-run, wait, and review the output. Human response latency is added to every CI failure.
  • Agent doesn't wait for CI. The agent pushes PRs assuming CI will pass; when CI fails, the PR sits until a human notices, inflating PR-to-merge latency.

Putting the agent inside the CI loop closes both gaps: CI failures become agent-iteration inputs, not human-intervention triggers.

Tool contract — pipeline output must be agent-readable

The pattern requires the CI surface to expose its output in a form the agent can consume:

  • Structured output preferred. Linter findings with file/line/message. Test failures with stack traces. Helm-validation errors with resource references.
  • Integrated agent access. The Fireworks team uses Rovo Dev with first-party Bitbucket Pipelines integration — "the agent can raise PRs, read diffs, and monitor builds without leaving the conversation." See systems/bitbucket-pipelines.
  • Low latency. The agent's iteration rhythm depends on CI turnaround — minutes, not hours.
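To make the "structured output" requirement concrete, here is a minimal sketch of turning a per-tool pipeline report into findings the agent can map to edits. The report schema is an assumption for illustration; real CI systems emit their own formats (e.g. SARIF for linters, JUnit XML for tests).

```python
import json

# Hypothetical structured pipeline report (schema assumed for illustration).
REPORT = json.loads("""
{
  "lint":  [{"file": "chart/values.yaml", "line": 7, "message": "trailing whitespace"}],
  "tests": [{"name": "TestRetry", "trace": "assert failed at retry_test.go:42"}],
  "helm":  []
}
""")

def actionable_findings(report):
    """Flatten the per-tool report into (tool, location, message) tuples
    the agent can map directly to edits."""
    findings = []
    for item in report["lint"]:
        findings.append(("lint", f'{item["file"]}:{item["line"]}', item["message"]))
    for item in report["tests"]:
        findings.append(("test", item["name"], item["trace"]))
    for item in report["helm"]:
        findings.append(("helm", item["resource"], item["error"]))
    return findings
```

The point is the shape, not the schema: each finding carries a location the agent can open and a message it can act on, which plain-text logs do not guarantee.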

What CI catches vs. what the adversarial reviewer catches

CI covers repeatable mechanical checks; the adversarial reviewer covers judgement-level bugs:

  CI catches                  Adversarial sub-agent catches
  ─────────────────────────   ─────────────────────────────
  Lint / style / formatting   Subtle logic bugs
  Static analysis (vet)       Missing edge cases
  Existing test failures      Hallucinated invariants
  Config / Helm validity      Tautological tests

Both tiers are necessary; neither subsumes the other. See patterns/pre-human-agent-review for the three-tier stack.

Guardrail — CI is declarative, agent doesn't edit it

One important safety property: the CI pipeline definition is not the agent's to edit. The agent reads pipeline output; it does not modify the pipeline to make failures go away. (If the agent can edit bitbucket-pipelines.yml to skip a failing check, the gate stops being a gate.) Treat the CI config as a protected surface: human-controlled, versioned, and reviewed separately.
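One way to enforce this, sketched under assumed paths (the protected-path list here is hypothetical; adapt it to your repo layout and pair it with branch-permission rules in the host platform):

```python
# Hypothetical pre-merge guard: flag agent PRs that touch the CI definition.
# PROTECTED_PATHS is an assumption; adjust to your repository layout.
PROTECTED_PATHS = {"bitbucket-pipelines.yml", ".github/workflows/"}

def protected_files_touched(changed_files):
    """Return the protected files a PR modifies, if any; an empty list
    means the PR leaves the CI definition alone."""
    return [f for f in changed_files
            if any(f == p or f.startswith(p) for p in PROTECTED_PATHS)]
```

A check like this belongs in the human-controlled part of the pipeline itself, so the agent cannot route around it.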

When it fits

  • Agent-driven development. The pattern assumes the agent is the primary author of the code and the tests.
  • CI has structured output. The agent needs parseable feedback. Plain-text logs work less well than structured findings.
  • CI latency is minutes. Slower CI breaks the iteration rhythm; the agent ends up context-switched-out by the time the pipeline returns.

When it doesn't fit

  • CI failures require physical access / hardware. Some tests (embedded, hardware-in-the-loop) can't be iterated on by the agent alone.
  • Non-deterministic CI. If CI fails randomly, the agent's loop gets noisy — it tries to fix things that aren't broken. Non-determinism has to be handled at the CI level first.
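While flakiness is being fixed at the CI level, a cheap agent-side mitigation is to re-run a failing check before treating the failure as real. A minimal sketch; the re-run count is an assumed threshold, not a recommendation from the source:

```python
# Hypothetical flake filter: a failure counts as real only if it
# reproduces on every re-run (reruns=2 is an assumed threshold).
def is_real_failure(run_check, reruns=2):
    """run_check() returns True on pass; re-run it and require the
    failure to reproduce consistently before the agent acts on it."""
    return all(not run_check() for _ in range(reruns))
```

This trades extra CI minutes for fewer wasted agent iterations on code that isn't broken.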

