CONCEPT Cited by 1 source
AI writes own tests¶
Definition¶
The AI agent writes both the production code and the test suite that validates it: the agent produces the e2e tests, deploys to a real environment, runs the tests, and loops on failures until green. The test suite, rather than a separate human-authored oracle, is the primary correctness proof.
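The loop in this definition can be sketched as a small driver. This is a minimal illustration, not the pipeline from the source: `RunResult`, `iterate_until_green`, the three callables, and the `max_rounds` cap are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    """Outcome of one e2e run against the dev shard (hypothetical shape)."""
    failures: List[str]

    @property
    def green(self) -> bool:
        return not self.failures


def iterate_until_green(
    deploy: Callable[[], None],
    run_e2e: Callable[[], RunResult],
    revise: Callable[[List[str]], None],
    max_rounds: int = 5,  # assumption: cap the loop so a stuck agent escalates
) -> int:
    """Deploy, run the e2e suite, and feed failures back until it passes.

    Returns the number of rounds used; raises RuntimeError if the suite is
    still failing after max_rounds.
    """
    for round_no in range(1, max_rounds + 1):
        deploy()
        result = run_e2e()
        if result.green:
            return round_no
        revise(result.failures)
    raise RuntimeError(f"suite still failing after {max_rounds} rounds")


# Toy stand-ins: the suite fails twice, then passes on the third round.
pending = [["boot timeout"], ["policy not enforced"], []]
rounds = iterate_until_green(
    deploy=lambda: None,
    run_e2e=lambda: RunResult(failures=pending.pop(0)),
    revise=lambda failures: None,
)
```

In a real setup the three callables would wrap the agent's deploy step, the e2e runner, and the agent's revision step; the driver only fixes the control flow.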
Atlassian's Rovo Dev / Fireworks post is the canonical wiki articulation:
"AI writes the e2e tests too. The agent writes tests, deploys to a dev shard, runs them, and loops on failures until they pass. The test suite is the primary proof that things work." (Source: sources/2026-04-24-atlassian-rovo-dev-driven-development)
Why this is counterintuitive but works¶
The intuitive objection: if the LLM writes the tests, it will write tests that pass for the wrong reason — it will hallucinate both the code and the tests in a self-consistent but wrong way. This is a real risk. The post's stance on it:
"If you're reading any code, read the tests."
Three properties make it work in practice:
- The tests are the specification that gets human-reviewed. The human is not reviewing the production code line-by-line; they are reviewing the tests. The test suite is the specification the human is holding the agent accountable to.
- The tests run in a real environment. e2e tests that deploy to a real dev shard and exercise the real integration surface can't pass against a fabricated integration: nothing is mocked, so a test only passes if the real integration works. See patterns/dev-shard-iteration-loop.
- The loop converges on passing tests against observable invariants. See concepts/black-box-validation — if the invariant is "boots in 100 ms" or "network policy blocks X", the test can't pass without actually observing that outcome.
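As a sketch of what an observable-invariant check might look like, the helper below measures a "boots within a time budget" property purely from the outside. The function name, the `start` and `is_ready` probes, and the polling interval are assumptions for illustration; the source does not describe how its tests are written.

```python
import time
from typing import Callable


def boots_within(
    start: Callable[[], None],
    is_ready: Callable[[], bool],
    budget_s: float,
    poll_s: float = 0.001,
) -> bool:
    """Start the system and report whether it became ready within budget_s.

    Only external behaviour is observed: the check never inspects the
    implementation, so it cannot pass unless readiness is actually seen
    before the budget runs out.
    """
    t0 = time.monotonic()
    start()
    while time.monotonic() - t0 <= budget_s:
        if is_ready():
            return True
        time.sleep(poll_s)
    return False
```

For the "boots in 100 ms" invariant, a test would call `boots_within(start, is_ready, budget_s=0.1)` with probes that launch and query the deployed service on the dev shard.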
The division of labour¶
| Artifact | Who writes it | Who reviews it |
|---|---|---|
| Production code | AI agent | AI adversarial sub-agent, CI quality gate, architecture review |
| e2e test suite | AI agent | Human (primary), AI adversarial sub-agent |
| Observable invariants ("spec") | Human | — |
The human writes nothing the agent could write; the agent writes everything the human specified. The handoff is at the invariant level, not the code level.
Relation to automated test generation from production traffic¶
The Zalando instance ([[concepts/automated-test-generation-from-production-traffic]]) generates tests by capturing real production inputs and replaying them. The Rovo Dev instance generates tests by prompting an LLM with the spec and asking it to propose tests. Both are instances of "don't hand-write tests," but the input substrate differs:
| Source | Input substrate | Tradeoff |
|---|---|---|
| Production-traffic replay | Real observed inputs | High realism, but only for code paths that exist in prod |
| AI-written e2e tests | LLM proposals from the spec | Covers new features before any prod traffic, but inputs must be validated against the spec |
The two are complementary, not competing — a mature agentic pipeline can use both, with replay covering regression and AI-written tests covering greenfield features.
Failure modes¶
- Tautological tests. The agent writes a test that is effectively assert(function() == function()), i.e. one that asserts the code against itself. Human review of the tests catches this. Spec-level invariants ("boots in 100 ms") don't admit this failure mode: a test can't be tautological if the invariant it checks is externally observable.
- Missing-coverage gaps. The agent writes tests for the happy path but misses error cases. The adversarial review sub-agent (concepts/adversarial-review-persona) is the designed mitigation: "have an adversarial persona subagent that ... reviews what the main agent has written."
- Integration gaps. The agent writes unit-level tests only. This is mitigated by the e2e / dev-shard requirement: every feature must have tests that run against a real cluster shard, not just a mocked integration.
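The tautological-test failure mode is easiest to see side by side with a spec-anchored test. The snippet below is a toy illustration: `policy_blocks`, its deliberate bug, and the "port 23 must be blocked" rule are hypothetical, chosen to echo the "network policy blocks X" invariant.

```python
def policy_blocks(port: int) -> bool:
    """Hypothetical code under test, with a deliberate bug: it blocks nothing."""
    return False


def tautological_test() -> bool:
    # Compares the code with itself, so it passes whatever the code does.
    return policy_blocks(23) == policy_blocks(23)


def spec_anchored_test() -> bool:
    # The expected value comes from the spec ("port 23 must be blocked"),
    # not from the code, so the bug above makes this test fail.
    return policy_blocks(23) is True
```

Here `tautological_test()` passes despite the bug, while `spec_anchored_test()` fails. A reviewer reading only the tests can spot the first kind because its expected value is not traceable to any stated invariant.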
Seen in¶
- sources/2026-04-24-atlassian-rovo-dev-driven-development — canonical articulation. Production validation for Fireworks comes from AI-written e2e tests running on AI-provisioned dev shards; this is "the primary proof that things work."