Skip to content

PATTERN Cited by 1 source

Tests as executable specifications

Pattern

Treat the test suite not just as a regression net, but as the behavioral specification of the system — a corpus of executable assertions that both human reviewers and AI agents read to infer what the system is supposed to do. When a test fails, the failure itself is a teaching signal: the agent reads the assertion, the inputs, the expected output, and can often refine its change without re-prompting.

Named by the 2026-03-26 AWS Architecture Blog post:

"In agentic workflows, tests do more than catch regressions, they define acceptable behavior."

"Well-written tests also act as documentation. When a test fails, the agent can infer what behavior is expected and refine its changes accordingly."

Why it works for agents

  • Tests are formally structured. Name, arrange, act, assert — the agent can parse structure in a way it can't parse a prose docstring.
  • Failure output is diagnostic. "Expected X, got Y" is a machine-readable hint; the agent uses it to steer the next edit.
  • Scope is implicit. A test suite's boundaries tell the agent what's in-contract and what isn't.
  • Specs stay executable, not wishful. Prose docs rot; tests that rot fail CI.

Test-quality properties that make this work

  • Clear namesshouldRejectNegativeAmounts teaches more than test123.
  • One assertion per test — failure tells the agent which property broke.
  • Realistic fixtures — not so stripped-down the test teaches a wrong model of the domain.
  • Tight arrange/act/assert separation — the agent can read the inputs, action, and oracle independently.

Sibling patterns in the wiki

Pairs with

Caveats

  • Bad tests teach wrong behavior. If tests lie (flaky, wrong assertions, inverted expectations), the agent learns the lie. Test quality is load-bearing under this pattern.
  • Coverage gaps are invisible teachers. Absent tests look like "no spec"; the agent may invent plausible-but-wrong behavior. Pair with concepts/specification-driven-development for the formal-spec tier.
  • No measurement in the 2026-03-26 source of how much agent behavior improves under well-written tests vs poorly-written ones.

Seen in

Last updated · 200 distilled / 1,178 read