Test the ambiguous invariant¶
Test the ambiguous invariant is the discipline of writing automated tests for behaviour that your codebase relies on de facto, even when the relevant spec does not formally require it. The pattern applies wherever a standard is silent or advisory but downstream consumers have hardened around one reading of it, so that "what the spec allows" and "what you can safely ship" diverge.
When the pattern applies¶
The warning signs:
- The spec uses non-normative words ("preface", "typically", "normally") — see RFC normative language.
- A meaningful fraction of deployed consumers depend on one reading — see backward compatibility's long-tail-of-clients discipline.
- Your implementation has historically produced one behaviour, but the code doesn't enforce it — so a refactor can silently produce the other behaviour.
- You cannot reach the broken clients to update them (deployed firmware, embedded devices, old libraries).
All four conditions were present for the CNAME ordering in DNS responses from Cloudflare's 1.1.1.1 resolver: RFC 1034 uses the word "preface" (non-normative); glibc getaddrinfo and Cisco Catalyst DNSC depend on CNAME-first ordering; Cloudflare's code produced CNAME-first but didn't enforce it; and the broken clients include Linux userspace and switch firmware that will never be updated.
The mechanism¶
Write a test that asserts the invariant at the boundary — the wire format, API shape, or data layout downstream consumers see — regardless of how your internal code evolves. Examples:
- DNS resolver: assert that A/AAAA records in a response never appear before the CNAMEs that alias them.
- HTTP server: assert that `Content-Length` is emitted before `Transfer-Encoding` if both are present (for HTTP/1.1 clients that parse headers sequentially).
- File-format writer: assert that a deprecated magic prefix still appears at byte 0, because old parsers expect it.
- API response: assert that a field that was once nullable is always non-null now, because older SDKs don't handle `null`.
The test must be at the output surface, not at an internal function — the whole point is to catch refactors that preserve internal semantics but change external shape.
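As a minimal sketch of the DNS example above, the check below walks an already-parsed answer section and reports the first CNAME that appears after an address record. The `Record`, `RecordType`, `rr`, and `first_ordering_violation` names are hypothetical and not taken from any resolver codebase; the model is deliberately simplified to the three record types that matter here.

```rust
/// Record types relevant to the ordering check (a simplified, hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordType {
    Cname,
    A,
    Aaaa,
}

/// One answer-section record, as a client would see it after parsing the response.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub name: String,
    pub rtype: RecordType,
    pub data: String,
}

/// Returns the index of the first CNAME that appears after an A/AAAA record,
/// or `None` if every CNAME precedes every address record.
pub fn first_ordering_violation(answer: &[Record]) -> Option<usize> {
    let mut seen_address = false;
    for (i, record) in answer.iter().enumerate() {
        match record.rtype {
            RecordType::A | RecordType::Aaaa => seen_address = true,
            RecordType::Cname if seen_address => return Some(i),
            RecordType::Cname => {}
        }
    }
    None
}

/// Small helper for building records in tests (hypothetical).
pub fn rr(name: &str, rtype: RecordType, data: &str) -> Record {
    Record {
        name: name.to_string(),
        rtype,
        data: data.to_string(),
    }
}
```

In a real suite the records would come from parsing the bytes the server actually sent, not from internal structs; otherwise the test sits at an internal function instead of at the output surface.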
Case study: 2026-01-08 1.1.1.1 incident¶
From Cloudflare's post-mortem:
In our case, we did originally implement the specification so that CNAMEs appear first. However, we did not have any tests asserting the behavior remains consistent due to the ambiguous language in the RFC.
The 2025-12-02 memory-optimisation refactor to `PartialChain::fill_cache` changed

```rust
// Build a fresh vector so the CNAME chain precedes the cached answer records.
let mut answer_rrs = Vec::with_capacity(entry.answer.len() + self.records.len());
answer_rrs.extend_from_slice(&self.records); // CNAMEs first
answer_rrs.extend_from_slice(&entry.answer);
entry.answer = answer_rrs;
```

to a version that no longer placed the CNAMEs first.
The two are functionally equivalent under RFC 1034's "order is not significant" reading. A unit test on the cache code would have passed either version (no ordering assertion). A boundary test — "query a domain with a CNAME chain, parse the response, assert the CNAME records appear before the A records" — would have caught it. Cloudflare's stated remediation includes writing exactly this test.
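The gap between the two kinds of test can be illustrated with a small, self-contained sketch. `merge_cnames_first` mirrors the shape of the quoted pre-refactor snippet; `merge_in_place` is a hypothetical allocation-saving variant written for illustration only and is not the actual Cloudflare change. `Kind`, `same_records`, and `cnames_first` are likewise assumed names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Cname,
    Address,
}

/// Mirrors the quoted snippet: allocate a new vector, CNAME chain first.
pub fn merge_cnames_first(chain: &[Kind], cached: &[Kind]) -> Vec<Kind> {
    let mut answer = Vec::with_capacity(chain.len() + cached.len());
    answer.extend_from_slice(chain);
    answer.extend_from_slice(cached);
    answer
}

/// Hypothetical variant (an assumption, not the real refactor):
/// reuse the cached vector and append the chain to it.
pub fn merge_in_place(chain: &[Kind], mut cached: Vec<Kind>) -> Vec<Kind> {
    cached.extend_from_slice(chain);
    cached
}

/// Order-insensitive comparison: what a unit test without an
/// ordering assertion effectively checks.
pub fn same_records(a: &[Kind], b: &[Kind]) -> bool {
    fn cnames(records: &[Kind]) -> usize {
        records.iter().filter(|k| **k == Kind::Cname).count()
    }
    a.len() == b.len() && cnames(a) == cnames(b)
}

/// The boundary invariant: no CNAME appears after an address record.
pub fn cnames_first(answer: &[Kind]) -> bool {
    let first_address = answer
        .iter()
        .position(|k| *k == Kind::Address)
        .unwrap_or(answer.len());
    answer[first_address..].iter().all(|k| *k != Kind::Cname)
}
```

With a two-CNAME chain and one cached address record, both merge functions return the same set of records, so `same_records` holds for the pair, while `cnames_first` holds only for the output of `merge_cnames_first`. That is the difference an ordering assertion at the boundary is meant to surface.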
Where it fits in the remediation stack¶
- Pre-commit: lint/static analysis rarely catches invariant violations unless the invariant is explicit in the type system (Rust's `Option<T>` + exhaustive match, TypeScript's branded types). An ordering invariant on a `Vec<ResourceRecord>` has no static representation.
- Test suite: this pattern. The cheapest durable fix.
- Canary / staged rollout: patterns/staged-rollout will eventually catch invariant violations if the affected population is large enough to show up in metrics before the deploy completes. In Cloudflare's case, the broken clients were a small fraction of traffic and uncorrelated with POP selection — so every pre-90 % checkpoint passed clean and the defect landed fleet-wide.
- Runtime fail-open: fail-open handling can soften the impact of an invariant violation, but doesn't stop the client-side crash.
Companion patterns¶
- patterns/fast-rollback — the post-detection recovery path. Cloudflare got from incident-declaration to revert-start in 8 min (18:19 → 18:27 UTC) because the change was single-commit single-path; the test-the-ambiguous-invariant pattern is how you avoid needing fast rollback in the first place.
- patterns/staged-rollout: the defence-in-depth partner. Invariant tests catch the correctness bar, staged rollouts catch the operational bar. Both fail on the same incident if the broken population is small and correctness isn't boundary-tested.
Generalisation: "the spec permits it, but we can't ship it"¶
The meta-rule: every ambiguous reading of a spec that your code has settled on should get a boundary test. The act of writing the test often forces the team to explicitly name the implicit invariant — which is itself documentation for future maintainers who didn't know the convention was load-bearing.