PATTERN Cited by 3 sources
Property-Based Testing¶
Definition¶
Property-based testing is a testing pattern where, instead of asserting outputs on specific inputs (traditional example-based unit tests), you:
- Declare a property that must hold for all inputs of some type — a predicate the system under test should satisfy universally (e.g. "sync converges," "no panics," "queue FIFO order preserved").
- Randomly generate inputs of that type from a typed generator.
- Run the system and check the property. If it holds for N generated inputs (often thousands to millions), you've gained evidence — not proof — that the property holds.
- On failure, output the seed (patterns/seed-recorded-failure-reproducibility) and shrink the failing input (concepts/test-case-minimization) to a minimal reproducer.
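The four steps above can be sketched without any framework in std-only Rust. In this illustrative toy (all names are made up here, not any library's API), the property — "summing bytes as `u8` never overflows" — is deliberately false, so the seed report and shrinker have something to do:

```rust
/// Tiny deterministic PRNG (xorshift64) so every run is replayable from its seed.
struct Rng(u64);
impl Rng {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// 1. Typed generator: a random Vec<u8> of length 0..16.
fn gen_input(rng: &mut Rng) -> Vec<u8> {
    let len = (rng.next() % 16) as usize;
    (0..len).map(|_| (rng.next() % 256) as u8).collect()
}

/// 3. The universal property (deliberately false for this demo):
/// "summing as u8 never overflows".
fn property(input: &[u8]) -> bool {
    input.iter().try_fold(0u8, |acc, &b| acc.checked_add(b)).is_some()
}

/// 4. Shrink: delete one element at a time while the failure persists,
/// until the input is 1-minimal (no single deletion still fails).
fn shrink(mut failing: Vec<u8>, prop: impl Fn(&[u8]) -> bool) -> Vec<u8> {
    loop {
        let mut reduced = false;
        for i in 0..failing.len() {
            let mut candidate = failing.clone();
            candidate.remove(i);
            if !prop(&candidate) {
                failing = candidate;
                reduced = true;
                break;
            }
        }
        if !reduced {
            return failing;
        }
    }
}

/// The whole loop: generate, check; on failure, report the seed and a
/// shrunk reproducer.
fn run(seed: u64, runs: u32) -> Option<(u64, Vec<u8>)> {
    let mut rng = Rng(seed);
    for _ in 0..runs {
        let input = gen_input(&mut rng); // 2. generate and check
        if !property(&input) {
            return Some((seed, shrink(input, property)));
        }
    }
    None
}
```

Running `run(42, 1000)` finds an overflowing input almost immediately and shrinks it to a small vector whose sum exceeds 255 but from which no single element can be removed without the property passing again — the "minimal reproducer" of step 4.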
Originated in QuickCheck (Haskell, Claessen & Hughes 2000). Mainstream libraries: Hypothesis (Python), proptest + quickcheck (Rust), fast-check (JS), jqwik (Java), test.check (Clojure), Kotest (Kotlin), ScalaCheck.
Why this beats example-based tests¶
Example-based tests only check the inputs you thought of. Production bugs are almost always in the inputs you didn't. Property-based testing shifts the failure mode from "humans miss edge cases" to "the generator must be good enough to cover the edge cases" — which is itself imperfect but tractable (you can keep improving the generator) and benefits from deterministic replay (once you've hit an edge case, it's reproducible forever).
Canonical form¶
// Rust sketch using proptest (`proptest!` and `prop_assert_eq!` are real
// proptest macros; `any_three_trees` and `run_planner_to_fixpoint` are
// domain-specific stand-ins).
proptest! {
    #[test]
    fn sync_converges_for_any_canopy(
        // 1. Typed generator
        (remote, local, synced) in any_three_trees(),
    ) {
        // 2. Run system under test
        let final_state = run_planner_to_fixpoint(remote, local, synced);
        // 3. Universal property
        prop_assert_eq!(final_state.remote, final_state.local);
        prop_assert_eq!(final_state.local, final_state.synced);
        // Additional asymmetric invariants would go here...
    }
}
The generator returns typed, structurally valid inputs; the property is a predicate over the inputs and outputs. Shrinking on failure is handled by the framework, usually by composing per-type shrinkers.
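One way to picture "composing per-type shrinkers": each type exposes a function from a value to smaller candidate values, and compound types build their shrinker out of their components'. A framework-free sketch (names are illustrative, not any real library's API):

```rust
/// Candidate smaller values for a u32: zero, halving, decrementing.
fn shrink_u32(v: u32) -> Vec<u32> {
    if v == 0 {
        return vec![];
    }
    vec![0, v / 2, v - 1]
}

/// A pair's shrinker is composed from the component shrinkers:
/// shrink each side while holding the other fixed.
fn shrink_pair(p: (u32, u32)) -> Vec<(u32, u32)> {
    let (a, b) = p;
    shrink_u32(a)
        .into_iter()
        .map(|a2| (a2, b))
        .chain(shrink_u32(b).into_iter().map(|b2| (a, b2)))
        .collect()
}

/// Greedily walk the shrink tree while the property still fails.
fn minimize(mut p: (u32, u32), fails: impl Fn((u32, u32)) -> bool) -> (u32, u32) {
    'outer: loop {
        for cand in shrink_pair(p) {
            if fails(cand) {
                p = cand;
                continue 'outer;
            }
        }
        return p; // no smaller candidate still fails: locally minimal
    }
}
```

For example, if the failure condition is "first component ≥ 100", `minimize((1000, 7), |(a, _)| a >= 100)` walks down to `(100, 0)`: the first component can't shrink further without the failure vanishing, and the second shrinks to zero because it never mattered.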
Production examples in the wiki¶
- systems/canopycheck (Dropbox, ~2020). Randomizes three trees; asserts "all three converge" + stronger asymmetric invariants; shrinks by removing nodes from the initial trees. Scope is deliberately narrow: the planner only, no I/O.
- systems/shardstore (AWS S3, SOSP '21). Property-based testing against an adjacent executable specification (~1% of impl size, checked into the same repo); runs on every commit. The related pattern is patterns/executable-specification — the property is "impl behaves like spec on this input."
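The ShardStore-style property — "impl behaves like spec on this input" — reduces to running both sides on the same randomly generated operation sequence and comparing observable outputs. A toy sketch with stand-in types (a two-stack queue checked against a trivially-correct `VecDeque` spec; neither is ShardStore's actual code):

```rust
use std::collections::VecDeque;

/// Executable spec: obviously correct, a fraction of the impl's size.
struct SpecQueue(VecDeque<u32>);

/// "Implementation": a two-stack queue, the thing actually under test.
struct ImplQueue {
    inbox: Vec<u32>,
    outbox: Vec<u32>,
}

impl ImplQueue {
    fn new() -> Self {
        ImplQueue { inbox: vec![], outbox: vec![] }
    }
    fn push(&mut self, v: u32) {
        self.inbox.push(v);
    }
    fn pop(&mut self) -> Option<u32> {
        if self.outbox.is_empty() {
            while let Some(v) = self.inbox.pop() {
                self.outbox.push(v);
            }
        }
        self.outbox.pop()
    }
}

/// The property: for any op sequence (Some(v) = push, None = pop),
/// impl and spec observably agree.
fn impl_matches_spec(ops: &[Option<u32>]) -> bool {
    let mut spec = SpecQueue(VecDeque::new());
    let mut sut = ImplQueue::new();
    for op in ops {
        match op {
            Some(v) => {
                spec.0.push_back(*v);
                sut.push(*v);
            }
            None => {
                if spec.0.pop_front() != sut.pop() {
                    return false;
                }
            }
        }
    }
    true
}
```

A generator then produces arbitrary `ops` sequences and the framework asserts `impl_matches_spec(&ops)` — the spec carries the semantics, the generator carries the coverage.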
Where this sits relative to neighbors¶
- concepts/deterministic-simulation is the discipline that makes property-based testing reliably reproducible at scale, by bringing the sources of non-determinism (scheduler, network, timers) under the test framework's control. Without it, property-based testing works for pure/narrow code (CanopyCheck's planner-only scope) but struggles end-to-end.
- concepts/lightweight-formal-verification is the same-stance, different-tool neighbor: specs + property tests checked on every commit, instead of full mechanical proofs. ShardStore is the canonical realization.
- concepts/test-case-minimization is the shrinker; without it, property-based failures are a grep-through-logs exercise. With it, failing tests produce diagnosable minimal reproducers.
- patterns/seed-recorded-failure-reproducibility is the developer-experience contract; without it, property-based tests become the "flaky and ignored" category.
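The seed contract in the last bullet rests on one invariant: generated inputs are a pure function of the recorded seed. A minimal illustration (xorshift64 as a stand-in PRNG):

```rust
/// Same seed, same stream — the replay contract behind
/// seed-recorded failure reproducibility.
fn stream(seed: u64, n: usize) -> Vec<u64> {
    let mut s = seed;
    (0..n)
        .map(|_| {
            // xorshift64 step: deterministic, so the whole stream
            // is recoverable from the seed alone.
            s ^= s << 13;
            s ^= s >> 7;
            s ^= s << 17;
            s
        })
        .collect()
}
```

Replaying a recorded seed regenerates the exact failing input; anything the generator consumes beyond this stream (wall clock, thread interleaving) silently breaks the contract.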
When not to use¶
- When the property is hard to express mechanically. "Looks good to a designer" isn't a property. "Doesn't regress our benchmark" isn't a property.
- When the input space can't be usefully sampled at random — most ML-model-behavior tests, for instance.
- When failures are extremely rare and the CI budget doesn't support the run volume needed to hit them. Property-based testing shines at ≥10⁴ runs; at 10² runs it's barely better than example-based.
Seen in¶
- sources/2024-05-31-dropbox-testing-sync-at-dropbox-2020 — CanopyCheck as the narrow / planner-only realization; Trinity inherits the stance but can't shrink end-to-end.
- sources/2025-02-25-allthingsdistributed-building-and-operating-s3 — ShardStore executable-spec + property-tested-on-every-commit as the production storage realization.
- sources/2025-06-02-mongodb-conformance-checking-at-mongodb-testing-that-our-code-matches-our-tla-specs — adjacent but distinct technique: test-case generation from a TLA+ spec produces enumerated tests rather than randomly generated ones. For the MongoDB Mobile SDK's OT merge rules, 4,913 generated tests achieved 100% branch coverage, vs. 92% from millions of runs of the AFL fuzzer (random inputs). The comparison: property-based testing with a generator covers a sampled slice of the input space; test-case generation with a model checker covers all behaviours of a constrained spec — exhaustive when the state space fits, not when it blows up.