PATTERN

Property-Based Testing

Definition

Property-based testing is a testing pattern where, instead of asserting outputs on specific inputs (traditional example-based unit tests), you:

  1. Declare a property that must hold for all inputs of some type — a predicate the system under test should satisfy universally (e.g. "sync converges," "no panics," "queue FIFO order preserved").
  2. Randomly generate inputs of that type from a typed generator.
  3. Run the system and check the property. If it holds for N generated inputs (often thousands to millions), you've gained evidence the property is true.
  4. On failure, output the seed (patterns/seed-recorded-failure-reproducibility) and shrink the failing input (concepts/test-case-minimization) to a minimal reproducer.
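The loop these four steps describe can be sketched with the standard library alone (all names here, such as `run_property` and `gen`, are illustrative rather than any framework's API):

```python
import random

def run_property(prop, gen, runs=1000, seed=0):
    """Steps 2-4: draw inputs from a seeded RNG, check the property,
    and on failure report the seed so the run is replayable."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen(rng)                    # 2. typed generator
        if not prop(x):                 # 3. check the property
            return (seed, x)            # 4. seed + failing input
    return None                         # property held on all runs

# 1. A property that must hold for all inputs: sorting is idempotent.
prop = lambda xs: sorted(sorted(xs)) == sorted(xs)
gen = lambda rng: [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]

assert run_property(prop, gen) is None  # holds for 1000 generated lists
```

Shrinking (step 4's second half) is omitted here; it is sketched separately below under the canonical form.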

Originated in QuickCheck (Haskell, Claessen & Hughes 2000). Mainstream libraries: Hypothesis (Python), proptest + quickcheck (Rust), fast-check (JS), jqwik (Java), test.check (Clojure), Kotest (Kotlin), ScalaCheck.

Why this beats example-based tests

Example-based tests only check the inputs you thought of. Production bugs are almost always in the inputs you didn't. Property-based testing shifts the failure mode from "humans miss edge cases" to "the generator must be good enough to cover the edge cases" — which is itself imperfect but tractable (you can keep improving the generator) and benefits from deterministic replay (once you've hit an edge case, it's reproducible forever).
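As a concrete illustration (invented for this point, not taken from the systems below): an interval-overlap check written with strict comparisons passes the obvious example-based cases but disagrees with a brute-force reference exactly on touching endpoints, the kind of input a human rarely writes down. A seeded random search finds it quickly:

```python
import random

def overlaps(a, b):
    # Buggy: strict '<' reports intervals that merely touch at an
    # endpoint as disjoint.
    return a[0] < b[1] and b[0] < a[1]

def reference(a, b):
    # Brute-force reference: closed intervals overlap iff the
    # intersection is non-empty.
    return max(a[0], b[0]) <= min(a[1], b[1])

rng = random.Random(0)
counterexample = None
for _ in range(10_000):
    a = tuple(sorted(rng.randint(0, 10) for _ in range(2)))
    b = tuple(sorted(rng.randint(0, 10) for _ in range(2)))
    if overlaps(a, b) != reference(a, b):
        counterexample = (a, b)  # e.g. a pair with a[1] == b[0]
        break
```

Once found, the counterexample is reproducible forever: rerunning with seed 0 regenerates the same input sequence.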

Canonical form

use proptest::prelude::*;

proptest! {
    #[test]
    fn sync_converges_for_any_canopy(
        // 1. Typed generator
        (remote, local, synced) in any_three_trees()
    ) {
        // 2. Run system under test
        let final_state = run_planner_to_fixpoint(remote, local, synced);
        // 3. Universal property
        prop_assert_eq!(final_state.remote, final_state.local);
        prop_assert_eq!(final_state.local, final_state.synced);
        // Additional asymmetric invariants would go here...
    }
}

The generator returns typed, structurally-valid inputs. The property is a predicate over the inputs and outputs. Shrinking on failure is handled by the framework, usually by composing per-type shrinkers.
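A hand-rolled list shrinker conveys the idea; real frameworks compose smarter per-type shrinkers, but the greedy skeleton is similar (this sketch is illustrative, not any library's implementation):

```python
def shrink_list(xs, still_fails):
    """Greedy shrinker: repeatedly try dropping one element, keep any
    removal that still reproduces the failure, stop when none does."""
    changed = True
    while changed:
        changed = False
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if still_fails(candidate):
                xs = candidate
                changed = True
                break
    return xs

# Failure predicate: "the list contains a duplicate". The minimal
# reproducer is just two equal elements.
fails = lambda xs: len(set(xs)) < len(xs)
assert shrink_list([3, 1, 4, 1, 5, 9, 2, 6], fails) == [1, 1]
```

CanopyCheck's "shrinks by removing nodes from the initial trees" (below) is the same loop over a tree-shaped input instead of a list.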

Production examples in the wiki

  • systems/canopycheck (Dropbox, ~2020). Randomizes three trees; asserts "all three converge" + stronger asymmetric invariants; shrinks by removing nodes from the initial trees. Scope is deliberately narrow: the planner only, no I/O.
  • systems/shardstore (AWS S3, SOSP '21). Property-based testing against an adjacent executable specification (~1% of impl size, checked into the same repo); runs on every commit. The related pattern is patterns/executable-specification — the property is "impl behaves like spec on this input."
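The ShardStore-style setup can be miniaturized: a toy ring-buffer queue as the "implementation" and a plain `deque` as the adjacent executable spec, with the property that the two agree on every operation in a random sequence (all types and names here are invented for illustration):

```python
import random
from collections import deque

class RingQueue:
    """Toy 'implementation': fixed-capacity FIFO on a ring buffer."""
    def __init__(self, cap):
        self.buf = [None] * cap
        self.head = self.size = 0
        self.cap = cap
    def push(self, x):
        if self.size == self.cap:
            raise OverflowError
        self.buf[(self.head + self.size) % self.cap] = x
        self.size += 1
    def pop(self):
        if self.size == 0:
            raise IndexError
        x = self.buf[self.head]
        self.head = (self.head + 1) % self.cap
        self.size -= 1
        return x

# Executable spec: a plain deque, a fraction of the impl's size.
# Property: impl behaves like spec on a random operation sequence.
rng = random.Random(1)
impl, spec = RingQueue(4), deque()
for _ in range(5_000):
    if rng.random() < 0.5 and len(spec) < 4:
        v = rng.randint(0, 99)
        impl.push(v); spec.append(v)
    elif spec:
        assert impl.pop() == spec.popleft()
assert impl.size == len(spec)
```

The spec stays trivially auditable while the implementation is free to be clever, which is the point of keeping both in the same repo.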

Where this sits relative to neighbors

  • concepts/deterministic-simulation is the discipline that makes property-based testing reliably reproducible at scale, by bringing every source of non-determinism (scheduler, network, timers) under the test framework's control. Without it, property-based testing works for pure/narrow code (CanopyCheck's planner-only scope) but struggles end-to-end.
  • concepts/lightweight-formal-verification is the same-stance, different-tool neighbor: specs + property tests checked on every commit, instead of full mechanical proofs. ShardStore is the canonical realization.
  • concepts/test-case-minimization is the shrinker; without it, property-based failures are a grep-through-logs exercise. With it, failing tests produce diagnosable minimal reproducers.
  • patterns/seed-recorded-failure-reproducibility is the developer-experience contract; without it, property-based tests become the "flaky and ignored" category.
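The seed contract in the last bullet reduces to one discipline: every byte of randomness flows through a single RNG whose seed is recorded. A sketch (the helper name is hypothetical):

```python
import random

def generate_inputs(seed, n=5):
    """All randomness is drawn from one seeded RNG, so recording the
    seed is enough to replay the exact same generated inputs."""
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(rng.randint(0, 5))]
            for _ in range(n)]

# Replaying with the recorded seed reproduces the run bit-for-bit;
# a different seed explores a different slice of the input space.
assert generate_inputs(42) == generate_inputs(42)
```

Any randomness the RNG does not govern (wall clock, thread interleaving, hash ordering) silently breaks this contract, which is where deterministic simulation comes in.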

When not to use

  • When the property is hard to express mechanically. "Looks good to a designer" isn't a property. "Doesn't regress our benchmark" isn't a property.
  • When the input space can't usefully be sampled at random — most ML-model-behavior tests, for instance.
  • When failures are extremely rare and the CI budget doesn't support the run volume needed to hit them. Property-based testing shines at ≥10⁴ runs; at 10² runs it's barely better than example-based.
