PATTERN

Property-Based Testing

Definition

Property-based testing is a testing pattern where, instead of asserting outputs on specific inputs (traditional example-based unit tests), you:

  1. Declare a property that must hold for all inputs of some type — a predicate the system under test should satisfy universally (e.g. "sync converges," "no panics," "queue FIFO order preserved").
  2. Randomly generate inputs of that type from a typed generator.
  3. Run the system and check the property. If it holds for N generated inputs (often thousands to millions), you've gained evidence the property is true.
  4. On failure, output the seed (patterns/seed-recorded-failure-reproducibility) and shrink the failing input (concepts/test-case-minimization) to a minimal reproducer.
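The loop these four steps describe can be sketched with the standard library alone (all names here, such as `run_property` and `gen`, are illustrative rather than any framework's API):

```python
import random

def run_property(prop, gen, runs=1000, seed=0):
    """Steps 2-4: draw inputs from a seeded RNG, check the property,
    and on failure report the seed so the run is replayable."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen(rng)                    # 2. typed generator
        if not prop(x):                 # 3. check the property
            return (seed, x)            # 4. seed + failing input
    return None                         # property held on all runs

# 1. A property that must hold for all inputs: sorting is idempotent.
prop = lambda xs: sorted(sorted(xs)) == sorted(xs)
gen = lambda rng: [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]

assert run_property(prop, gen) is None  # holds for 1000 generated lists
```

Shrinking (step 4's second half) is omitted here; it is sketched separately below under the canonical form.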

Originated in QuickCheck (Haskell, Claessen & Hughes 2000). Mainstream libraries: Hypothesis (Python), proptest + quickcheck (Rust), fast-check (JS), jqwik (Java), test.check (Clojure), Kotest (Kotlin), ScalaCheck.

Why this beats example-based tests

Example-based tests only check the inputs you thought of. Production bugs are almost always in the inputs you didn't. Property-based testing shifts the failure mode from "humans miss edge cases" to "the generator must be good enough to cover the edge cases" — which is itself imperfect but tractable (you can keep improving the generator) and benefits from deterministic replay (once you've hit an edge case, it's reproducible forever).
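As a concrete illustration (invented for this point, not taken from the systems below): an interval-overlap check written with strict comparisons passes the obvious example-based cases but disagrees with a brute-force reference exactly on touching endpoints, the kind of input a human rarely writes down. A seeded random search finds it quickly:

```python
import random

def overlaps(a, b):
    # Buggy: strict '<' reports intervals that merely touch at an
    # endpoint as disjoint.
    return a[0] < b[1] and b[0] < a[1]

def reference(a, b):
    # Brute-force reference: closed intervals overlap iff the
    # intersection is non-empty.
    return max(a[0], b[0]) <= min(a[1], b[1])

rng = random.Random(0)
counterexample = None
for _ in range(10_000):
    a = tuple(sorted(rng.randint(0, 10) for _ in range(2)))
    b = tuple(sorted(rng.randint(0, 10) for _ in range(2)))
    if overlaps(a, b) != reference(a, b):
        counterexample = (a, b)  # e.g. a pair with a[1] == b[0]
        break
```

Once found, the counterexample is reproducible forever: rerunning with seed 0 regenerates the same input sequence.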

Canonical form

use proptest::prelude::*;

proptest! {
    #[test]
    fn sync_converges_for_any_canopy(
        // 1. Typed generator
        (remote, local, synced) in any_three_trees()
    ) {
        // 2. Run system under test
        let final_state = run_planner_to_fixpoint(remote, local, synced);
        // 3. Universal property
        prop_assert_eq!(final_state.remote, final_state.local);
        prop_assert_eq!(final_state.local, final_state.synced);
        // Additional asymmetric invariants would go here...
    }
}

The generator returns typed, structurally-valid inputs. The property is a predicate over the inputs and outputs. Shrinking on failure is handled by the framework, usually by composing per-type shrinkers.
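A hand-rolled list shrinker conveys the idea; real frameworks compose smarter per-type shrinkers, but the greedy skeleton is similar (this sketch is illustrative, not any library's implementation):

```python
def shrink_list(xs, still_fails):
    """Greedy shrinker: repeatedly try dropping one element, keep any
    removal that still reproduces the failure, stop when none does."""
    changed = True
    while changed:
        changed = False
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if still_fails(candidate):
                xs = candidate
                changed = True
                break
    return xs

# Failure predicate: "the list contains a duplicate". The minimal
# reproducer is just two equal elements.
fails = lambda xs: len(set(xs)) < len(xs)
assert shrink_list([3, 1, 4, 1, 5, 9, 2, 6], fails) == [1, 1]
```

CanopyCheck's "shrinks by removing nodes from the initial trees" (below) is the same loop over a tree-shaped input instead of a list.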

Production examples in the wiki

  • systems/canopycheck (Dropbox, ~2020). Randomizes three trees; asserts "all three converge" + stronger asymmetric invariants; shrinks by removing nodes from the initial trees. Scope is deliberately narrow: the planner only, no I/O.
  • systems/shardstore (AWS S3, SOSP '21). Property-based testing against an adjacent executable specification (~1% of impl size, checked into the same repo); runs on every commit. The related pattern is patterns/executable-specification — the property is "impl behaves like spec on this input."
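The ShardStore-style setup can be miniaturized: a toy ring-buffer queue as the "implementation" and a plain `deque` as the adjacent executable spec, with the property that the two agree on every operation in a random sequence (all types and names here are invented for illustration):

```python
import random
from collections import deque

class RingQueue:
    """Toy 'implementation': fixed-capacity FIFO on a ring buffer."""
    def __init__(self, cap):
        self.buf = [None] * cap
        self.head = self.size = 0
        self.cap = cap
    def push(self, x):
        if self.size == self.cap:
            raise OverflowError
        self.buf[(self.head + self.size) % self.cap] = x
        self.size += 1
    def pop(self):
        if self.size == 0:
            raise IndexError
        x = self.buf[self.head]
        self.head = (self.head + 1) % self.cap
        self.size -= 1
        return x

# Executable spec: a plain deque, a fraction of the impl's size.
# Property: impl behaves like spec on a random operation sequence.
rng = random.Random(1)
impl, spec = RingQueue(4), deque()
for _ in range(5_000):
    if rng.random() < 0.5 and len(spec) < 4:
        v = rng.randint(0, 99)
        impl.push(v); spec.append(v)
    elif spec:
        assert impl.pop() == spec.popleft()
assert impl.size == len(spec)
```

The spec stays trivially auditable while the implementation is free to be clever, which is the point of keeping both in the same repo.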

Where this sits relative to neighbors

  • concepts/deterministic-simulation is the discipline that makes property-based testing reliably reproducible at scale, by bringing every source of non-determinism (scheduler, network, timers) under the test framework's control. Without it, property-based testing works for pure/narrow code (CanopyCheck's planner-only scope) but struggles end-to-end.
  • concepts/lightweight-formal-verification is the same-stance, different-tool neighbor: specs + property tests checked on every commit, instead of full mechanical proofs. ShardStore is the canonical realization.
  • concepts/test-case-minimization is the shrinker; without it, property-based failures are a grep-through-logs exercise. With it, failing tests produce diagnosable minimal reproducers.
  • patterns/seed-recorded-failure-reproducibility is the developer-experience contract; without it, property-based tests become the "flaky and ignored" category.
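The seed contract in the last bullet reduces to one discipline: every byte of randomness flows through a single RNG whose seed is recorded. A sketch (the helper name is hypothetical):

```python
import random

def generate_inputs(seed, n=5):
    """All randomness is drawn from one seeded RNG, so recording the
    seed is enough to replay the exact same generated inputs."""
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(rng.randint(0, 5))]
            for _ in range(n)]

# Replaying with the recorded seed reproduces the run bit-for-bit;
# a different seed explores a different slice of the input space.
assert generate_inputs(42) == generate_inputs(42)
```

Any randomness the RNG does not govern (wall clock, thread interleaving, hash ordering) silently breaks this contract, which is where deterministic simulation comes in.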

When not to use

  • When the property is hard to express mechanically. "Looks good to a designer" isn't a property. "Doesn't regress our benchmark" isn't a property.
  • When the input space can't usefully be sampled at random — most ML-model-behavior tests, for instance.
  • When failures are extremely rare and the CI budget doesn't support the run volume needed to hit them. Property-based testing shines at ≥10⁴ runs; at 10² runs it's barely better than example-based.
