
Test Case Minimization

Definition

Test case minimization (also called shrinking) is the step in a property-based testing workflow where, once a random input causes a test failure, the framework iteratively reduces that input (removing elements, lowering numeric values, simplifying structures) and re-runs the test, keeping a reduced candidate only if it still reproduces the failure. The result is a minimally complex reproducer that isolates the bug from the random noise it was buried in.

Named and originated by QuickCheck (Haskell, Claessen & Hughes, 2000); now a standard feature of modern property-based testing libraries (Hypothesis, proptest, fast-check, test.check).

Canonical motivation: a randomly generated test that fails on an input of "37 nested nodes across three trees with a specific move+rename interaction" becomes, after minimization, "3 nodes: a local move combined with a remote move creating a cycle." The latter is a diagnosable bug report; the former is a grep through a log file.

Why this matters

Tracking dozens of nodes across three trees is simply not easy for a human to do, as the real bug is often hidden alongside many red herrings. This minimization removes such distractions, often making it much easier to diagnose at a glance.

Dropbox CanopyCheck

Minimization is the feature that turns property-based testing from a "bug-finding machine with painful output" into a "bug-finding machine with diagnosable output." Its absence is what has historically prevented end-to-end fuzzing/stochastic testing from displacing hand-written unit tests: a 3-hour flaky-test failure with a 500-line log is worse than no test at all, because engineers learn to ignore it.

The shrinking procedure

The exact procedure depends on the shape of the input, but the general form is:

  1. Test failed on input X. Save (seed, input).
  2. Generate a "smaller" candidate X' — remove one element, halve a numeric range, drop one branch of a tree, etc.
  3. Re-run the test with X'. If it still fails, X := X', go to step 2. If it passes, try a different smaller candidate.
  4. Stop when no smaller candidate reproduces the failure.

The result X is a local minimum — often, but not always, a global minimum.
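The four steps above can be sketched as a greedy loop. This is an illustrative, stdlib-only implementation, not any particular library's algorithm: `fails` stands in for re-running the test, and only two candidate moves (delete one element, halve one value) are tried.

```python
# A minimal greedy shrinker for list-of-int inputs, following steps 1-4.
# `fails(candidate)` re-runs the test and returns True if it still fails.
def shrink(failing_input, fails):
    current = list(failing_input)
    improved = True
    while improved:
        improved = False
        # Candidate move 1: remove one element at a time.
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate
                improved = True
                break
        else:
            # Candidate move 2: halve one value toward zero.
            for i, v in enumerate(current):
                if v != 0:
                    candidate = current[:i] + [int(v / 2)] + current[i + 1:]
                    if fails(candidate):
                        current = candidate
                        improved = True
                        break
    return current  # local minimum: no tried candidate still fails

# Toy property violation: "no element is >= 10".
bug = lambda xs: any(x >= 10 for x in xs)
print(shrink([3, 14, 7, 99, 2], bug))
```

On this example the shrinker returns `[12]`, which neatly illustrates the local-minimum caveat: the global minimum under this property is `[10]`, reachable only by a decrement step this shrinker never tries.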

Canonical realization: CanopyCheck (Dropbox)

systems/canopycheck's inputs are three trees. When a random seed fails an invariant:

  • Iteratively remove nodes from the initial trees.
  • Re-run (same planner, same seed-driven operation ordering).
  • Keep shrinking if the failure persists.

The simple input shape (three trees, nothing else) is what makes this work: every smaller candidate input is obviously a valid test case.
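A hedged sketch of that node-removal loop (the real CanopyCheck planner and invariants are not public, so `still_fails` here is a stand-in for "re-run the planner with the same seed"). Trees are encoded as `{node: parent}` dicts, and only leaves are removed so every candidate remains a well-formed tree:

```python
# Illustrative CanopyCheck-style shrinking: delete one leaf at a time from
# the initial trees, keeping a deletion whenever the seeded run still fails.
def leaves(tree):
    parents = set(tree.values())
    return [n for n in tree if n not in parents]

def shrink_trees(trees, still_fails):
    trees = [dict(t) for t in trees]
    progress = True
    while progress:
        progress = False
        for idx, tree in enumerate(trees):
            for leaf in leaves(tree):
                candidate = [dict(t) for t in trees]
                del candidate[idx][leaf]
                if still_fails(candidate):  # same planner, same seed
                    trees = candidate
                    progress = True
                    break
            if progress:
                break
    return trees

# Toy stand-in for a seeded re-run: "the bug" reproduces whenever
# node "b" is present in the first tree.
fails = lambda ts: "b" in ts[0]
initial = [{"a": None, "b": "a", "c": "b", "d": "a"}, {"x": None}, {}]
print(shrink_trees(initial, fails))
```

The toy run shrinks three trees down to the two nodes needed to reproduce the failure ("b" and its parent "a"), discarding everything else.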

Why it fails end-to-end (Trinity)

systems/trinity — Dropbox's end-to-end concurrency-testing framework — cannot easily minimize, and the post names this as an explicit limitation:

Even a small change to the three trees' initial state might change the order of network request scheduling down the line, and thereby invalidate a failing random seed — hard-won at the cost of many CPU-days!

The problem: Trinity's coverage depends on mocks that inject failures and reorder RPCs throughout execution. The scheduling of those failures is a function of the PRNG state, which is a function of every decision made so far. Perturbing the initial state changes the chain of PRNG consumption, which changes the schedule, which almost always makes the failure go away — but not because the bug is fixed, just because a different schedule is being tested.

Candidate fix noted in the post: "decouple the global PRNG into several independent ones." A per-subsystem PRNG (one for filesystem faults, one for network reordering, one for time, one for input generation) would let you perturb the initial-state PRNG without invalidating the schedule PRNG, making shrinking on input feasible again. As of the post, this was being considered; no outcome captured.
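The decoupling idea can be sketched with stdlib `random.Random` streams, one per subsystem (the subsystem names here are illustrative, not Trinity's actual decomposition). Because each stream is seeded independently from the root seed, consuming extra draws from the input stream, as shrinking does, cannot perturb the network-scheduling stream:

```python
import random

# Per-subsystem PRNG streams derived from one root seed. Each subsystem
# consumes only its own stream, so draws in one cannot shift another.
class SplitRandom:
    def __init__(self, root_seed):
        self.input = random.Random(f"{root_seed}:input")          # test-input generation
        self.network = random.Random(f"{root_seed}:network")      # RPC reordering
        self.fs_faults = random.Random(f"{root_seed}:fs_faults")  # filesystem fault injection
        self.time = random.Random(f"{root_seed}:time")            # virtual clock jitter

# Two runs with the same root seed, but the first consumes extra input
# draws (as a shrunk initial state would). The network schedule is
# unaffected, so the failing seed remains valid.
a = SplitRandom(42)
b = SplitRandom(42)
a.input.random()
a.input.random()
schedule_a = [a.network.randrange(100) for _ in range(5)]
schedule_b = [b.network.randrange(100) for _ in range(5)]
assert schedule_a == schedule_b
```

With a single global PRNG, those two extra `a.input.random()` calls would have shifted every subsequent draw and produced a different schedule, which is exactly the seed-invalidation problem described above.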
