
PATTERN Cited by 1 source

Seed-Recorded Failure Reproducibility

Definition

Seed-recorded failure reproducibility is the developer-experience discipline requiring that every failing randomized test record enough state to be replayed exactly, and that the re-run be guaranteed to produce the same failure. It is the human-facing contract that sits on top of concepts/deterministic-simulation and makes randomized testing usable at scale.

The canonical form, from Dropbox:

All randomized testing frameworks must be fully deterministic and easily reproducible.

  1. At the beginning of a random test run, generate a random seed.
  2. Instantiate a pseudorandom number generator (PRNG) with that seed.
  3. Run the test using that PRNG for all random decisions.
  4. If the test fails, output the seed.

sources/2024-05-31-dropbox-testing-sync-at-dropbox-2020
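The four steps can be sketched as a minimal Python harness. This is an illustrative sketch, not Dropbox's implementation; the `TEST_SEED` environment variable and function names are assumptions chosen for the example:

```python
import os
import random
import secrets

def run_randomized_test(test_fn):
    """Run test_fn with a fresh seed, or replay one saved in TEST_SEED."""
    # Step 1: generate a random seed (or reuse a recorded one for replay).
    seed = int(os.environ.get("TEST_SEED", secrets.randbits(64)))
    # Step 2: instantiate a PRNG with that seed.
    rng = random.Random(seed)
    try:
        # Step 3: the test draws ALL of its randomness from this PRNG.
        test_fn(rng)
    except Exception:
        # Step 4: on failure, output the seed so the run can be replayed.
        print(f"FAILED -- replay with TEST_SEED={seed}")
        raise
```

Setting `TEST_SEED` before re-running replays the exact same sequence of random decisions.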

On failure: add logging inline, re-run with the saved seed, and the test is guaranteed to fail again. This is what transforms randomized testing from a "source of extreme frustration" (Dropbox's own description of their Sync Engine Classic randomized tests) into "the indispensable confidence source for rewriting a critical system."

Why this matters (more than it sounds)

Randomized tests that fail non-reproducibly are net-negative in most organizations:

  • You can't add logging and rerun — the next run passes.
  • You can't set a breakpoint — the next run passes.
  • You can't bisect — git-bisect requires a reliable reproducer.
  • Engineers learn to re-trigger CI when a random test fails.
  • Flakes accumulate as ignored noise.
  • When real bugs fire in the ignored noise, they are invisible.

This failure mode is the default for naively written randomized tests. The seed-recorded-reproducibility discipline is what turns this vicious cycle into a virtuous one.

Components

PRNG seed. A single integer (typically 64-bit) that initializes the RNG. Every random decision in the test flows from this seed.

Code version. The commit hash at test time. "Same seed" alone isn't enough — if the code changes, the PRNG consumption sequence changes, and the failure may not reproduce. Dropbox names this explicitly: "Note also the importance of the commit hash, as another type of 'test input' alongside the seed: if the code changes, the course of execution may change too!"

CI auto-filing. When a nightly random run fails, CI creates a tracking task carrying (seed, commit-hash) — not a manually-filed bug with an English description of what broke. The bug is the (seed, commit-hash) pair; the fix includes the seed in the regression test to keep it green.

Inline-editable system-under-test. The engineer needs to be able to add println! / logger.debug / a breakpoint and re-run the same seed. This requires that logging-only code additions not perturb the PRNG consumption sequence — otherwise "the same seed" no longer replays the same execution.
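One way to keep PRNG consumption stable — a sketch, not a prescribed technique from the source — is to derive an independent child PRNG per component from the run seed, so extra draws in one place cannot shift the draws another part sees (Python hashes string seeds deterministically, independent of PYTHONHASHSEED):

```python
import random

def component_rng(run_seed: int, component: str) -> random.Random:
    """Derive a per-component PRNG from the run seed and a stable label.

    Each subsystem draws from its own stream, so extra draws added in one
    component (e.g. debug-only sampling) never shift another component's
    sequence. Plain log statements consume no randomness either way.
    """
    # random.Random seeds strings via a deterministic hash (not hash()),
    # so this is stable across processes and runs.
    return random.Random(f"{run_seed}:{component}")
```

Usage: `component_rng(seed, "network")` and `component_rng(seed, "filesystem")` replay independently from the same recorded seed.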

What the system must give up to make this work

Some sources of non-determinism the system under test might naturally use have to be replaced with deterministic alternatives:

  • Random hash-map seeding (Rust HashMap default, Python PYTHONHASHSEED) — override with deterministic hashers, even at the cost of DoS-resistance, if that trade is acceptable for the threat model.
  • System time — replaced with a mockable timer driven from the PRNG.
  • Random thread scheduling — replaced with a custom executor / serialized-on-control-thread concurrency (patterns/single-threaded-with-offload).
  • Filesystem/network calls with nondeterministic ordering — mocked or intercepted by the test executor.

The payoff: the mapping (seed, commit-hash) → execution-trace is total and deterministic — every pair replays exactly one trace.
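One of the swaps above — deterministic time — can be sketched as a simulated clock whose jitter comes from the test PRNG, so timing variation is itself replayed by the seed. Class and function names here are illustrative assumptions:

```python
import random

class SimClock:
    """Deterministic replacement for system time: it advances only when
    the test advances it, so timestamps replay exactly from the seed."""

    def __init__(self, start: float = 0.0):
        self._now = start

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds

def step(clock: SimClock, rng: random.Random) -> float:
    # Event spacing drawn from the test PRNG (hypothetical distribution),
    # so "time" is just another seed-determined input.
    clock.advance(rng.expovariate(1.0))
    return clock.now()
```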

Scale story

Dropbox's Nucleus team runs tens of millions of randomized test runs per night under this discipline, "in general, 100% green on the latest master." When regressions slip in, they surface as tracking tasks with exact (seed, commit-hash) pairs — so an engineer picks one up, reruns locally with that seed, and debugs against a 100%-reliable reproducer.

Where this sits relative to neighbors

  • concepts/deterministic-simulation is the architectural discipline that makes this pattern possible. This pattern is the developer-facing contract built on top of it.
  • patterns/property-based-testing typically uses this pattern (QuickCheck, Hypothesis, proptest all print the seed on failure), but example-based tests with fault injection or time-based flakiness can also adopt it.
  • Related anti-pattern: "retry until green" CI that re-runs a failed random test 3× and passes on any success. This hides real bugs and is explicitly called out as a failure in the Sync Engine Classic era the Dropbox post describes.

Seen in
