
PATTERN

Scenario minimalism for probe reliability

What this is

Scenario minimalism for probe reliability is the pattern of keeping the e2e test probe suite deliberately small — a handful of scenarios, each mapping 1-to-1 to a critical customer journey — as the primary lever for reaching pager-grade reliability. It is scope-reduction as reliability discipline: the cheapest way to stop a test from flaking is to not run it.

The pattern explicitly inverts the CI-era mental model ("broad coverage + retry + better selectors → higher reliability"). At probe altitude the math is different: compound-flake probability multiplies with interactions, so fewer scenarios + fewer interactions-per-scenario beat any amount of within-scenario hardening.

Why minimalism

  • Compound-flake math. Per-interaction success $p$ compounds as $p^N$ across the $N$ interactions in a scenario, and as $p^{NM}$ across $M$ independent scenarios for the whole suite. When $p$ is close to 1, halving $N$ roughly halves the flake rate. See concepts/test-reliability-through-simplification for the full arithmetic.
  • CBO alignment falls out. If each probe scenario is one CBO, the suite is bounded by the CBO catalog (usually single-digit to low-hundreds). You can't scope-creep a probe suite when the scoping unit is already curated.
  • Debugging is tractable. Three scenarios with three HTML reports per failure is actionable on a busy on-call day; 300 scenarios is not.
  • Simplification can't be retry-masked. Unlike "add an expect.toPass retry", removing a step cannot hide an underlying real issue.
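The compound-flake arithmetic above can be checked in a few lines. This is a sketch: the $p = 0.999$ per-interaction success rate and the suite shapes are illustrative numbers, not figures from the source.

```python
# Per-interaction success p compounds across N interactions per
# scenario, then across M scenarios for the whole suite.
def scenario_pass(p: float, n: int) -> float:
    return p ** n

def suite_pass(p: float, n: int, m: int) -> float:
    return scenario_pass(p, n) ** m  # == p ** (n * m)

p = 0.999                               # illustrative per-interaction rate
print(f"{suite_pass(p, 10, 3):.4f}")    # 3 probes x 10 interactions -> 0.9704
print(f"{suite_pass(p, 20, 30):.4f}")   # a broad 30 x 20 CI-style suite -> 0.5486

# Halving N roughly halves the flake rate when p is close to 1:
flake = lambda n: 1 - scenario_pass(p, n)
print(f"{flake(20) / flake(10):.2f}")   # -> 1.99
```

The same per-interaction reliability that yields a 97% suite pass rate at probe scope collapses to roughly a coin flip at CI scope, which is why within-scenario hardening cannot substitute for scope reduction.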

Shape (Zalando instantiation)

Source: sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright:

  • Three named scenarios at publication, each under ~10 lines of code:
    1. Home page → gender page → product click
    2. Catalog page → apply filter → product click
    3. Product page → select size → add to cart → start checkout
  • One scenario per CBO. Each is a canonical customer journey through the Zalando storefront.
  • Minimal steps per scenario. The catalog test is ~10 interactions — navigate, open filter, click a color, save, wait, click product, assert URL.
  • Scope growth gated. The declared growth path is "more CBOs" — not "more code paths per CBO", not "more edge-case scenarios". Each new scenario passes through shadow-mode validation before joining the paging set.

Preconditions

  • An existing CBO catalog — the scoping unit must be curated, not derived from test-matrix completeness.
  • A CI tier for broad coverage. The probe tier is not a replacement for CI; Zalando continues running its Cypress CI suite for per-release regression coverage. Minimalism at probe altitude is only defensible when a complementary broad-coverage tier exists elsewhere.

What to cut

  1. Edge cases. Not-in-stock products, paywalled surfaces, localisation permutations — CI should cover these, not probe.
  2. Assertion fan-out. One end-state assertion beats five intermediate-state assertions.
  3. Parameterised test sets. A CBO probe runs against one known-good data fixture, not a table of inputs.
  4. Cross-browser matrix. Run probes on one canonical browser; CI covers the matrix.
  5. Deep DOM traversal. Prefer role- or data-testid- based selectors over long CSS chains.
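The first four cuts compound multiplicatively. A quick count (illustrative dimensions, not from the source) shows why cutting them matters more than hardening any single run:

```python
# Each retained dimension multiplies the number of probe executions,
# and therefore the surface area that can flake and page someone.
browsers, locales, stock_states, fixtures = 3, 4, 2, 5
full_matrix = browsers * locales * stock_states * fixtures
probe_runs = 1  # one canonical browser, one known-good fixture
print(full_matrix, probe_runs)  # 120 vs 1
```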

What to keep

  1. The critical path of each CBO. Navigate, interact, verify the load-bearing step worked.
  2. A final real-state assertion. URL contains .html, cart count > 0, confirmation page loaded. The step that, if broken, means the user cannot transact.
  3. Realistic framing. Zalando kept a waitForTimeout in one scenario "to simulate 'real user behavior'" (annotated as not strictly necessary with Playwright).
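A final real-state assertion of the kind listed above can be as small as one predicate on the end state. This is a hypothetical helper; the field names and URLs are illustrative, not from the Zalando scenarios.

```python
# Hypothetical end-state check: one load-bearing assertion per probe,
# mirroring "URL contains .html" and "cart count > 0".
def checkout_probe_passed(final_url: str, cart_count: int) -> bool:
    return final_url.endswith(".html") and cart_count > 0

print(checkout_probe_passed("https://example.test/product-123.html", 1))  # True
print(checkout_probe_passed("https://example.test/error", 0))             # False
```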

Tensions and failure modes

  • Coverage pressure. Product stakeholders naturally want probe coverage for every new launch; granting each request spends the suite's entire reliability budget on maintaining that coverage.
  • Anti-pattern — probe as QA replacement. A probe suite with 50 scenarios is not a probe suite; it's a CI suite running in production. Maintain the tier boundary explicitly.
  • Anti-pattern — probe as perf test. Probes are pass/fail; latency probes are a different primitive with different cadence and threshold-tuning requirements.
  • Silent CBO drift. If the product changes and a CBO's journey evolves (new step, new decision point), probes that don't track it go stale and either break or stop exercising the critical path. Probe scripts need a review cadence aligned with product change.

Contrast

  • CI e2e regression suite — broad coverage, per-commit, higher flakiness-tolerance (retries mask flake for CI engineers), many scenarios.
  • HTTP-level synthetic probe — even more minimal (single request), no interactivity, lower-altitude sibling.
  • Load test — same scenario-minimalism discipline often applies (scheduled cron-triggered load test) — focus on a handful of critical flows, not the full test matrix.

Seen in
