PATTERN Cited by 1 source
Scenario minimalism for probe reliability¶
What this is¶
Scenario minimalism for probe reliability is the pattern of keeping the e2e test probe suite deliberately small — a handful of scenarios, each mapping 1-to-1 to a critical customer journey — as the primary lever for reaching pager-grade reliability. It is scope-reduction as reliability discipline: the cheapest way to stop a test from flaking is to not run it.
The pattern explicitly inverts the CI-era mental model ("broad coverage + retry + better selectors → higher reliability"). At probe altitude the math is different: compound-flake probability multiplies with interactions, so fewer scenarios + fewer interactions-per-scenario beat any amount of within-scenario hardening.
Why minimalism¶
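- Compound-flake math. Per-interaction success $p$ compounds as $p^N$ across $N$ interactions per scenario, and again across $M$ scenarios, giving $p^{NM}$ for overall suite reliability (assuming independent interactions and the same $N$ in every scenario). While $N(1-p)$ is small, the scenario flake rate is $1 - p^N \approx N(1-p)$, so halving $N$ roughly halves it. See concepts/test-reliability-through-simplification for the full arithmetic.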
- CBO alignment falls out. If each probe scenario is one CBO, the suite is bounded by the CBO catalog (usually single-digit to low-hundreds). You can't scope-creep a probe suite when the scoping unit is already curated.
- Debugging is tractable. Three scenarios with three HTML reports per failure is actionable on a busy on-call day; 300 scenarios is not.
- Simplification can't be retry-masked. Unlike "add an expect.toPass retry", removing a step cannot hide an underlying real issue.
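The compound-flake arithmetic from the list above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the per-interaction success rate of 0.999 is an assumed figure rather than a measured one, and the model assumes every interaction fails independently.

```typescript
/** Probability that one scenario of n interactions passes, given per-interaction success p. */
function scenarioSuccess(p: number, n: number): number {
  return Math.pow(p, n);
}

/** Probability that all m scenarios (n interactions each) pass in a single probe run. */
function suiteSuccess(p: number, n: number, m: number): number {
  return Math.pow(scenarioSuccess(p, n), m);
}

/** Probability that at least one scenario fails spuriously in a single probe run. */
function suiteFlakeRate(p: number, n: number, m: number): number {
  return 1 - suiteSuccess(p, n, m);
}

// Assumed numbers for illustration: p = 0.999, three scenarios, shrinking N.
const assumedP = 0.999;
for (const n of [20, 10, 5]) {
  const rate = suiteFlakeRate(assumedP, n, 3);
  console.log(`N=${n}: ${(100 * rate).toFixed(2)} % of runs flake`);
}
```

Under these assumptions the flake rate falls from roughly 5.8 % to 3.0 % to 1.5 % as $N$ goes from 20 to 10 to 5, which is the "halving $N$ roughly halves the flake rate" behaviour in the small-failure regime.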
Shape (Zalando instantiation)¶
Source: sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright:
- Three named scenarios at publication, each under ~10 lines of code:
  - Home page → gender page → product click
  - Catalog page → apply filter → product click
  - Product page → select size → add to cart → start checkout
- One scenario per CBO. Each is a canonical customer journey through the Zalando storefront.
- Minimal steps per scenario. The catalog test is a single linear path of under ten interactions: navigate, open filter, click a color, save, wait, click product, assert URL.
- Scope growth gated. The declared growth path is "more CBOs" — not "more code paths per CBO", not "more edge-case scenarios". Each new scenario passes through shadow-mode validation before joining the paging set.
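A minimal sketch of what the catalog scenario above could look like. It is not Zalando's code: the `/catalog/` path, the accessible names ("Colour", "Black", "Save") and the `product-card` test id are all hypothetical. `PageLike` is a hand-written subset of the Playwright `Page` API (`goto`, `getByRole`, `getByTestId`, `waitForURL`) so the sketch stays self-contained; a real Playwright `page` fixture should satisfy it structurally.

```typescript
// Hand-written subset of Playwright's Locator and Page, limited to what this scenario uses.
interface LocatorLike {
  click(): Promise<void>;
  first(): LocatorLike;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  getByRole(role: string, options?: { name?: string | RegExp }): LocatorLike;
  getByTestId(testId: string): LocatorLike;
  waitForURL(url: RegExp): Promise<void>;
}

// One linear path through the journey: no branches, no parameter table, one end-state check.
async function catalogFilterProbe(page: PageLike, baseUrl: string): Promise<void> {
  await page.goto(`${baseUrl}/catalog/`);                       // hypothetical catalog path
  await page.getByRole("button", { name: "Colour" }).click();   // open the filter (hypothetical label)
  await page.getByRole("checkbox", { name: "Black" }).click();  // pick a color (hypothetical label)
  await page.getByRole("button", { name: "Save" }).click();     // apply the filter (hypothetical label)
  await page.getByTestId("product-card").first().click();       // open the first product (hypothetical test id)
  await page.waitForURL(/\.html/);                              // end-state check: rejects if no product URL loads
}
```

In a Playwright test file this would be called as `await catalogFilterProbe(page, baseUrl)` from inside a `test(...)` body. The single `waitForURL` at the end stands in for the final real-state assertion; intermediate states are deliberately left unchecked.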
Preconditions¶
- An existing CBO catalog — the scoping unit must be curated, not derived from test-matrix completeness.
- A CI tier for broad coverage. The probe tier is not a replacement for CI; Zalando continues running its Cypress CI suite for per-release regression coverage. Minimalism at probe altitude is only defensible when a complementary broad-coverage tier exists elsewhere.
What to cut¶
- Edge cases. Not-in-stock products, paywalled surfaces, localisation permutations — CI should cover these, not probe.
- Assertion fan-out. One end-state assertion beats five intermediate-state assertions.
- Parameterised test sets. A CBO probe runs against one known-good data fixture, not a table of inputs.
- Cross-browser matrix. Run probes on one canonical browser; CI covers the matrix.
- Deep DOM traversal. Prefer role- or data-testid-based selectors over long CSS chains.
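As a sketch of the "one canonical browser" cut, a Playwright Test configuration for the probe tier might declare a single project. The choice of Chromium and the HTML reporter here are assumptions for illustration, not details taken from the source.

```typescript
// playwright.config.ts for the probe tier (illustrative; the CI tier keeps its own config and browser matrix)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  // One HTML report per run keeps failures quick to inspect on call.
  reporter: "html",
  // A single project: no cross-browser matrix at probe altitude.
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
  ],
});
```

Keeping this file separate from the CI configuration is one way to make the tier boundary explicit: adding a browser to the probe tier then requires a deliberate change rather than inheriting the CI matrix.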
What to keep¶
- The critical path of each CBO. Navigate, interact, verify the load-bearing step worked.
- A final real-state assertion. URL contains .html, cart count > 0, confirmation page loaded. The step that, if broken, means the user cannot transact.
- Realistic framing. Zalando kept a waitForTimeout in one scenario "to simulate 'real user behavior'" (annotated as not strictly necessary with Playwright).
Tensions and failure modes¶
- Coverage pressure. Product stakeholders naturally want probe coverage for every new launch, but each added scenario brings its own flake probability and maintenance cost, and both are paid out of the suite's reliability budget.
- Anti-pattern: probe as QA replacement. A probe suite with 50 scenarios is not a probe suite; it's a CI suite running in production. Maintain the tier boundary explicitly.
- Anti-pattern: probe as perf test. Probes are pass/fail; latency probes are a different primitive with different cadence and threshold-tuning requirements.
- Silent CBO drift. If the product changes and a CBO's journey evolves (new step, new decision point), probes that don't track it go stale and either break or stop exercising the critical path. Probe scripts need a review cadence aligned with product change.
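One way to resist the coverage pressure and drift described above is a small review-time check that every probe maps to exactly one catalogued CBO and stays within an interaction budget. The sketch below is hypothetical throughout: the `ProbeScenario` shape, the CBO identifiers, the interaction counts and the budget of 10 are illustrative assumptions, not part of the source.

```typescript
interface ProbeScenario {
  name: string;
  cbo: string;          // identifier of the CBO this probe exercises (hypothetical naming)
  interactions: number; // number of user interactions in the script
}

/** Returns a list of human-readable problems; an empty list means the suite respects the scoping rules. */
function checkProbeSuite(
  scenarios: ProbeScenario[],
  cboCatalog: readonly string[],
  maxInteractions: number,
): string[] {
  const problems: string[] = [];
  const known = new Set<string>(cboCatalog);
  const covered = new Set<string>();
  for (const s of scenarios) {
    if (!known.has(s.cbo)) problems.push(`${s.name}: CBO "${s.cbo}" is not in the catalog`);
    if (covered.has(s.cbo)) problems.push(`${s.name}: CBO "${s.cbo}" already has a probe`);
    covered.add(s.cbo);
    if (s.interactions > maxInteractions) {
      problems.push(`${s.name}: ${s.interactions} interactions exceeds the budget of ${maxInteractions}`);
    }
  }
  return problems;
}

// Illustrative suite shaped like the three journeys in this page; counts are made up.
const exampleCatalog = ["home-to-product", "catalog-filter-to-product", "product-to-checkout"];
const exampleSuite: ProbeScenario[] = [
  { name: "home page to product", cbo: "home-to-product", interactions: 3 },
  { name: "catalog filter to product", cbo: "catalog-filter-to-product", interactions: 7 },
  { name: "product to checkout", cbo: "product-to-checkout", interactions: 4 },
];
console.log(checkProbeSuite(exampleSuite, exampleCatalog, 10));
```

For the example suite the check returns an empty list. A second probe for an already-covered CBO, or a probe for something outside the catalog, would show up as a named problem, which turns "more code paths per CBO" into a visible review decision instead of silent growth.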
Contrast¶
- CI e2e regression suite — broad coverage, per-commit, higher flakiness-tolerance (retries mask flake for CI engineers), many scenarios.
- HTTP-level synthetic probe — even more minimal (single request), no interactivity, lower-altitude sibling.
- Load test — same scenario-minimalism discipline often applies (scheduled cron-triggered load test) — focus on a handful of critical flows, not the full test matrix.
Seen in¶
- sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright: canonical wiki instance. Three Playwright scenarios covering home→product, catalog→filter→product, and product→checkout. Each ~10 lines of code. Combined with Playwright auto-wait and multi-week shadow-mode validation, produced a 0 % false-positive rate at 30-minute cadence against zalando.com.
Related¶
- concepts/test-reliability-through-simplification — the enabling concept.
- concepts/flaky-test — the failure mode being managed.
- concepts/end-to-end-test-probe — the primitive this scoping discipline applies to.
- concepts/critical-business-operation — the scoping unit.
- patterns/e2e-test-as-synthetic-probe — the parent pattern.
- patterns/shadow-mode-alert-before-paging — the complementary validation gate.
- systems/playwright — canonical probe framework.