
PATTERN

Scenario minimalism for probe reliability

What this is

Scenario minimalism for probe reliability is the pattern of keeping the e2e test probe suite deliberately small — a handful of scenarios, each mapping 1-to-1 to a critical customer journey — as the primary lever for reaching pager-grade reliability. It is scope-reduction as reliability discipline: the cheapest way to stop a test from flaking is to not run it.

The pattern explicitly inverts the CI-era mental model ("broad coverage + retry + better selectors → higher reliability"). At probe altitude the math is different: compound-flake probability multiplies with interactions, so fewer scenarios + fewer interactions-per-scenario beat any amount of within-scenario hardening.

Why minimalism

  • Compound-flake math. Per-interaction success $p$ compounds as $p^N$ across the $N$ interactions in a scenario, and as $p^{NM}$ across $M$ independent scenarios for the whole suite. When $p$ is close to 1, halving $N$ roughly halves the flake rate. See concepts/test-reliability-through-simplification for the full arithmetic.
  • CBO alignment falls out. If each probe scenario is one CBO, the suite is bounded by the CBO catalog (usually single-digit to low-hundreds). You can't scope-creep a probe suite when the scoping unit is already curated.
  • Debugging is tractable. Three scenarios with three HTML reports per failure is actionable on a busy on-call day; 300 scenarios is not.
  • Simplification can't be retry-masked. Unlike "add an expect.toPass retry", removing a step cannot hide an underlying real issue.
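The compound-flake arithmetic above can be checked in a few lines. This is a sketch: the $p = 0.999$ per-interaction success rate and the suite shapes are illustrative numbers, not figures from the source.

```python
# Per-interaction success p compounds across N interactions per
# scenario, then across M scenarios for the whole suite.
def scenario_pass(p: float, n: int) -> float:
    return p ** n

def suite_pass(p: float, n: int, m: int) -> float:
    return scenario_pass(p, n) ** m  # == p ** (n * m)

p = 0.999                               # illustrative per-interaction rate
print(f"{suite_pass(p, 10, 3):.4f}")    # 3 probes x 10 interactions -> 0.9704
print(f"{suite_pass(p, 20, 30):.4f}")   # a broad 30 x 20 CI-style suite -> 0.5486

# Halving N roughly halves the flake rate when p is close to 1:
flake = lambda n: 1 - scenario_pass(p, n)
print(f"{flake(20) / flake(10):.2f}")   # -> 1.99
```

The same per-interaction reliability that yields a 97% suite pass rate at probe scope collapses to roughly a coin flip at CI scope, which is why within-scenario hardening cannot substitute for scope reduction.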

Shape (Zalando instantiation)

Source: sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright:

  • Three named scenarios at publication, each under ~10 lines of code:
    1. Home page → gender page → product click
    2. Catalog page → apply filter → product click
    3. Product page → select size → add to cart → start checkout
  • One scenario per CBO. Each is a canonical customer journey through the Zalando storefront.
  • Minimal steps per scenario. The catalog test is ~10 interactions — navigate, open filter, click a color, save, wait, click product, assert URL.
  • Scope growth gated. The declared growth path is "more CBOs" — not "more code paths per CBO", not "more edge-case scenarios". Each new scenario passes through shadow-mode validation before joining the paging set.

Preconditions

  • An existing CBO catalog — the scoping unit must be curated, not derived from test-matrix completeness.
  • A CI tier for broad coverage. The probe tier is not a replacement for CI; Zalando continues running its Cypress CI suite for per-release regression coverage. Minimalism at probe altitude is only defensible when a complementary broad-coverage tier exists elsewhere.

What to cut

  1. Edge cases. Not-in-stock products, paywalled surfaces, localisation permutations — CI should cover these, not probe.
  2. Assertion fan-out. One end-state assertion beats five intermediate-state assertions.
  3. Parameterised test sets. A CBO probe runs against one known-good data fixture, not a table of inputs.
  4. Cross-browser matrix. Run probes on one canonical browser; CI covers the matrix.
  5. Deep DOM traversal. Prefer role- or data-testid- based selectors over long CSS chains.
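The first four cuts compound multiplicatively. A quick count (illustrative dimensions, not from the source) shows why cutting them matters more than hardening any single run:

```python
# Each retained dimension multiplies the number of probe executions,
# and therefore the surface area that can flake and page someone.
browsers, locales, stock_states, fixtures = 3, 4, 2, 5
full_matrix = browsers * locales * stock_states * fixtures
probe_runs = 1  # one canonical browser, one known-good fixture
print(full_matrix, probe_runs)  # 120 vs 1
```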

What to keep

  1. The critical path of each CBO. Navigate, interact, verify the load-bearing step worked.
  2. A final real-state assertion. URL contains .html, cart count > 0, confirmation page loaded. The step that, if broken, means the user cannot transact.
  3. Realistic framing. Zalando kept a waitForTimeout in one scenario "to simulate 'real user behavior'" (annotated as not strictly necessary with Playwright).
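A final real-state assertion of the kind listed above can be as small as one predicate on the end state. This is a hypothetical helper; the field names and URLs are illustrative, not from the Zalando scenarios.

```python
# Hypothetical end-state check: one load-bearing assertion per probe,
# mirroring "URL contains .html" and "cart count > 0".
def checkout_probe_passed(final_url: str, cart_count: int) -> bool:
    return final_url.endswith(".html") and cart_count > 0

print(checkout_probe_passed("https://example.test/product-123.html", 1))  # True
print(checkout_probe_passed("https://example.test/error", 0))             # False
```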

Tensions and failure modes

  • Coverage pressure. Product stakeholders naturally want probe coverage for every new launch; granting each request spends the suite's entire reliability budget on maintaining that coverage.
  • Anti-pattern — probe as QA replacement. A probe suite with 50 scenarios is not a probe suite; it's a CI suite running in production. Maintain the tier boundary explicitly.
  • Anti-pattern — probe as perf test. Probes are pass/fail; latency probes are a different primitive with different cadence and threshold-tuning requirements.
  • Silent CBO drift. If the product changes and a CBO's journey evolves (new step, new decision point), probes that don't track it go stale and either break or stop exercising the critical path. Probe scripts need a review cadence aligned with product change.

Contrast

  • CI e2e regression suite — broad coverage, per-commit, higher flakiness-tolerance (retries mask flake for CI engineers), many scenarios.
  • HTTP-level synthetic probe — even more minimal (single request), no interactivity, lower-altitude sibling.
  • Load test — same scenario-minimalism discipline often applies (scheduled cron-triggered load test) — focus on a handful of critical flows, not the full test matrix.

Seen in
