PATTERN Cited by 1 source

E2E test as synthetic probe

What this is

E2E test as synthetic probe is the pattern of running a small set of browser-driving end-to-end test scenarios on a cron schedule against live production and alerting the on-call team when a scenario fails. The pattern reuses the e2e test framework (Playwright, Cypress, Selenium) as a periodic external black-box monitor rather than a per-commit regression gate. The scenarios are scoped to critical customer journeys and treated as an additional symptom source for the alerting stack, complementing trace-derived symptom-based alerts.

Why

Traditional defenses leave a browser-altitude blind spot:

  • CI/CD e2e tests run pre-deploy against newly-built code. They don't see regressions driven by external factors — CMS content drift, API-gateway contract drift, third-party outages, CDN cache issues — that only surface in live production.
  • Service-level monitors (SLO breaches, 5xx rates, trace-derived error rates for Critical Business Operations, CBOs) don't see front-end interactivity failures: HTTP still returns 200, but a hydration crash or JS error prevents the user from interacting with the page.
  • Raw HTTP synthetic probes (Prometheus Blackbox Exporter-style) probe connectivity and content but don't drive a real user journey.

The e2e test probe fills that blind spot: a real browser, driving a real user flow, against real production. It answers one question: will the user actually be able to check out?

Shape (Zalando instantiation)

Source: sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright.

  • Framework: Playwright (chosen over Cypress for auto-wait / auto-retry / tracing / unified browser API / TypeScript).
  • Scope: three named scenarios at publication, each a CBO:
    1. Home → gender page → product click
    2. Catalog page → apply filter → product click
    3. Product page → select size → add to cart → start checkout
  • Cadence: 30-minute cron job.
  • Rollout: email-only shadow mode for several weeks; iterate on selectors + local expect.toPass retries + :visible pseudo-classes until zero false positives; then promote to paging.
  • Post-promotion reality: "only paged us once, and that was during an incident where the page was actually not working." That is a 0 % false-positive rate.
  • Declared growth path: more CBOs, extension to mobile apps.
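The shadow-mode-then-paging rollout can be pictured as a per-scenario routing rule: a failing scenario sends email while it is still being validated and pages only once it has been promoted. The following is a minimal sketch, not Zalando's implementation; the types, the `routeAlerts` function, and the scenario names are all hypothetical and chosen for illustration.

```typescript
// Hypothetical alert routing for probe results (illustrative names throughout).
type Channel = "email" | "page";
type RolloutStage = "shadow" | "promoted";

interface ProbeResult {
  scenario: string;
  passed: boolean;
}

interface Alert {
  scenario: string;
  channel: Channel;
}

// Failing scenarios produce one alert each. Only scenarios explicitly marked
// "promoted" may page; everything else (including unknown scenarios) gets email.
function routeAlerts(
  results: ProbeResult[],
  stages: Record<string, RolloutStage>,
): Alert[] {
  return results
    .filter((r) => !r.passed)
    .map((r): Alert => ({
      scenario: r.scenario,
      channel: stages[r.scenario] === "promoted" ? "page" : "email",
    }));
}

// Example: one validated scenario and one still in shadow mode, both failing.
const demoAlerts = routeAlerts(
  [
    { scenario: "product-to-checkout", passed: false },
    { scenario: "catalog-filter", passed: false },
    { scenario: "home-to-product", passed: true },
  ],
  {
    "product-to-checkout": "promoted",
    "catalog-filter": "shadow",
    "home-to-product": "promoted",
  },
);
console.log(demoAlerts);
```

Defaulting unknown scenarios to email is a deliberate choice in this sketch: a newly added scenario cannot page anyone until it has been promoted explicitly.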

Preconditions

  • CBO catalog exists. The probe scope should align with explicitly-named critical customer journeys, not the full test matrix. At Zalando these come from the existing Operation-Based SLOs work.
  • Alerting infrastructure with multi-severity channels. The probe needs an email-only tier for shadow mode and a paging tier for post-validation.
  • Cron scheduler (Kubernetes CronJob or equivalent).
  • Debugging artifact capture (Playwright HTML reports, traces, videos). Without these, shadow-mode iteration is blind.
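Artifact capture is mostly a configuration concern in Playwright. The fragment below is a sketch of a `playwright.config.ts` that keeps traces, videos, and screenshots for failing runs and writes an HTML report. The `testDir` path and `baseURL` are placeholder assumptions, and the source does not state which settings Zalando actually uses.

```typescript
// playwright.config.ts (sketch; paths and URL are placeholders, not from the source)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Hypothetical directory containing only the probe scenarios.
  testDir: "./probes",
  // Write an HTML report without trying to open a browser on the cron host.
  reporter: [["html", { open: "never" }]],
  use: {
    // Placeholder production URL.
    baseURL: "https://shop.example.com",
    // Keep debugging artifacts only when a scenario fails.
    trace: "retain-on-failure",
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
});
```

Keeping artifacts only on failure limits storage growth for a job that runs many times a day while still preserving what is needed to diagnose a failed run.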

Tensions and failure modes

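  • Reliability must exceed the cadence arithmetic. A 30-minute cadence means 48 runs per day, so a probe that passes 95 % of runs against a healthy site produces about 48 × 0.05 = 2.4 false positives per day, which is too noisy for paging. Probe-grade requires ~99.9 %, or roughly one false positive every three weeks. See concepts/flaky-test for the full arithmetic.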
  • Scope creep re-introduces flakiness. Adding scenarios without independent shadow-mode validation breaks the reliability budget. Discipline: patterns/scenario-minimalism-for-probe-reliability.
  • Selector coupling to UI refactors. Any non-data-testid selector is a ticking time bomb; broad UI redesigns will flake a probe suite that leans on CSS structure or text content.
  • Third-party dependency bleed. A probe that touches a third-party service inherits that service's SLO; paging the team for a third-party outage is a bug. Probe scope should be the first-party CBO surface only.
  • Probe as performance test. Probes are pass/fail checks, not latency measurements. Treating probe run time as a performance signal is unsupported unless an explicit latency threshold (for example a p95 bound) has been defined.
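The cadence arithmetic from the first bullet can be worked through in a few lines. This is a minimal sketch assuming each run fails spuriously and independently with probability (1 − reliability); the function names are illustrative.

```typescript
// Number of probe runs per day for a fixed cadence in minutes.
function runsPerDay(cadenceMinutes: number): number {
  return (24 * 60) / cadenceMinutes;
}

// Expected false alerts per day, assuming independent spurious failures
// with probability (1 - reliability) on every run against a healthy site.
function expectedFalseAlertsPerDay(
  reliability: number,
  cadenceMinutes: number,
): number {
  return runsPerDay(cadenceMinutes) * (1 - reliability);
}

const atNinetyFive = expectedFalseAlertsPerDay(0.95, 30); // 48 runs * 0.05
const atThreeNines = expectedFalseAlertsPerDay(0.999, 30); // 48 runs * 0.001

console.log(atNinetyFive.toFixed(2)); // "2.40" false alerts per day
console.log(atThreeNines.toFixed(3)); // "0.048" false alerts per day
console.log((1 / atThreeNines).toFixed(1)); // "20.8" days between false alerts
```

At 99.9 % the expected gap between false alerts is about three weeks, which is the order of magnitude a paging channel can tolerate.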

Seen in

  • sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright: canonical wiki instance. Zalando Payments / frontend team deploys three Playwright probe scenarios on a 30-minute cron against zalando.com, promotes to paging after multi-week shadow-mode validation, achieves 0 % false-positive rate. Motivated by a 2024 product-detail-page React-hydration regression that existing monitoring missed. Declared growth path: "more [CBOs] [...] extending this idea to our mobile apps."