
CONCEPT Cited by 1 source

End-to-end test probe

Definition

An end-to-end test probe is a small set of browser-driving end-to-end test scenarios repurposed as a periodic external monitor of a live production system. Unlike CI/CD e2e tests (which run per-commit against newly-built code), test probes run on a fixed cadence (typically a cron) against the real production frontend, and their failure pages the on-call team as a new class of user-facing-symptom alert.

Three defining properties:

  1. Scope: critical customer journeys only. Not full CI coverage. The probe exercises a handful of critical business operations (CBOs) as a real user would (home → product, catalog → filter → product, product → size → cart → checkout, etc.), not the long tail that a regression suite covers.
  2. Runtime: scheduled cron against live production. Not per-build. A typical cadence is every 30 minutes; at that rate the probe must be reliable enough to page.
  3. Semantics: page the on-call when a probe fails. Once out of shadow mode (concepts/shadow-mode-alert-validation), a probe failure is a symptom alert — not a regression signal.

Why probes are a distinct primitive

Zalando's 2024 framing (sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright) names the gap clearly: CI/CD e2e tests catch bugs in newly-built code, but live production failures can be driven by external factors (headless CMS content drift, API-gateway contract drift, third-party outage, CDN cache poisoning) that no CI pipeline can fence against. The 2024 product-detail-page regression on zalando.com — new headless CMS content broke the front-end / API-gateway contract and crashed React hydration on size selection and add-to-cart — was "large enough to have a business impact, but not just [sic] enough to trigger an automated alert." The existing trace-derived symptom-based alerting stack did not fire because the failure was interactivity, not a backend error.

The probe fills this gap because:

  • It runs against the real production system, including real CMS content, real API gateway, real CDN cache.
  • It verifies interactivity, not just HTTP status — a page that 200s but doesn't hydrate still fails the probe.
  • It's a black-box external view, matching the real user's vantage point.
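The difference between "returns 200" and "is usable" can be sketched in a few lines. This is a minimal illustration, not Zalando's probe code: `ProbePage` is a hypothetical, narrowed stand-in for a browser page handle so the sketch runs without a browser, and the selectors, URL, and `fakePage` helper are invented for the example. A real probe would drive a browser through Playwright instead.

```typescript
// Hypothetical minimal page handle; a real probe would use a browser
// automation API instead of this stand-in.
interface ProbePage {
  goto(url: string): Promise<{ status: number }>;
  click(selector: string): Promise<void>;
  textOf(selector: string): Promise<string | null>;
}

type ProbeResult = { ok: true } | { ok: false; step: string; reason: string };

// Journey: product page -> pick a size -> add to cart -> cart badge shows 1.
async function productToCartProbe(page: ProbePage, url: string): Promise<ProbeResult> {
  const response = await page.goto(url);
  if (response.status !== 200) {
    return { ok: false, step: "load", reason: `HTTP ${response.status}` };
  }
  // A 200 is not enough: the page must also react to user input.
  await page.click('[data-testid="size-option"]');
  await page.click('[data-testid="add-to-cart"]');
  const badge = await page.textOf('[data-testid="cart-count"]');
  if (badge !== "1") {
    return { ok: false, step: "add-to-cart", reason: `cart badge was ${badge}` };
  }
  return { ok: true };
}

// Fake page for illustration: it always serves 200, but only updates the
// cart when `hydrated` is true, mimicking handlers that never attached.
function fakePage(hydrated: boolean): ProbePage {
  let cartCount = 0;
  return {
    goto: async () => ({ status: 200 }),
    click: async (selector) => {
      if (hydrated && selector.includes("add-to-cart")) cartCount += 1;
    },
    textOf: async () => String(cartCount),
  };
}
```

With `fakePage(false)` the load step succeeds and the probe still fails at the add-to-cart step, which is the class of failure an HTTP-status check cannot see.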

Relationship to adjacent primitives

  • vs. concepts/client-side-black-box-probe (HTTP / TCP / ICMP synthetic probes, e.g. Prometheus Blackbox Exporter): e2e test probes are the same primitive at a higher altitude. Instead of a raw HTTP GET, the probe drives a real browser through a real user journey. Both are client-side, both are black-box, both report pass/fail plus timing metrics. E2E probes add interactivity and multi-step flow coverage at the price of much higher per-probe cost and flakiness risk.
  • vs. concepts/symptom-based-alerting — probes are a new symptom source, not a new strategy. They sit alongside trace-derived CBO error rates and raw service SLOs as a third symptom surface.
  • vs. CI/CD e2e tests — same framework, different tier. CI covers the test matrix; the probe covers the CBO catalog. The two can share code (selectors, fixtures, page objects) but have divergent reliability and operational requirements.

Design constraints

Reliability must exceed the cadence arithmetic

A 30-minute probe runs 48 times a day, so at 95 % reliability it produces ~2.4 false-positive pages/day. To be pager-grade the probe needs reliability closer to 99.9 %, which brings that down to roughly one false page every three weeks. Practical levers:

  • Simplification — fewer scenarios → fewer false positives. Zalando's launch scope was three named scenarios.
  • Framework-altitude auto-wait / auto-retry — see concepts/playwright-locator-auto-wait. Flakiness comes from assumption-laden timing; framework-level auto-wait removes a large class of failure modes.
  • Local retry at assertion level — Playwright's expect.toPass retries a flaky assertion block; Zalando adds these during the shadow-mode iteration.
  • Selector robustness: data-testid > CSS classes; role-based selectors > DOM structure.
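The cadence arithmetic that motivates these levers can be written down directly. The sketch below assumes every failure against a healthy system turns into a page and that runs are independent; the function names are illustrative.

```typescript
// Expected false-positive pages per day for a probe that runs every
// `cadenceMinutes` and passes a healthy system with probability `reliability`.
function falsePagesPerDay(cadenceMinutes: number, reliability: number): number {
  if (cadenceMinutes <= 0) throw new RangeError("cadenceMinutes must be positive");
  if (reliability < 0 || reliability > 1) {
    throw new RangeError("reliability must be in [0, 1]");
  }
  const runsPerDay = (24 * 60) / cadenceMinutes;
  return runsPerDay * (1 - reliability);
}

// Reliability needed to stay within a budget of false pages per day.
function requiredReliability(cadenceMinutes: number, maxFalsePagesPerDay: number): number {
  if (cadenceMinutes <= 0) throw new RangeError("cadenceMinutes must be positive");
  const runsPerDay = (24 * 60) / cadenceMinutes;
  return Math.max(0, 1 - maxFalsePagesPerDay / runsPerDay);
}
```

For a 30-minute cadence there are 48 runs a day, so `falsePagesPerDay(30, 0.95)` is about 2.4 (48 × 0.05) and `falsePagesPerDay(30, 0.999)` is about 0.048, roughly one false page every three weeks.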

Shadow mode is mandatory

Before a probe can page, it must be validated in email-only mode for long enough to drive false-positive rate to zero. See patterns/shadow-mode-alert-before-paging.
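One way to picture the shadow-mode gate is as a routing decision plus a promotion check. This is a sketch under assumptions: the `ShadowStats` shape and the `minRuns` threshold are hypothetical, since the source only describes a multi-week email-only period that ended at a 0 % false-positive rate.

```typescript
type ProbeMode = "shadow" | "paging";
type Channel = "email" | "page";

// Where a failed probe run is sent: email while in shadow mode, a page
// to the on-call once the probe has been promoted.
function routeFailure(mode: ProbeMode): Channel {
  return mode === "shadow" ? "email" : "page";
}

// Triage summary of a shadow-mode period (hypothetical shape).
interface ShadowStats {
  runs: number;           // total probe runs observed in shadow mode
  falsePositives: number; // failures triaged as "production was fine"
}

// Promotion gate: enough observed runs and no false positives among them.
// `minRuns` is an assumed threshold, not a figure from the source.
function readyToPage(stats: ShadowStats, minRuns: number): boolean {
  return stats.runs >= minRuns && stats.falsePositives === 0;
}
```

At a 30-minute cadence, two weeks of shadow mode corresponds to 48 × 14 = 672 runs, so `readyToPage({ runs: 672, falsePositives: 0 }, 672)` is `true`, while a single false positive in that window keeps the probe in email-only mode.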

Scope creep is the failure mode

Adding scenarios to a working probe tier reintroduces the flakiness it was designed to escape. The discipline is to raise probe scope only after the new scenario's reliability is independently validated in shadow mode.
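The cost of scope creep follows from multiplying reliabilities. The sketch below assumes scenarios fail independently and that any single scenario failure pages; real scenarios often share failure causes, so treat the result as a rough guide rather than a measurement.

```typescript
// Probability that a whole probe run passes against a healthy system,
// assuming independent scenarios and "any failure pages" semantics.
function suiteReliability(scenarioReliabilities: number[]): number {
  return scenarioReliabilities.reduce((acc, r) => acc * r, 1);
}
```

Three scenarios at 99.9 % each give a suite reliability of about 99.7 %. Adding one unvalidated scenario at 95 % drops the suite to about 94.7 %, which puts the whole tier back in the range where false pages arrive daily.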

Relationship to CBO catalog

Each probe scenario is effectively a CBO at browser altitude. Zalando's publication-time probe scope (home → gender → product, catalog → filter → product, product → size → cart → checkout) maps 1-to-1 to CBOs in the CBO catalog. The declared growth path is "include more of our CBOs" — the probe tier's natural north star is full CBO coverage, capped only by reliability budget.

Dependencies

  • Browser automation framework with strong async / flakiness handling: Playwright is Zalando's choice, citing auto-wait, auto-retry, tracing / time-travel debugging, a unified browser API, and TypeScript support out of the box.
  • Cron scheduler — Kubernetes CronJob or equivalent.
  • Alert routing with email-only severity tier for the shadow-mode gate.
  • HTML report / trace / video capture as the iteration signal for shadow-mode failures.
  • Paging integration with the existing on-call system.
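Several of these dependencies surface in the Playwright configuration itself. The fragment below is an illustrative `playwright.config.ts`, not Zalando's configuration: the directory name, base URL, timeout, and retry count are assumptions chosen for the example.

```typescript
// playwright.config.ts: illustrative values, not Zalando's configuration.
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./probes",            // hypothetical directory of probe scenarios
  timeout: 60_000,                // per-scenario time budget in milliseconds
  retries: 0,                     // assumed: treat every failed run as a signal
  reporter: [["html", { open: "never" }]], // HTML report for shadow-mode triage
  use: {
    baseURL: "https://shop.example", // placeholder for the production frontend
    trace: "retain-on-failure",      // keep traces only for failed runs
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
});
```

Retaining traces and video only on failure keeps artifact volume low on a 48-runs-a-day schedule while still giving each shadow-mode failure enough evidence to triage.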

Seen in

  • sources/2024-07-18-zalando-end-to-end-test-probes-with-playwright: canonical wiki instance. Zalando's probe tier complements its ~95 %-reliable Cypress CI pipeline (≈120 builds/day) with three Playwright probe scenarios on a 30-minute cron, promoted to paging after a multi-week email-only shadow-mode validation period that drove false-positive rate to 0 %. Post-promotion reality: "So far they have only paged us once, and that was during an incident where the page was actually not working." Motivated by a 2024 React-hydration regression that the existing CBO-based symptom-alerting stack did not fire on.