
Sample ratio mismatch (SRM)

Definition

Sample ratio mismatch (SRM) is a data-quality failure in A/B testing where the actual ratio of users assigned to treatment vs control differs significantly from the intended (design-time) ratio. Example: an experiment configured 50/50 but the collected telemetry shows 52.3/47.7 with a chi-squared p < 0.001.
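The check itself is a chi-squared goodness-of-fit test on assignment counts. A minimal sketch for the two-variant case (function name and the 0.001 alpha threshold are illustrative choices, not from the source; for two variants the test has one degree of freedom, so the p-value reduces to a closed form using `math.erfc`):

```python
import math

def srm_check(observed_counts, expected_ratios, alpha=0.001):
    """Chi-squared goodness-of-fit test for sample ratio mismatch.

    observed_counts: users actually logged per variant, e.g. [5230, 4770]
    expected_ratios: design-time split, e.g. [0.5, 0.5]
    Returns (chi2 statistic, p-value, srm_detected).
    Assumes exactly two variants (1 degree of freedom).
    """
    total = sum(observed_counts)
    expected = [total * r for r in expected_ratios]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    # For 1 df, P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, p_value < alpha

# The 52.3/47.7 split from the definition, on 10,000 users with a 50/50 design:
chi2, p, srm = srm_check([5230, 4770], [0.5, 0.5])
```

On this input the statistic is 21.16 and the p-value is well below 0.001, so the mismatch in the definition's example would indeed be flagged.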

SRM is the single most important data-quality indicator for A/B-test trustworthiness: if the groups are not actually balanced (or the data about which group a user landed in is corrupted), then any downstream metric comparison is invalid, regardless of how significant it looks. The canonical reference in the literature is Fabijan et al. (KDD 2019).

Why SRM happens

SRM is almost always caused by a data-pipeline or instrumentation bug, not a randomization bug:

  • Inconsistent tracking-event schemas — product teams self-define events that get ingested differently, causing systematic loss of one variant's events.
  • Differential filtering — a downstream filter drops events from one variant more often (e.g. treatment triggers a client-side error that suppresses its telemetry).
  • Redirect / load-time loss — treatment involves a page redirect or heavier payload; users abandon before the assignment event lands.
  • Bot traffic — a crawler hits one variant disproportionately.
  • Consent asymmetry — under GDPR / similar, consent prompts behave differently per variant, causing selection bias in the tracked population.

Zalando's finding: 20% vs the 6–10% industry rate

Zalando reports (sources/2021-01-11-zalando-experimentation-platform-at-zalando-part-1-evolution) that peer companies see 6–10% of A/B tests affected by SRM, but Zalando's own historical analysis showed at least 20% — roughly twice the industry rate. The root cause was data-tracking-schema inconsistency across teams that self-defined their schemas. Fixing it required cross-team communication and an org-level reorganisation, not a platform feature.

Octopus's remediation

Zalando's platform (see systems/octopus-zalando-experimentation-platform) automatically detects SRM and alerts the affected team on detection. When SRM is detected, further data investigation is required before analysis results are shown in the platform's dashboard (see patterns/automated-srm-alert).

Zalando calls out that consent-gated tracking under GDPR is a related data-quality issue: processing data only for visitors who consented creates a selection bias between consented and non-consented populations that can interact with SRM detection. Research on selection bias for A/B tests under consent constraints is listed as active work in the post.

Implications for platform design

  1. SRM checks belong in the platform, not the team. A platform-level chi-squared test on the collected assignment events is a single p-value comparison; teams forget to run it themselves.
  2. SRM should block result-publication, not just warn. If SRM is detected, downstream metric comparisons must be marked invalid; otherwise analysts will skim the number and ship anyway.
  3. SRM-rate is a leading indicator of org-level data-quality health, not just per-experiment health. A rising SRM rate across experiments is the platform telling you your tracking pipeline has drift.
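Implication 2 can be sketched as a publication gate in the platform's results pipeline. This is a hypothetical illustration — the dataclass, function names, and the 0.001 threshold are assumptions, not Octopus's actual API:

```python
from dataclasses import dataclass, field

SRM_ALPHA = 0.001  # hypothetical platform-wide SRM significance threshold

@dataclass
class ExperimentResult:
    experiment_id: str
    srm_p_value: float          # from the platform-level chi-squared check
    metrics: dict = field(default_factory=dict)

def publish(result: ExperimentResult) -> dict:
    """Block result publication when SRM is detected, rather than just warning.

    Metric deltas are withheld entirely so analysts cannot skim an
    invalid number and ship anyway.
    """
    if result.srm_p_value < SRM_ALPHA:
        return {
            "status": "blocked",
            "reason": "SRM detected; investigate tracking pipeline before analysis",
        }
    return {"status": "published", "metrics": result.metrics}

blocked = publish(ExperimentResult("exp-42", srm_p_value=1e-6, metrics={"cr": 0.021}))
ok = publish(ExperimentResult("exp-43", srm_p_value=0.4, metrics={"cr": 0.019}))
```

The design choice worth noting is that the blocked response carries no metrics at all: marking results invalid while still displaying them defeats the purpose of the gate.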
