PATTERN
A/B-Test Rollout with Percentile Guardrails¶
When shipping a performance-sensitive change whose impact is noisy across client environments, run it as a controlled A/B experiment rather than a fixed-timeline rollout, and hold the rollout until per-percentile metrics (p50, p90, p99) move in the expected direction on both topline and guardrail dimensions. This complements but does not replace patterns/staged-rollout: staged rollout limits blast radius; A/B rollout attributes cause to the change and catches regressions the topline hides.
Mechanics¶
- Instrument many browser-side metrics (Atlassian captures a broad set: FCP, TTVC/VC90, TTI, hydration success rate, etc.) — enough to catch secondary regressions.
- Split traffic into treatment and control. Track each metric at p50 / p90 / p99; tails often move in the opposite direction to medians under subtle regressions.
- Riskier changes get slower rollouts ("several weeks" for Confluence's streaming SSR). Complexity of change, not size of expected win, dictates pace.
- Hold the release until the movement is statistically significant across the chosen percentiles; only then expand.
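The per-percentile gate above can be sketched as follows. This is a minimal illustration, not Atlassian's pipeline: the bootstrap confidence interval, the assumption that the metric is lower-is-better (latency-style), and all function names are this sketch's own choices.

```python
import random
import statistics

def pct(data, q):
    """q-th percentile (q in 1..99) via stdlib quantiles."""
    return statistics.quantiles(data, n=100)[q - 1]

def guardrail_check(control, treatment, percentiles=(50, 90, 99),
                    n_boot=2000, seed=0):
    """Bootstrap the treatment-minus-control delta at each percentile of a
    lower-is-better metric. A percentile is flagged as regressed when the
    95% confidence interval for its delta sits entirely above zero."""
    rng = random.Random(seed)
    out = {}
    for q in percentiles:
        deltas = []
        for _ in range(n_boot):
            c = [rng.choice(control) for _ in control]      # resample control
            t = [rng.choice(treatment) for _ in treatment]  # resample treatment
            deltas.append(pct(t, q) - pct(c, q))
        deltas.sort()
        lo = deltas[int(0.025 * n_boot)]
        hi = deltas[int(0.975 * n_boot)]
        out[f"p{q}"] = {"ci": (lo, hi), "regressed": lo > 0}
    return out
```

Hold the rollout while any tracked percentile reports `regressed`; the same check with the sign flipped confirms the expected win on the topline metric.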
Why topline-only rollouts fail¶
The Confluence team almost shipped streaming SSR with a latent TTI regression caused by the React 18 context/Suspense re-render bug (see concepts/react-hydration). Topline FCP was up ~40%, the headline win, but TTI at p90 was degrading. Guardrail metrics plus per-percentile tracking surfaced the regression before GA; the team fixed it with unstable_scheduleHydration and continued the rollout. (Source: sources/2026-04-16-atlassian-streaming-ssr-confluence)
Lesson: performance A/B tests need more metrics than you think. Some of them exist specifically to catch you ruining something else while winning the advertised metric.
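One way to encode that lesson is to declare the metric set up front, each with a topline/guardrail role and a "good" direction, and gate the release on all of them at once. The registry and schema below are a hypothetical sketch; the metric names are from the story above, but nothing here is Atlassian's actual tooling.

```python
# Hypothetical metric registry: the role/direction schema is this sketch's
# own invention, not a real experimentation platform's API.
METRICS = {
    "fcp_ms":            {"role": "topline",   "better": "lower"},
    "tti_ms":            {"role": "guardrail", "better": "lower"},
    "hydration_success": {"role": "guardrail", "better": "higher"},
}

def release_gate(deltas):
    """deltas: {metric: {percentile: statistically significant delta
    (treatment - control), or None if not significant}}. Ship only if every
    topline percentile improved and no guardrail percentile regressed."""
    def improved(metric, d):
        return d < 0 if METRICS[metric]["better"] == "lower" else d > 0

    topline_ok = all(
        d is not None and improved(m, d)
        for m, spec in METRICS.items() if spec["role"] == "topline"
        for d in deltas[m].values()
    )
    guardrails_ok = all(
        d is None or improved(m, d)
        for m, spec in METRICS.items() if spec["role"] == "guardrail"
        for d in deltas[m].values()
    )
    return topline_ok and guardrails_ok
```

In the Confluence scenario this gate would have returned False while FCP improved but p90 TTI regressed, and True only after the hydration fix landed.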
When to use¶
- Large cross-cutting changes (protocol/runtime swap, rendering-mode change, compression-algorithm switch) with diffuse client-side impact.
- Changes where the expected-win metric is partly correlated with metrics you don't want to regress.
- Any "riskier" change by the team's own judgement — complexity is the signal, not projected impact.
Relation to other rollout patterns¶
- patterns/staged-rollout — environment/zone/pod-%/user-% stages, each with its own evaluation. Used alongside A/B; e.g., staged by region, A/B within each stage.
- patterns/fast-rollback — still required; A/B test tells you when to roll back, the rollback mechanism lets you do it quickly.
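Running A/B within each stage typically relies on deterministic bucketing, so a user keeps the same arm across sessions and across rollout stages. A minimal sketch (the hashing scheme and function name are assumptions, not described in the source):

```python
import hashlib

def assign(user_id: str, experiment: str, treatment_pct: float = 50.0) -> str:
    """Hash (experiment, user) into [0, 100) so assignment is stable per user
    and independent across experiments; compare against the treatment share."""
    h = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(h[:8], 16) / 0xFFFFFFFF * 100
    return "treatment" if bucket < treatment_pct else "control"
```

Keying the hash on the experiment name means ramping one experiment never reshuffles users in another, which keeps each A/B comparison clean as staged percentages change.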
Seen in¶
- sources/2026-04-16-atlassian-streaming-ssr-confluence — multi-week A/B of streaming SSR with FCP / TTVC / TTI / hydration success rate at p50/p90/p99.