PATTERN Cited by 1 source
Controlled rollout with traffic ramp-up
Problem
Without a ramp-up mechanism, a new variant ships to 100% of users on day one, so any bug in it reaches everyone. When several teams coordinate a launch, they cannot all switch to 100% at the same instant without some form of synchronisation. Rolling back is disruptive. Previewing impact on a small cohort before full launch is the whole point of A/B testing, yet that preview step lives outside the experiment infrastructure.
Solution
Expose feature toggles and traffic ramp-up as first-class primitives on the experimentation platform. A team defines the variant, picks an initial exposure percentage, and the platform:
- Assigns the chosen fraction of users to the treatment variant via the existing randomization engine.
- Collects the same A/B metrics as a full test would.
- Lets the team ramp up (or ramp down) exposure on a schedule or on-demand, based on observed metrics.
- Lets multiple teams coordinate ramp schedules so a complex cross-team feature goes live at the same effective moment.
The pattern treats rollout control as an experiment whose sample size grows over time, not as a separate deploy process.
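One common way to get "a sample size that grows over time" is deterministic hash bucketing, sketched below. This is an illustrative assumption, not a description of any particular platform's randomization engine: the names `bucket`, `in_treatment`, the `BUCKETS` resolution, and the `"new-checkout"` salt are all hypothetical. The property worth noticing is that raising the exposure percentage only adds users to treatment; anyone exposed at 5% is still exposed at 20%.

```python
import hashlib

BUCKETS = 10_000  # hypothetical resolution: one bucket = 0.01% of traffic


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS) for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_treatment(user_id: str, salt: str, exposure_pct: float) -> bool:
    """True if the user's bucket falls below the current exposure threshold."""
    if not 0.0 <= exposure_pct <= 100.0:
        raise ValueError("exposure_pct must be between 0 and 100")
    threshold = round(exposure_pct / 100 * BUCKETS)
    return bucket(user_id, salt) < threshold


# Simulate a ramp from 5% to 20% over the same synthetic user population.
users = [f"user-{i}" for i in range(20_000)]
at_5 = {u for u in users if in_treatment(u, "new-checkout", 5)}
at_20 = {u for u in users if in_treatment(u, "new-checkout", 20)}
```

Because the threshold is the only thing that changes during a ramp, ramping down is the same operation in reverse: lowering `exposure_pct` removes the most recently added buckets first.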
Zalando's applied case
Octopus (see systems/octopus-zalando-experimentation-platform) added traffic-ramp-up and feature-toggle primitives during its Walk phase to support two customer use cases (Source: sources/2021-01-11-zalando-experimentation-platform-at-zalando-part-1-evolution):
- Some teams want to gradually increase traffic into the test so they don't accidentally show a buggy variant to many users.
- Other teams are working on a complex cross-team feature release and want to release at the same time — the platform provides shared ramp-up scheduling.
Zalando cites Fowler's canonical "Feature Toggles" article as the framing.
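The source does not describe how Octopus implements shared ramp-up scheduling, so the following is only a minimal sketch of the idea under assumed names: `RampStep`, `RampSchedule`, the feature keys, and the dates are all hypothetical. Two teams' features point at the same schedule object, so both see the same exposure at any given moment.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RampStep:
    starts_at: datetime   # when this exposure level takes effect
    exposure_pct: float   # share of users in treatment from then on


class RampSchedule:
    """An ordered list of exposure steps that several features can share."""

    def __init__(self, steps: list[RampStep]) -> None:
        self.steps = sorted(steps, key=lambda s: s.starts_at)

    def exposure_at(self, now: datetime) -> float:
        """Return the exposure of the latest step that has started, else 0."""
        pct = 0.0
        for step in self.steps:
            if step.starts_at <= now:
                pct = step.exposure_pct
        return pct


def utc(year: int, month: int, day: int, hour: int = 0) -> datetime:
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


# One schedule, referenced by two teams' features (hypothetical names and dates).
launch = RampSchedule([
    RampStep(utc(2021, 3, 1, 9), 5.0),
    RampStep(utc(2021, 3, 3, 9), 25.0),
    RampStep(utc(2021, 3, 8, 9), 100.0),
])
feature_schedules = {
    "team-a/search-ranking": launch,
    "team-b/checkout-banner": launch,
}


def exposures(now: datetime) -> dict[str, float]:
    """Current exposure per feature; features on one schedule always agree."""
    return {name: s.exposure_at(now) for name, s in feature_schedules.items()}
```

Keeping the schedule as shared data rather than per-team configuration is what removes the need for teams to flip their toggles by hand at the same instant.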
Why the platform, not the application
Application-layer feature toggles (env var, config file, code branch) work for a small team's one-off launch, but they don't:
- Share a randomization basis with running A/B tests (so toggle-exposed users drift from experiment cohorts).
- Provide per-percentile observability.
- Give leadership visibility into what's rolling out.
- Coordinate across teams without ad-hoc Slack threads.
Platform-level toggles sit on top of the A/B engine, inherit all of its telemetry, and integrate with result analysis.
Relationship to other rollout patterns
- patterns/staged-rollout / patterns/release-train-rollout-with-canary — infrastructure-level progressive rollouts for code artefacts. Complementary: code ships through stages; the feature exposed by that code ramps through cohorts.
- patterns/ab-test-rollout (Atlassian) — percentile-guardrail rollout via A/B. Same pattern class: uses the experiment infrastructure to gate rollout on metric movement.
- patterns/cohort-percentage-rollout — the plain cohort-% variant without the A/B-backed randomization.
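Gating a ramp on metric movement, as the A/B-backed patterns above do, can be reduced to a small decision function. The sketch below is an assumption for illustration only: the step ladder `RAMP_STEPS`, the `max_regression` threshold, and the idea of a single relative guardrail delta are hypothetical, and a real platform would use a proper statistical test rather than a raw point estimate.

```python
RAMP_STEPS = [1, 5, 20, 50, 100]  # hypothetical exposure ladder, in percent


def next_exposure(current_pct: float,
                  guardrail_delta: float,
                  max_regression: float = -0.02) -> float:
    """Decide the next exposure level from one guardrail metric.

    guardrail_delta is the relative change of the guardrail metric in
    treatment versus control; negative values mean treatment is worse.
    """
    if guardrail_delta < max_regression:
        return 0  # guardrail breached: ramp down to zero exposure
    for step in RAMP_STEPS:
        if step > current_pct:
            return step  # guardrail healthy: advance one rung
    return 100  # already at full exposure


# Healthy guardrail at 5% exposure advances to the next rung.
healthy = next_exposure(5, guardrail_delta=0.01)
# A 5% regression at 20% exposure pulls the variant back.
breached = next_exposure(20, guardrail_delta=-0.05)
```

The same function serves both scheduled and on-demand ramps: a scheduler can call it at each step boundary, or an operator can call it after inspecting the dashboard.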
When not to use
- Security / correctness fixes — the need is to get the fix to 100% as fast as safe, not to measure user-behaviour tradeoffs. Ramp-up is about user-facing variants whose impact is uncertain; security fixes are not that.
- Irreversibly persisted state — if the variant writes data under a new schema to the same tables as control, ramping and then reverting is not clean. See patterns/expand-migrate-contract.
Seen in
- sources/2021-01-11-zalando-experimentation-platform-at-zalando-part-1-evolution — Octopus adds traffic ramp-up + feature toggles during Walk phase for bug-safety + multi-team-coordination use cases