
PATTERN Cited by 1 source

Controlled rollout with traffic ramp-up

Problem

Without a controlled rollout, a new variant ships to 100% of users on day one. If the variant has a bug, everyone sees it. If multiple teams are coordinating a launch, they can't all flip to 100% at the same instant without synchronisation. Rolling back is disruptive. Previewing impact on a small cohort before full launch is the whole point of A/B testing, yet the rollout itself lives outside the experiment infrastructure.

Solution

Expose feature toggles and traffic ramp-up as first-class primitives on the experimentation platform. A team defines the variant, picks an initial exposure percentage, and the platform:

  1. Assigns the chosen fraction of users to the treatment variant via the existing randomization engine.
  2. Collects the same A/B metrics as a full test would.
  3. Lets the team ramp up (or ramp down) exposure on a schedule or on-demand, based on observed metrics.
  4. Lets multiple teams coordinate ramp schedules so a complex cross-team feature goes live at the same effective moment.
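
Steps 1 and 3 can be sketched with deterministic hash bucketing. This is an illustration only, not Octopus's actual randomization engine: the names `bucket`, `variant`, `BUCKETS`, and the per-rollout `salt` are hypothetical, and the sketch assumes a stable user identifier is available.

```python
import hashlib

BUCKETS = 10_000  # hypothetical resolution: 0.01 percentage points


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS) for one rollout."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def variant(user_id: str, salt: str, exposure_pct: float) -> str:
    """Return 'treatment' for roughly exposure_pct percent of users."""
    if not 0.0 <= exposure_pct <= 100.0:
        raise ValueError("exposure_pct must be between 0 and 100")
    # Users whose bucket falls below the threshold see the treatment.
    threshold = round(exposure_pct / 100.0 * BUCKETS)
    return "treatment" if bucket(user_id, salt) < threshold else "control"
```

Because a user's bucket never changes for a given salt, raising `exposure_pct` only adds users to the treatment group: everyone exposed at 5% is still exposed at 20%. Lowering it removes the most recently added buckets first.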

The pattern treats rollout control as an experiment whose sample size grows over time, not as a separate deploy process.
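
A ramp decision driven by observed metrics might look like the sketch below. The step ladder, the `Guardrail` type, and the relative-increase threshold are all assumptions made for illustration. A real platform would more plausibly rely on a statistical test than on a raw comparison of point estimates.

```python
from dataclasses import dataclass

RAMP_STEPS = [1.0, 5.0, 25.0, 50.0, 100.0]  # hypothetical exposure ladder, in percent


@dataclass
class Guardrail:
    """A metric that must not get much worse in treatment (higher is worse)."""
    name: str
    control: float
    treatment: float
    max_relative_increase: float  # 0.10 tolerates a 10% increase over control

    def breached(self) -> bool:
        if self.control == 0:
            return self.treatment > 0
        relative = (self.treatment - self.control) / self.control
        return relative > self.max_relative_increase


def next_exposure(current_pct: float, guardrails: list[Guardrail]) -> float:
    """Ramp down to 0 on any breach, otherwise move to the next step up."""
    if any(g.breached() for g in guardrails):
        return 0.0
    for step in RAMP_STEPS:
        if step > current_pct:
            return step
    return 100.0  # already at or beyond the last step
```

Each step enlarges the treatment sample, so the comparison feeding the next decision rests on more data than the one before it.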

Zalando's applied case

Octopus (see systems/octopus-zalando-experimentation-platform) added traffic-ramp-up and feature-toggle primitives during its Walk phase to support two customer use cases (Source: sources/2021-01-11-zalando-experimentation-platform-at-zalando-part-1-evolution):

  • Some teams want to gradually increase traffic into the test so they don't accidentally show a buggy variant to many users.
  • Other teams are working on a complex cross-team feature release and want to release at the same time — the platform provides shared ramp-up scheduling.
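
One way to picture shared ramp-up scheduling is a single schedule object that every participating team's toggle references. The source does not describe how Octopus implements this, so the sketch is hypothetical throughout: the toggle names, the dates, the percentages, and the `exposure_at` helper are all invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical shared schedule: (start time in UTC, exposure percentage),
# sorted by start time.
SHARED_SCHEDULE = [
    (datetime(2021, 3, 1, 9, 0, tzinfo=timezone.utc), 5.0),
    (datetime(2021, 3, 2, 9, 0, tzinfo=timezone.utc), 25.0),
    (datetime(2021, 3, 4, 9, 0, tzinfo=timezone.utc), 100.0),
]

# Each team's toggle points at the same schedule rather than keeping its own.
TOGGLES = {
    "checkout-ui": SHARED_SCHEDULE,
    "payments-api": SHARED_SCHEDULE,
}


def exposure_at(schedule, now: datetime) -> float:
    """Return the exposure percentage in effect at `now` (0 before the first step)."""
    starts = [start for start, _ in schedule]
    i = bisect_right(starts, now)
    return 0.0 if i == 0 else schedule[i - 1][1]
```

Because both toggles resolve against the same schedule, they report the same exposure at any instant, so the cross-team feature changes exposure in one step rather than team by team.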

Zalando cites Fowler's canonical "Feature Toggles" article as the framing.

Why the platform, not the application

Application-layer feature toggles (env var, config file, code branch) work for a small team's one-off launch, but they don't:

  • Share a randomization basis with running A/B tests (so toggle-exposed users drift from experiment cohorts).
  • Provide observability at each exposure percentage.
  • Give leadership visibility into what's rolling out.
  • Coordinate across teams without ad-hoc Slack threads.

Platform-level toggles sit on top of the A/B engine, inherit all of its telemetry, and integrate with result analysis.

Relationship to other rollout patterns

When not to use

  • Security / correctness fixes: the goal is to get the fix to 100% as fast as is safe, not to measure user-behaviour tradeoffs. Ramp-up is for user-facing variants whose impact is uncertain; security fixes are not that.
  • Irreversibly persisted state: if the variant writes data under a new schema to the same tables as control, ramping up and then reverting is not clean. See patterns/expand-migrate-contract.

Seen in
