PATTERN Cited by 1 source
Phased mobile rollout with stability tiers¶
Pattern¶
For a production mobile rollout of a change that can't be gated by a remote feature flag — a new framework architecture, a new native SDK, a compiled-in binary change — use a platform-asymmetric staged rollout with pre-committed stability-threshold response tiers. The store-level rollout percentage is the only control surface; the response tiers are the decision framework for how to react to what you see.
The canonical wiki instance is Shopify's 2025 production rollout of the React Native new-architecture migration for Shopify Mobile + Shopify POS (see sources/2025-09-12-shopify-migrating-to-react-natives-new-architecture).
Forces¶
- Native-binary change — no remote feature flag can gate it. Rollback means shipping a new binary.
- Asymmetric platform control. Google Play supports fine-grained % and full halt; App Store Connect has a pre-determined schedule and can only pause (users can still manually update). Hotfix approval on iOS takes ~24 hours. (See concepts/platform-asymmetric-rollout-control.)
- Weekly release cadence — fix-forward on the next release is a viable response for small regressions.
- High stability baseline — Shopify's target is 99.95% crash-free sessions; sub-99% is a critical incident.
Solution shape¶
1. Platform-asymmetric schedule¶
Run the rollback-friendly platform ahead of the rollback-unfriendly one. Shopify's Day 1 / Day 2 / Day 3 schedule:
| Day | Android (Play) | iOS (App Store) |
|---|---|---|
| 1 | 8% | 0% |
| 2 | 30% | 1% |
| 3 | 100% | 100% |
Android runs first because Google Play's fine-grained halt gives you a clean stop if things break. iOS stays minimal (1% on day 2) while signal is still coming in from Android, because iOS can't halt as cleanly. Both jump to 100% on day 3 once confidence is established — iOS in particular jumps because intermediate percentages aren't meaningfully controllable.
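The schedule can be kept as reviewable data and sanity-checked before the rollout starts. The following is a minimal sketch, not Shopify's tooling: the `RolloutStep` type and `validate_schedule` helper are hypothetical names introduced here, and only the percentages come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutStep:
    day: int
    android_pct: int  # target exposure on Google Play
    ios_pct: int      # target exposure on the App Store


# Percentages copied from the Day 1 / Day 2 / Day 3 table.
SCHEDULE = [
    RolloutStep(day=1, android_pct=8, ios_pct=0),
    RolloutStep(day=2, android_pct=30, ios_pct=1),
    RolloutStep(day=3, android_pct=100, ios_pct=100),
]


def validate_schedule(steps):
    """Return a list of problems; an empty list means the schedule is valid.

    Checks two properties (assumed here to be what "platform-asymmetric"
    implies): exposure never decreases from one day to the next, and iOS
    never runs ahead of Android.
    """
    problems = []
    for prev, cur in zip(steps, steps[1:]):
        if cur.android_pct < prev.android_pct or cur.ios_pct < prev.ios_pct:
            problems.append(f"day {cur.day}: exposure decreased")
    for step in steps:
        if step.ios_pct > step.android_pct:
            problems.append(f"day {step.day}: iOS is ahead of Android")
    return problems
```

Running `validate_schedule(SCHEDULE)` on the table's values returns an empty list; a schedule that put iOS at a higher percentage than Android on any day would be flagged.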
2. Pre-committed stability tiers¶
Before rollout, define named stability thresholds, each paired with a named response action. Shopify's three tiers (99.95% crash-free-session baseline):
| Threshold | Response |
|---|---|
| > 99.80% | fix forward on next weekly release |
| 99.00–99.80% or broken critical flow w/ known fix | pause rollout + hotfix |
| < 99.00% or broken critical flow w/o quick fix | rollback (last resort) |
(See concepts/stability-threshold-rollout-tiers.)
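Because the tiers are committed in advance, they can be written down as a pure function that maps an observation to a response. This is a hedged sketch: the `Response` enum and `classify` function are hypothetical names, and two details are assumptions not stated in the source, namely that the most severe matching tier wins when conditions overlap, and that exactly 99.80% falls in the pause tier (the table lists fix-forward as strictly above 99.80%).

```python
from enum import Enum


class Response(Enum):
    FIX_FORWARD = "fix forward on next weekly release"
    PAUSE_AND_HOTFIX = "pause rollout + hotfix"
    ROLLBACK = "rollback (last resort)"


def classify(crash_free_pct, critical_flow_broken=False, fix_known=False):
    """Map a stability observation to the pre-committed response tier.

    crash_free_pct: crash-free sessions as a percentage, e.g. 99.95.
    critical_flow_broken: True if a critical user flow is broken.
    fix_known: True if a fix for the broken flow is already known.
    """
    # Most severe tier first (assumption: severity wins on overlap).
    if crash_free_pct < 99.00 or (critical_flow_broken and not fix_known):
        return Response.ROLLBACK
    if crash_free_pct <= 99.80 or critical_flow_broken:
        return Response.PAUSE_AND_HOTFIX
    return Response.FIX_FORWARD
```

For example, `classify(99.95)` yields `Response.FIX_FORWARD`, `classify(99.5)` yields `Response.PAUSE_AND_HOTFIX`, and `classify(99.95, critical_flow_broken=True, fix_known=False)` yields `Response.ROLLBACK` even though the crash-free rate is at baseline.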
3. Rollback as last resort¶
Shopify explicitly treats rollback as the last option — not for stability reasons but because losing at-scale production signal sets the migration back significantly. A native-binary migration needs production data to find the bugs only production scale surfaces; rolling back forfeits that signal and forces another full submission/review cycle to try again. Pause-and-hotfix preserves the forward progress.
4. Weekly-release calendar as the fix-forward vehicle¶
Small regressions (above the pause-threshold) are left to the next scheduled release. The weekly release cadence makes this a reasonable response — you won't be sitting on a known bug for a month. For apps without a weekly release cadence, the fix-forward threshold has to be stricter, because the fix-forward delay is longer.
Costs¶
- Rollout days are expensive to staff. Shopify dedicated staff to the rollout itself for the whole rollout window.
- Limited iOS granularity. The 1% → 100% jump is forced by the App Store, not chosen, so the platform with the slower recovery path also has the coarsest control.
- Requires accurate, real-time stability measurement. Crash-free-sessions measurement must be fresh enough (minutes, not hours) to drive the tier-based decisions.
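To make the freshness requirement concrete, the sketch below computes crash-free sessions over a short trailing window, so that the number driving tier decisions reflects only recent traffic. It is an illustration under assumptions: the `crash_free_pct` function, the `(ended_at, crashed)` session tuples, and the 15-minute default window are all hypothetical and not taken from the source.

```python
from datetime import datetime, timedelta, timezone


def crash_free_pct(sessions, now, window=timedelta(minutes=15)):
    """Percentage of sessions in the trailing window that did not crash.

    sessions: iterable of (ended_at, crashed) tuples, where ended_at is a
              timezone-aware datetime and crashed is a bool.
    now:      the evaluation time (timezone-aware datetime).
    Returns None when no session falls inside the window, so callers can
    tell "no data" apart from "100% crash-free".
    """
    recent = [
        crashed
        for ended_at, crashed in sessions
        if timedelta(0) <= now - ended_at <= window
    ]
    if not recent:
        return None
    # True counts as 1, so sum(recent) is the number of crashed sessions.
    return 100.0 * (1 - sum(recent) / len(recent))
```

With 2,000 sessions in the window and one crash, the result is approximately 99.95, which is the baseline cited above; sessions older than the window do not affect the figure.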
What Shopify saw in production (despite thorough pre-rollout testing)¶
- Session stability "dropped from our 99.95% target initially, recovering after a couple of weeks of bug fixing."
- Load-time regressions up to 20% on some complex components.
- ANR spikes on both platforms — some from custom Reanimated/RN patches, some from main-thread native-module init deadlocks.
None of these were fatal to the rollout, because the tier framework correctly classified them as pause-and-hotfix situations, not rollback situations.
Related¶
- systems/app-store-connect — iOS rollout surface.
- systems/google-play-console — Android rollout surface.
- concepts/platform-asymmetric-rollout-control — the asymmetry this pattern explicitly engineers around.
- concepts/stability-threshold-rollout-tiers — the decision framework this pattern uses for response actions.
- patterns/staged-rollout — the general-purpose staged-rollout pattern this specialises.
- companies/shopify — origin of the canonical instance.