PATTERN Cited by 1 source
Phased rollout of read mode¶
Definition¶
A read-path migration discipline that introduces multiple named read modes (e.g. OFF / SHADOW / COMPARISON / EXEC / ON), advances one dataset (or namespace) at a time through these modes, and only allows advancement when the previous mode passes its checks. Each mode is a defined configuration of which read path executes, which path is shadowed, and what comparison metrics gate the transition.
Distinct from generic feature-flag rollout in two ways:
- Multiple intermediate modes, each with a specific validation purpose (the SHADOW mode validates correctness; COMPARISON sustains the validation; EXEC tests latency under real serving; etc.).
- Gated advancement, where the metrics from the prior mode must be clean before the next mode is enabled.
Canonicalised on the wiki by Netflix's TimeSeries Abstraction in the 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads).
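The gated, one-step-at-a-time advancement described above can be sketched as a small state machine. This is a minimal illustration, not the Netflix implementation: the `ReadMode` enum, the `next_mode` helper, and the `checks_passed` flag are hypothetical names introduced here.

```python
from enum import IntEnum


class ReadMode(IntEnum):
    """Hypothetical mode ladder; the ordering encodes the rollout sequence."""
    OFF = 0
    SHADOW = 1
    COMPARISON = 2
    EXEC = 3
    ON = 4


def next_mode(current: ReadMode, checks_passed: bool) -> ReadMode:
    """Advance exactly one step, and only when the current mode's checks are clean.

    `checks_passed` stands in for whatever metrics gate the transition
    (assumption: the caller has already evaluated them).
    """
    if not checks_passed or current is ReadMode.ON:
        return current  # stay put: gate failed, or already terminal
    return ReadMode(current + 1)
```

Because the function only ever moves one rung, a dataset cannot skip from SHADOW straight to EXEC, which is the property that separates this discipline from a single feature-flag flip.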
The Netflix instantiation¶
"Implementing a phased rollout strategy to safely advance through stages as our confidence in the system grew."
The post singles out the Comparison phase as load-bearing: "a chart of bytes match vs bytes differ in a given shadow period" is the gate that determines whether a dataset advances. The full mode progression is implicit in the architecture (OFF → SHADOW with byte comparison → EXEC, where the new path serves with the old path still available as fallback → ON, when the fallback is no longer wired in), with each transition requiring sustained green metrics.
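A bytes-match versus bytes-differ tally for one shadow period might look like the sketch below. The post does not describe how bytes are counted, so the whole-response granularity, the `ShadowTally` class, and the `min_bytes` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShadowTally:
    """Hypothetical per-dataset counters for one shadow period."""
    bytes_match: int = 0
    bytes_differ: int = 0

    def record(self, old: bytes, new: bytes) -> None:
        # Assumption: a response counts as matching only if it is byte-identical.
        if old == new:
            self.bytes_match += len(old)
        else:
            # Charge the larger payload so a mismatch is never counted as zero bytes.
            self.bytes_differ += max(len(old), len(new))

    def is_clean(self, min_bytes: int) -> bool:
        """Gate: no differing bytes, and enough matched traffic to be meaningful."""
        return self.bytes_differ == 0 and self.bytes_match >= min_bytes
```

The `min_bytes` floor is there so that a dataset with almost no traffic during the shadow period does not pass the gate by default.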
The rollout proceeds per dataset / per namespace rather than fleet-wide all at once — confidence is established on lower-risk datasets first, then propagated to higher-risk ones.
Why phase the rollout at all¶
The dynamic-partition-splitting feature has three properties that make a phased rollout structurally necessary:
- High blast radius — incorrect reads on TimeSeries data could affect downstream Counter aggregations, multi-region replicated state, etc.
- Per-dataset variability — different datasets have different access patterns, partition shapes, and failure modes; one might pass in shadow mode while another stresses an unhandled corner case.
- Hard to test exhaustively offline — partition-splitting outcomes depend on production read patterns + Cassandra cluster state + replication topology.
Phased rollout converts the question "will this work in production?" from a single bet into a sequence of progressively-more-aggressive bets, each gated by metrics from the prior one.
Mode definitions¶
A typical instantiation:
| Mode | What runs | What's compared | Advancement gate |
|---|---|---|---|
| OFF | Old read path only | nothing | manual enablement after testing → SHADOW |
| SHADOW | Both paths run; old returned to caller | old-path bytes vs new-path bytes | sustained match → COMPARISON |
| COMPARISON | Both paths run; old returned to caller | old-path bytes vs new-path bytes across the full traffic profile | sustained match across analytics + peak + interactive traffic → EXEC |
| EXEC | New path returned to caller; old retained as fallback | no byte comparison; SLO and fallback rate are monitored | clean SLO + fallback-rate metrics → ON |
| ON | New path only | nothing | (terminal; fallback could be re-enabled if needed) |
The post does not enumerate this exact set of modes by name (it only mentions Shadow / Comparison / Read modes), but the structural progression is implicit in the description.
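The "what runs" and "what is returned" columns of the table above can be sketched as a single dispatch function. Everything here is hypothetical scaffolding (the `read` function, the `old_path` / `new_path` callables, and the `on_compare` hook); it shows the shape of the mode plumbing rather than any real TimeSeries API.

```python
from enum import Enum
from typing import Callable, Optional


class ReadMode(Enum):
    OFF = "off"
    SHADOW = "shadow"
    COMPARISON = "comparison"
    EXEC = "exec"
    ON = "on"


Reader = Callable[[str], bytes]
OnCompare = Callable[[str, bytes, bytes], None]


def read(key: str, mode: ReadMode, old_path: Reader, new_path: Reader,
         on_compare: Optional[OnCompare] = None) -> bytes:
    """Route one read according to the dataset's current mode."""
    if mode is ReadMode.OFF:
        return old_path(key)

    if mode in (ReadMode.SHADOW, ReadMode.COMPARISON):
        old = old_path(key)
        try:
            new = new_path(key)
        except Exception:
            # A shadow-path failure must never reach the caller.
            # (A real system would also count it as a failed comparison.)
            return old
        if on_compare is not None:
            on_compare(key, old, new)  # feeds the match/differ metrics
        return old  # caller still sees the old path's answer

    if mode is ReadMode.EXEC:
        try:
            return new_path(key)
        except Exception:
            return old_path(key)  # old path retained as fallback

    return new_path(key)  # ON: new path only, no fallback wired in
```

In this sketch SHADOW and COMPARISON share a branch because they run the same paths; they differ only in how long, and across which traffic, the comparison must stay clean before the dataset advances.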
Why per-dataset rather than fleet-wide¶
Each dataset has a different:
- Workload profile (read-heavy, write-heavy, range-query-heavy).
- Wide-partition rate (some datasets have many wide partitions, others have none).
- Tolerance for incorrect reads (some downstreams aggregate, others audit).
- Operational bandwidth (some teams have on-call coverage, others don't).
Per-dataset rollout lets the team:
- Start with low-risk datasets (small reader population, clear correctness requirements).
- Build confidence and operational experience dataset by dataset.
- Roll back per-dataset on any anomaly, without affecting other datasets.
This is canonical phased migration with soak times applied at the namespace level.
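Per-dataset isolation can be sketched as a mode registry keyed by dataset name. The `Rollout` class and the dataset names are hypothetical, and rolling back all the way to OFF (rather than to the previous mode) is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict

MODES = ("OFF", "SHADOW", "COMPARISON", "EXEC", "ON")


@dataclass
class Rollout:
    """Hypothetical registry: each dataset carries its own read mode."""
    modes: Dict[str, str] = field(default_factory=dict)

    def mode_of(self, dataset: str) -> str:
        return self.modes.get(dataset, "OFF")  # unknown datasets stay on the old path

    def advance(self, dataset: str) -> str:
        """Move one dataset up one mode; other datasets are untouched."""
        i = MODES.index(self.mode_of(dataset))
        self.modes[dataset] = MODES[min(i + 1, len(MODES) - 1)]
        return self.modes[dataset]

    def roll_back(self, dataset: str) -> None:
        """On any anomaly, return just this dataset to the old read path."""
        self.modes[dataset] = "OFF"


rollout = Rollout()
rollout.advance("low_risk_dataset")   # OFF -> SHADOW
rollout.advance("low_risk_dataset")   # SHADOW -> COMPARISON
rollout.advance("high_risk_dataset")  # starts later, moves independently
rollout.roll_back("low_risk_dataset") # does not affect high_risk_dataset
```

Keeping the mode as per-dataset state, rather than as one global flag, is what makes rollback of a single dataset possible without disturbing the others.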
Trade-offs¶
| Pro | Con |
|---|---|
| Bug-tolerant: failures in one phase don't propagate fleet-wide | Slower fleet-wide deployment than a single fleet-wide feature-flag flip |
| Composable with byte comparison for correctness gating | Mode plumbing must be threaded through read API and config |
| Per-dataset cadence matches per-dataset risk profile | Operator overhead per advancement decision |
| Dual-running paths in SHADOW / COMPARISON surfaces divergences while callers still receive the old path's result | Cost of dual-path execution during phases |
| Fallback-on-EXEC keeps safety even after cutover | More moving parts in production |
| Confidence builds across datasets | Earliest-rolled-out datasets get longer baking; latest get shorter |
Sibling patterns¶
- patterns/three-mode-rollout-off-shadow-exec — the canonical OFF/SHADOW/EXEC structure this pattern instantiates and extends with COMPARISON / ON.
- patterns/shadow-mode-bytes-comparison — the byte-comparison gating that drives SHADOW → COMPARISON advancement.
- patterns/canary-and-shadow-cluster-rollout — sibling rollout pattern with separate canary and shadow clusters.
- patterns/phased-rollout-across-release-channels — sibling phased-rollout in a different domain (release channels).
- patterns/event-type-by-event-type-shadow-cutover — sibling progressive cutover at the event-type altitude.
- patterns/phased-mobile-rollout-with-stability-tiers — sibling at the mobile-app altitude.
When NOT to use¶
- Pure config-only changes that can be flipped instantly with no correctness implications.
- Datasets with no fallback path — phased rollout requires a working old path during the phase window.
- Operations with low blast radius — the ceremony of mode plumbing isn't worth it for small-impact changes.
Seen in¶
- sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home. Netflix TimeSeries Abstraction's dynamic-partition-splitting rollout. "Implementing a phased rollout strategy to safely advance through stages as our confidence in the system grew." Per-dataset advancement gated by sustained byte-comparison match in the SHADOW / COMPARISON phases. The pattern composes with original-partition-fallback for safety even in EXEC mode.
Related¶
- patterns/three-mode-rollout-off-shadow-exec — sibling rollout pattern.
- patterns/shadow-mode-bytes-comparison — the gating mechanism this pattern uses.
- patterns/dynamic-partition-split-async-pipeline — the pipeline this rollout discipline applies to.
- patterns/phased-rollout-across-release-channels · patterns/phased-migration-with-soak-times — sibling phased rollouts at different altitudes.
- concepts/dynamic-partition-splitting — the broader concept context.
- systems/netflix-timeseries-abstraction — the canonical instance.