PATTERN Cited by 1 source

Phased rollout of read mode¶

Definition¶

A read-path migration discipline that introduces multiple named read modes (e.g. OFF / SHADOW / COMPARISON / EXEC / ON), advances one dataset (or namespace) at a time through these modes, and only allows advancement when the previous mode passes its checks. Each mode is a defined configuration of which read path executes, which path is shadowed, and what comparison metrics gate the transition.

Distinct from generic feature-flag rollout in two ways:

Multiple intermediate modes, each with a specific validation purpose (the SHADOW mode validates correctness; COMPARISON sustains the validation; EXEC tests latency under real serving; etc.).
Gated advancement, where the metrics from the prior mode must be clean before the next mode is enabled.
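The two properties above can be sketched as a small state machine. This is a minimal illustration, not the implementation from the source: the `ReadMode` enum and the `next_mode` helper are hypothetical names, and the single boolean `checks_passed` stands in for whatever metrics gate a real transition.

```python
from enum import Enum


class ReadMode(Enum):
    """Hypothetical ordered read modes; the numeric value encodes rollout order."""
    OFF = 0
    SHADOW = 1
    COMPARISON = 2
    EXEC = 3
    ON = 4


def next_mode(current: ReadMode, checks_passed: bool) -> ReadMode:
    """Advance exactly one mode, and only if the current mode's checks passed."""
    if not checks_passed or current is ReadMode.ON:
        # Failed gate, or already terminal: stay where we are.
        return current
    return ReadMode(current.value + 1)
```

Because `next_mode` moves at most one step, a dataset cannot skip an intermediate mode even if a later gate happens to be green.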

Canonicalised on the wiki by Netflix's TimeSeries Abstraction in the 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads).

The Netflix instantiation¶

"Implementing a phased rollout strategy to safely advance through stages as our confidence in the system grew."

The post singles out the Comparison phase as load-bearing: "a chart of bytes match vs bytes differ in a given shadow period" is the gate that determines whether a dataset advances. The full mode progression is implicit in the architecture (OFF → SHADOW with byte comparison → EXEC, where the new path serves with the old path still available as fallback → ON, when the fallback is no longer wired in), with each transition requiring sustained green metrics.
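A byte-comparison gate of this kind might look like the sketch below. The source only describes a chart of "bytes match vs bytes differ"; the `ShadowWindow` class, the whole-response comparison, and the `min_bytes` / `max_differ_ratio` thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShadowWindow:
    """Hypothetical counters for one shadow period of one dataset."""
    bytes_match: int = 0
    bytes_differ: int = 0

    def record(self, old: bytes, new: bytes) -> None:
        # Assumption: responses are compared as a whole. A response either
        # matches entirely or all of its bytes count as differing.
        if old == new:
            self.bytes_match += len(old)
        else:
            self.bytes_differ += max(len(old), len(new))

    def gate_passes(self, min_bytes: int = 1, max_differ_ratio: float = 0.0) -> bool:
        """True only if enough traffic was observed and the differ ratio is within bounds."""
        total = self.bytes_match + self.bytes_differ
        if total < min_bytes:
            return False  # too little evidence to advance
        return self.bytes_differ / total <= max_differ_ratio
```

An empty window deliberately fails the gate: no observed traffic is treated as no evidence rather than as a clean result.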

The rollout proceeds per dataset / per namespace rather than fleet-wide all at once — confidence is established on lower-risk datasets first, then propagated to higher-risk ones.

Why phase the rollout at all¶

The dynamic-partition-splitting feature has three properties that make a phased rollout structurally necessary:

High blast radius — incorrect reads on TimeSeries data could affect downstream Counter aggregations, multi-region replicated state, etc.
Per-dataset variability — different datasets have different access patterns, partition shapes, and failure modes; one might pass in shadow mode while another stresses an unhandled corner case.
Hard to test exhaustively offline — partition-splitting outcomes depend on production read patterns + Cassandra cluster state + replication topology.

Phased rollout converts the question "will this work in production?" from a single bet into a sequence of progressively more aggressive bets, each gated by metrics from the prior one.

Mode definitions¶

A typical instantiation:

Mode	What runs	What's compared	Advancement gate
OFF	Old read path only	nothing	manual promotion after testing → SHADOW
SHADOW	Both paths run; old returned to caller	old-path bytes vs new-path bytes	sustained match → COMPARISON
COMPARISON	Both paths run; old returned to caller	old-path bytes vs new-path bytes, across the full traffic profile	sustained match across analytics + peak + interactive traffic → EXEC
EXEC	New path returned to caller; old retained as fallback	nothing; old path runs only as fallback when the new path fails	clean SLO + fallback-rate metrics → ON
ON	New path only	nothing	(terminal — fallback could be re-enabled if needed)

The post does not enumerate this exact set of modes by name (it only mentions Shadow / Comparison / Read modes), but the structural progression is implicit in the description.
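The "What runs" column of the table can be read as a dispatch function on the read path. The sketch below is an assumption-laden illustration: `read`, `old_path`, `new_path`, and the `on_compare` hook are hypothetical names, and mode names are plain strings to keep the example self-contained.

```python
from typing import Callable, Optional

Reader = Callable[[str], bytes]


def read(mode: str, key: str, old_path: Reader, new_path: Reader,
         on_compare: Optional[Callable[[bytes, bytes], None]] = None) -> bytes:
    """Serve one read according to the dataset's current mode."""
    if mode == "OFF":
        return old_path(key)
    if mode in ("SHADOW", "COMPARISON"):
        old = old_path(key)
        try:
            new = new_path(key)
            if on_compare is not None:
                on_compare(old, new)  # feeds the byte-comparison metrics
        except Exception:
            # A shadow-path failure must never affect the caller; a real
            # system would count it as a metric rather than drop it.
            pass
        return old
    if mode == "EXEC":
        try:
            return new_path(key)
        except Exception:
            return old_path(key)  # fallback; a real system would count this
    if mode == "ON":
        return new_path(key)
    raise ValueError(f"unknown read mode: {mode}")
```

In this sketch SHADOW and COMPARISON share a code path, which matches the table: they differ in how long and across which traffic the comparison must hold, not in what executes.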

Why per-dataset rather than fleet-wide¶

Each dataset has a different:

Workload profile (read-heavy, write-heavy, range-query-heavy).
Wide-partition rate (some datasets have many wide partitions, others have none).
Tolerance for incorrect reads (some downstreams aggregate, others audit).
Operational bandwidth (some teams have on-call coverage, others don't).

Per-dataset rollout lets the team:

Start with low-risk datasets (small reader population, clear correctness requirements).
Build confidence, and operational experience, dataset by dataset.
Roll back per-dataset on any anomaly, without affecting other datasets.

This is the canonical phased-migration-with-soak-times discipline, applied at the namespace level.
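Per-dataset isolation can be sketched as a registry that tracks one mode per dataset. Everything here is hypothetical: the `RolloutRegistry` class, the dataset names in the usage lines, and the choice to roll back straight to OFF (a real system might step back one mode instead).

```python
from typing import Dict, Iterable

MODES = ["OFF", "SHADOW", "COMPARISON", "EXEC", "ON"]


class RolloutRegistry:
    """Tracks the read mode of each dataset independently."""

    def __init__(self, datasets: Iterable[str]) -> None:
        self.mode: Dict[str, str] = {name: "OFF" for name in datasets}

    def advance(self, dataset: str, gate_passed: bool) -> str:
        """Move one dataset one step forward if its own gate passed."""
        idx = MODES.index(self.mode[dataset])
        if gate_passed and idx < len(MODES) - 1:
            self.mode[dataset] = MODES[idx + 1]
        return self.mode[dataset]

    def rollback(self, dataset: str) -> str:
        """Assumption: an anomaly sends only this dataset back to OFF."""
        self.mode[dataset] = "OFF"
        return self.mode[dataset]


registry = RolloutRegistry(["low_risk_ds", "high_risk_ds"])
registry.advance("low_risk_ds", gate_passed=True)   # OFF -> SHADOW
registry.advance("low_risk_ds", gate_passed=True)   # SHADOW -> COMPARISON
registry.advance("high_risk_ds", gate_passed=True)  # OFF -> SHADOW
registry.rollback("low_risk_ds")                    # only this dataset resets
```

After these calls the high-risk dataset is still in SHADOW, which is the point of keying the state by dataset rather than holding one fleet-wide flag.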

Trade-offs¶

Pro	Con
Bug-tolerant: failures in one phase don't propagate fleet-wide	Slower fleet-wide deployment than a single fleet-wide feature-flag flip
Composable with byte comparison for correctness gating	Mode plumbing must be threaded through read API and config
Per-dataset cadence matches per-dataset risk profile	Operator overhead per advancement decision
Dual-path execution validates the new path against real traffic before it serves callers	Cost of dual-path execution during the SHADOW / COMPARISON phases, plus fallback reads in EXEC
Fallback-on-EXEC keeps safety even after cutover	More moving parts in production
Confidence builds across datasets	Earliest-rolled-out datasets get longer baking; latest get shorter

Sibling patterns¶

patterns/three-mode-rollout-off-shadow-exec — the canonical OFF/SHADOW/EXEC structure this pattern instantiates and extends with COMPARISON / ON.
patterns/shadow-mode-bytes-comparison — the byte-comparison gating that drives SHADOW → COMPARISON advancement.
patterns/canary-and-shadow-cluster-rollout — sibling rollout pattern with separate canary and shadow clusters.
patterns/phased-rollout-across-release-channels — sibling phased-rollout in a different domain (release channels).
patterns/event-type-by-event-type-shadow-cutover — sibling progressive cutover at the event-type altitude.
patterns/phased-mobile-rollout-with-stability-tiers — sibling at the mobile-app altitude.

When NOT to use¶

Pure config-only changes that can be flipped instantly with no correctness implications.
Datasets with no fallback path — phased rollout requires a working old path during the phase window.
Operations with low blast radius — the ceremony of mode plumbing isn't worth it for small-impact changes.

Seen in¶

sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home. Netflix TimeSeries Abstraction's dynamic-partition-splitting rollout. "Implementing a phased rollout strategy to safely advance through stages as our confidence in the system grew." Per-dataset advancement gated by sustained byte-comparison match in the SHADOW / COMPARISON phases. The pattern composes with original-partition-fallback for safety even in EXEC mode.

patterns/three-mode-rollout-off-shadow-exec — sibling rollout pattern.
patterns/shadow-mode-bytes-comparison — the gating mechanism this pattern uses.
patterns/dynamic-partition-split-async-pipeline — the pipeline this rollout discipline applies to.
patterns/phased-rollout-across-release-channels · patterns/phased-migration-with-soak-times — sibling phased rollouts at different altitudes.
concepts/dynamic-partition-splitting — the broader concept context.
systems/netflix-timeseries-abstraction — the canonical instance.