PATTERN Cited by 1 source

Shadow-mode bytes comparison¶

Definition¶

A read-path validation pattern that runs the old read path and the new read path in parallel during a phased rollout, compares the bytes returned by each, and only allows the rollout to advance when the comparison stays clean across a sustained shadow window. The comparison is at the byte level — "a chart of bytes match vs bytes differ in a given shadow period" — making it a stricter check than per-row comparison or per-result-set comparison.

Canonicalised on the wiki by Netflix's TimeSeries Abstraction in the 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) — applied to the read-path cutover from "read original partition" to "read split partition."

Why bytes match instead of rows match or results match¶

A new read path can produce semantically equivalent data in byte-different form when:

Result-set ordering is non-deterministic (e.g. unsorted secondary index path).
Column ordering shifts (different reader serialisation).
Tombstone / TTL handling differs (one path filters, the other doesn't).
Row-level metadata (write timestamps, tags) is reformatted.

In all of these cases, the data is correct but the bytes differ. The old and new paths must produce byte-identical responses for callers — anything else is a behavioural break for clients that did not opt into the new behaviour.

Byte-level comparison is therefore stricter than typical "shadow validation": it catches not just incorrect rows but also subtle reformatting bugs that would have downstream consequences for byte-sensitive callers (caching, hashing, dedup).
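The gap between "rows match" and "bytes match" can be shown with a toy example. This is a minimal sketch: the compact-JSON wire format and the `serialise` and `row_key` helpers are assumptions made for illustration, not the TimeSeries Abstraction's actual encoding.

```python
import hashlib
import json

def serialise(rows):
    # Hypothetical wire format: compact JSON, rows emitted in the order given.
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")

def row_key(row):
    # Sort key that makes the row-level comparison independent of row order.
    return sorted(row.items())

old_rows = [{"ts": 1, "v": "a"}, {"ts": 2, "v": "b"}]
new_rows = [{"ts": 2, "v": "b"}, {"ts": 1, "v": "a"}]  # same rows, different order

old_bytes = serialise(old_rows)
new_bytes = serialise(new_rows)

# A row-level check passes, because both paths returned the same set of rows.
rows_equivalent = sorted(old_rows, key=row_key) == sorted(new_rows, key=row_key)

# A byte-level check fails, and so would any cache key or dedup hash
# that a caller derives from the response bytes.
bytes_identical = old_bytes == new_bytes
same_digest = hashlib.sha256(old_bytes).digest() == hashlib.sha256(new_bytes).digest()

print(rows_equivalent, bytes_identical, same_digest)
```

A row-level shadow check would pass this pair, while a byte-sensitive caller would see two different responses.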

Where it sits in the rollout¶

Netflix's read-mode rollout has multiple phases — "a phased rollout strategy to safely advance through stages as our confidence in the system grew". The post highlights Comparison as the load-bearing phase:

Read mode:           OFF                 ┐  
                       ↓                 │
                     SHADOW              │
                     (new path runs in   │
                      parallel, compares │  Each phase passes checks
                      bytes, returns OLD │  before advancing
                      to caller)         │
                       ↓                 │
                     COMPARISON          │
                     (sustained shadow,  │
                      bytes-match-or-die)│
                       ↓                 │
                     EXEC                │
                     (new path serves    │
                      traffic; old still │
                      a fallback)        │
                       ↓                 │
                     ON                  ┘

This is the canonical three-mode rollout (OFF / SHADOW / EXEC), with bytes comparison gating advancement between modes.
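The stage order in the diagram can be modelled as a small state machine. This is a minimal sketch that assumes the five stage names shown in the diagram and a hypothetical `advance` helper; the post does not describe how Netflix stores or flips the mode.

```python
from enum import Enum

class ReadMode(Enum):
    # Stage names taken from the diagram; the numeric values are an assumption
    # used only to express ordering.
    OFF = 0
    SHADOW = 1
    COMPARISON = 2
    EXEC = 3
    ON = 4

def advance(mode: ReadMode, checks_passed: bool) -> ReadMode:
    """Move one stage forward only when the current stage's checks passed."""
    if not checks_passed or mode is ReadMode.ON:
        return mode
    return ReadMode(mode.value + 1)

mode = ReadMode.SHADOW
mode = advance(mode, checks_passed=True)   # moves on to COMPARISON
mode = advance(mode, checks_passed=False)  # stays in COMPARISON
print(mode.name)
```

Advancing one stage at a time, and never skipping, keeps each dataset behind the comparison gate until that gate has passed.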

Mechanism¶

Per dataset, configure read mode = SHADOW.
On each read, the server invokes both:
Old read path → bytes A
New read path → bytes B
Server returns bytes A to the caller. (User-facing behaviour unchanged.)
Server compares A vs B and emits a metric: bytes_match / bytes_differ.
After a sustained window with bytes_match at 100% (or bytes_differ within a configured tolerance), the dataset advances to EXEC.
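The steps above can be sketched as a thin wrapper around the two read paths. This is an illustrative sketch, not Netflix's implementation: `ShadowReader`, `ready_to_advance`, `min_reads`, and `tolerance` are hypothetical names, and treating a failing new path as a mismatch is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ShadowReader:
    old_read: Callable[[str], bytes]
    new_read: Callable[[str], bytes]
    bytes_match: int = 0    # counters standing in for the emitted metric
    bytes_differ: int = 0

    def read(self, key: str) -> bytes:
        a = self.old_read(key)
        b: Optional[bytes]
        try:
            b = self.new_read(key)
        except Exception:
            b = None  # assumption: a failing new path counts as a mismatch
        if a == b:
            self.bytes_match += 1
        else:
            self.bytes_differ += 1
        return a  # the caller always gets the old path's bytes

    def ready_to_advance(self, min_reads: int, tolerance: float = 0.0) -> bool:
        """True once enough reads were compared and the differ ratio is within tolerance."""
        total = self.bytes_match + self.bytes_differ
        if total < min_reads:
            return False
        return self.bytes_differ / total <= tolerance

store = {"k1": b"abc", "k2": b"xyz"}
reader = ShadowReader(
    old_read=lambda k: store[k],
    new_read=lambda k: b"xyZ" if k == "k2" else store[k],  # injected bug on k2
)
reader.read("k1")
reader.read("k2")
print(reader.bytes_match, reader.bytes_differ, reader.ready_to_advance(min_reads=2))
```

The sketch runs the new path inline for simplicity. A production version would more plausibly run it off the caller's critical path so that shadow work does not add to response latency.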

Why "bytes_match" not "rows_match" not "results_equivalent"¶

Byte equivalence is the strictest check. Anything weaker permits subtle reformatting bugs to slip through. The trade-off: byte equivalence may flag legitimate non-functional differences (e.g. canonical row ordering implemented differently in the new path) that have to be either ironed out or explicitly tolerated.

Netflix chose the byte standard for dynamic-partition-splitting because the split mechanism's correctness is paramount — "Serving incorrect reads would be disastrous."

Composition with other validation gates¶

The byte-comparison shadow phase is one of four defence-in-depth gates the post lists:

Gate	When	Catches
Pre/post checksum	At splitting time	Splitter logic bugs, lost / duplicated rows
Original-as-fallback	At every read	Eventual-consistency, partial-failure, post-COMPLETED bugs
Shadow-mode bytes comparison (this pattern)	During rollout	Read-path implementation bugs, subtle reformatting
Spark offline verification	Hours after split	Hash-collision-only bugs missed by checksum

The four gates together produce defence in depth. No single layer is the only check.

Trade-offs¶

Pro	Con
Strictest possible read-path validation	Bytes-different can flag legitimate reformat differences requiring engineering attention
Catches subtle bugs the checksum gate doesn't	Doubles read-path cost during shadow window (both paths run)
Composes with phased rollout per dataset	Requires test-mode flag plumbing through the read API
User-facing behaviour unchanged during shadow	Shadow window must be long enough to cover all read patterns (analytics, batch, peak hour)
Bytes-match metric is monitorable / alertable	Bytes-comparison code is itself a piece of system to maintain

Sibling patterns¶

patterns/canary-and-shadow-cluster-rollout — dual-cluster shadow validation; this pattern is the result-comparison analogue at the read-path level.
patterns/shadow-migration — shadow-write validation; this pattern is the read-side analogue.
patterns/shadow-then-reverse-shadow-migration — sequential shadow phases for migration; this pattern operates one level down, as the read-path validation inside such a migration.
patterns/event-type-by-event-type-shadow-cutover — sibling progressive cutover with shadow validation.

The shared principle: before cutting over to a new path, validate it against the old path on real production traffic without affecting users.

When NOT to use¶

The new and old paths are intentionally different (e.g. new path adds new fields or filters out deprecated ones). Byte comparison would flag every read, so use per-field comparison instead.
Bytes are non-deterministic (e.g. timestamps embedded in responses). Must normalise before comparing.
Shadow cost is prohibitive (e.g. each read fans out 1000× downstream and shadow doubles all of it). Use sample-based shadow with statistical comparison.
The read path is latency-sensitive: running both paths can blow the SLO during the shadow window.
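Two of the mitigations named in this list, sampling and normalisation, can be sketched as follows. Both `should_shadow` and `normalise` are hypothetical helpers, and the JSON payload with a `served_at` field is an assumed response format chosen only for illustration.

```python
import hashlib
import json

def should_shadow(key: str, sample_rate: float) -> bool:
    """Deterministically select roughly `sample_rate` of keys for shadow reads."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < round(sample_rate * 10_000)

VOLATILE_FIELDS = ("served_at",)  # assumed name of a per-response timestamp

def normalise(payload: bytes) -> bytes:
    """Drop volatile fields; key and row order are kept so ordering bugs still show."""
    doc = json.loads(payload)
    for field in VOLATILE_FIELDS:
        doc.pop(field, None)
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")

old = b'{"rows":[1,2],"served_at":100}'
new = b'{"rows":[1,2],"served_at":200}'
reordered = b'{"rows":[2,1],"served_at":100}'

print(old == new)                              # False: raw bytes differ on the timestamp
print(normalise(old) == normalise(new))        # True: equal once the timestamp is dropped
print(normalise(old) == normalise(reordered))  # False: row order still matters
```

Two caveats apply to this sketch. Re-encoding through JSON also hides whitespace differences, so normalisation weakens the byte-level guarantee and should strip as little as possible. Sampling by key means unsampled keys are never compared; hashing a per-request identifier instead would spread the sample across all keys.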

Seen in¶

sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home. Netflix TimeSeries Abstraction's dynamic-partition-splitting rollout used byte-level shadow comparison as the load-bearing gate before advancing to EXEC mode per dataset. "A chart of bytes match vs bytes differ in a given shadow period." Composed with checksum-validated migration, original-as-fallback, and offline Spark verification as defence-in-depth.

concepts/checksum-validated-data-migration — the static-data validation gate this composes with.
concepts/dynamic-partition-splitting — the broader concept this validation sits inside.
patterns/dynamic-partition-split-async-pipeline — the pipeline this gates the rollout of.
patterns/phased-rollout-of-read-mode — the rollout discipline this gate enables.
patterns/shadow-migration · patterns/shadow-then-reverse-shadow-migration · patterns/canary-and-shadow-cluster-rollout · patterns/three-mode-rollout-off-shadow-exec — sibling shadow-validation patterns.
systems/netflix-timeseries-abstraction — the canonical instance.