PATTERN Cited by 1 source
Shadow-mode bytes comparison¶
Definition¶
A read-path validation pattern that runs the old read path and the new read path in parallel during a phased rollout, compares the bytes returned by each, and only allows the rollout to advance when the comparison stays clean across a sustained shadow window. The comparison is at the byte level — "a chart of bytes match vs bytes differ in a given shadow period" — making it a stricter check than per-row comparison or per-result-set comparison.
Canonicalised on the wiki by Netflix's TimeSeries Abstraction in the 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) — applied to the read-path cutover from "read original partition" to "read split partition."
Why bytes match instead of rows match or results match¶
A new read path can produce semantically equivalent data in byte-different form when:
- Result-set ordering is non-deterministic (e.g. unsorted secondary index path).
- Column ordering shifts (different reader serialisation).
- Tombstone / TTL handling differs (one path filters, the other doesn't).
- Row-level metadata (write timestamps, tags) is reformatted.
In all of these cases the data is semantically correct but the bytes differ. The pattern nonetheless requires the old and new paths to produce byte-identical responses for callers: any difference is a behavioural break for clients that did not opt into the new behaviour.
Byte-level comparison is therefore stricter than typical "shadow validation": it catches not just incorrect rows but also subtle reformatting bugs that would have downstream consequences for byte-sensitive callers (caching, hashing, dedup).
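A minimal sketch of the distinction, assuming a hypothetical JSON wire format and made-up rows (the source does not describe the actual encoding). Both paths return the same logical rows, but the new path emits them in a different row and column order, so a semantic check passes while a byte check fails:

```python
import json

# Hypothetical rows: both paths return the same logical data.
rows_old = [{"ts": 1, "value": "a"}, {"ts": 2, "value": "b"}]
# The new path returns the same rows with different row and column order.
rows_new = [{"value": "b", "ts": 2}, {"value": "a", "ts": 1}]

def serialise(rows):
    """Serialise rows exactly as produced, preserving row and key order."""
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")

def canonical(rows):
    """Order-insensitive form, used only for the semantic comparison."""
    return sorted(json.dumps(row, sort_keys=True) for row in rows)

bytes_old = serialise(rows_old)
bytes_new = serialise(rows_new)

semantically_equal = canonical(rows_old) == canonical(rows_new)
bytes_equal = bytes_old == bytes_new
print(semantically_equal, bytes_equal)  # True False
```

A caller that hashes or caches the raw response would see these two responses as different, which is the class of difference a byte-level check surfaces.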
Where it sits in the rollout¶
Netflix's read-mode rollout has multiple phases — "a phased rollout strategy to safely advance through stages as our confidence in the system grew". The post highlights Comparison as the load-bearing phase:
Read mode progression (each phase passes checks before advancing): OFF → SHADOW → COMPARISON → EXEC → ON
- SHADOW: new path runs in parallel, compares bytes, returns OLD to caller.
- COMPARISON: sustained shadow, bytes-match-or-die.
- EXEC: new path serves traffic; old still a fallback.
This is the canonical three-mode rollout (OFF / SHADOW / EXEC), extended here with a COMPARISON phase and a final ON state, with bytes-comparison gating advancement.
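The gated progression can be sketched as a small state machine. The `ReadMode` enum and `next_mode` function are hypothetical names for illustration; the source does not say what happens when a check fails, so this sketch assumes the dataset simply stays in its current phase:

```python
from enum import Enum

class ReadMode(Enum):
    OFF = 0
    SHADOW = 1
    COMPARISON = 2
    EXEC = 3
    ON = 4

def next_mode(current: ReadMode, checks_passed: bool) -> ReadMode:
    """Advance exactly one phase, and only when the current phase's checks pass.

    Assumption: a failed check holds the dataset in place rather than rolling back.
    """
    if not checks_passed or current is ReadMode.ON:
        return current
    return ReadMode(current.value + 1)

# Example: a dataset in SHADOW with clean checks moves to COMPARISON.
print(next_mode(ReadMode.SHADOW, checks_passed=True).name)
```

Allowing only single-step transitions keeps a dataset from skipping the comparison phase.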
Mechanism¶
- Per dataset, configure read mode = SHADOW.
- On each read, the server invokes both:
  - Old read path → bytes A
  - New read path → bytes B
- Server returns bytes A to the caller. (User-facing behaviour unchanged.)
- Server compares A vs B and emits a metric: bytes_match / bytes_differ.
- After a sustained window in which bytes_match is 100% (or bytes_differ stays within an agreed tolerance), the dataset advances to EXEC.
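The steps above can be sketched as follows. This is an illustration under stated assumptions, not Netflix's implementation: `shadow_read`, `may_advance`, the in-process `Counter` standing in for a metrics client, and the `new_path_error` counter are all hypothetical. For simplicity the two paths run sequentially here rather than in parallel, and an exception in the new path is assumed to count against advancement:

```python
from collections import Counter
from typing import Callable

metrics = Counter()  # stand-in for a real metrics client (assumption)

def shadow_read(key: str,
                old_read: Callable[[str], bytes],
                new_read: Callable[[str], bytes]) -> bytes:
    """Serve the old path's bytes; compare the new path's bytes on the side."""
    a = old_read(key)
    try:
        b = new_read(key)
    except Exception:
        # Assumption: a new-path failure must never affect the caller.
        metrics["new_path_error"] += 1
        return a
    metrics["bytes_match" if a == b else "bytes_differ"] += 1
    return a  # the caller always receives the old path's response

def may_advance(m: Counter, min_reads: int, tolerance: float = 0.0) -> bool:
    """Gate: enough shadow reads observed and the failure ratio within tolerance."""
    failures = m["bytes_differ"] + m["new_path_error"]
    total = m["bytes_match"] + failures
    if total < min_reads:
        return False
    return failures / total <= tolerance

# Usage with in-memory stand-ins for the two read paths.
old_store = {"k1": b"abc", "k2": b"def"}
new_store = {"k1": b"abc", "k2": b"dEf"}
for k in ("k1", "k2"):
    shadow_read(k, old_store.__getitem__, new_store.__getitem__)
print(dict(metrics), may_advance(metrics, min_reads=2))
```

The `min_reads` floor is there so that a short, quiet window cannot pass the gate on a handful of matching reads.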
Why "bytes_match" not "rows_match" not "results_equivalent"¶
Byte equivalence is the strictest check. Anything weaker permits subtle reformatting bugs to slip through. The trade-off: byte equivalence may flag legitimate non-functional differences (e.g. canonical row ordering implemented differently in the new path) that have to be either ironed out or explicitly tolerated.
Netflix chose the byte standard for dynamic-partition-splitting because the split mechanism's correctness is paramount — "Serving incorrect reads would be disastrous."
Composition with other validation gates¶
The byte-comparison shadow phase is one of four defence-in-depth gates the post lists:
| Gate | When | Catches |
|---|---|---|
| Pre/post checksum | At splitting time | Splitter logic bugs, lost / duplicated rows |
| Original-as-fallback | At every read | Eventual-consistency, partial-failure, post-COMPLETED bugs |
| Shadow-mode bytes comparison (this pattern) | During rollout | Read-path implementation bugs, subtle reformatting |
| Spark offline verification | Hours after split | Hash-collision-only bugs missed by checksum |
The four gates together produce defence in depth. No single layer is the only check.
Trade-offs¶
| Pro | Con |
|---|---|
| Strictest possible read-path validation | A bytes_differ result can flag legitimate reformatting differences that still require engineering attention |
| Catches subtle bugs the checksum gate doesn't | Doubles read-path cost during shadow window (both paths run) |
| Composes with phased rollout per dataset | Requires test-mode flag plumbing through the read API |
| User-facing behaviour unchanged during shadow | Shadow window must be long enough to cover all read patterns (analytics, batch, peak hour) |
| Bytes-match metric is monitorable / alertable | The bytes-comparison code is itself another component to maintain |
Sibling patterns¶
- patterns/canary-and-shadow-cluster-rollout — dual-cluster shadow validation; this pattern is the result-comparison analogue at the read-path level.
- patterns/shadow-migration — shadow-write validation; this pattern is the read-side analogue.
- patterns/shadow-then-reverse-shadow-migration — sequential shadow phases for migration; this pattern is one altitude inside such a migration's read-path validation.
- patterns/event-type-by-event-type-shadow-cutover — sibling progressive cutover with shadow validation.
The shared principle: before cutting over to a new path, validate it against the old path on real production traffic without affecting users.
When NOT to use¶
- The new and old paths are intentionally different (e.g. the new path adds new fields or filters out deprecated ones). Byte comparison would always flag a difference; use per-field comparison instead.
- Bytes are non-deterministic (e.g. timestamps embedded in responses). Normalise the responses before comparing.
- Shadow cost is prohibitive (e.g. each read fans out 1000× downstream and shadow doubles all of it). Use sample-based shadow with statistical comparison.
- The read path is latency-sensitive: running both paths can blow the SLO during the shadow window.
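Two of the mitigations above, normalising before comparing and sampling the shadow traffic, can be sketched as below. Everything here is an assumption for illustration: the responses are taken to be JSON objects, `served_at` is a made-up volatile field, and the per-key hash sampling is one possible scheme. Re-serialising discards the original whitespace and number formatting, so a normalised comparison is weaker than a raw byte comparison:

```python
import hashlib
import json

VOLATILE_FIELDS = {"served_at"}  # hypothetical field that changes on every response

def normalise(payload: bytes) -> bytes:
    """Drop volatile fields, then re-serialise compactly, keeping key order."""
    doc = json.loads(payload)
    for field in VOLATILE_FIELDS:
        doc.pop(field, None)
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")

def in_shadow_sample(key: str, percent: int) -> bool:
    """Deterministically select roughly `percent`% of keys for shadow reads."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < percent

a = b'{"rows":[1,2],"served_at":"2026-06-03T00:00:00Z"}'
b = b'{"rows":[1,2],"served_at":"2026-06-03T00:00:01Z"}'
print(a == b, normalise(a) == normalise(b))  # False True
```

Hashing the key makes the sample stable across requests, at the cost that keys outside the sample are never compared; sampling per request instead would trade that stability for broader key coverage.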
Seen in¶
- sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home. Netflix TimeSeries Abstraction's dynamic-partition-splitting rollout used byte-level shadow comparison as the load-bearing gate before advancing to EXEC mode per dataset. "A chart of bytes match vs bytes differ in a given shadow period." Composed with checksum-validated migration, original-as-fallback, and offline Spark verification as defence-in-depth.
Related¶
- concepts/checksum-validated-data-migration — the static-data validation gate this composes with.
- concepts/dynamic-partition-splitting — the broader concept this validation sits inside.
- patterns/dynamic-partition-split-async-pipeline — the pipeline this gates the rollout of.
- patterns/phased-rollout-of-read-mode — the rollout discipline this gate enables.
- patterns/shadow-migration · patterns/shadow-then-reverse-shadow-migration · patterns/canary-and-shadow-cluster-rollout · patterns/three-mode-rollout-off-shadow-exec — sibling shadow-validation patterns.
- systems/netflix-timeseries-abstraction — the canonical instance.