PATTERN Cited by 1 source

Shadow-mode bytes comparison

Definition

A read-path validation pattern that runs the old read path and the new read path in parallel during a phased rollout, compares the bytes returned by each, and only allows the rollout to advance when the comparison stays clean across a sustained shadow window. The comparison is at the byte level — "a chart of bytes match vs bytes differ in a given shadow period" — making it a stricter check than per-row comparison or per-result-set comparison.

Canonicalised on the wiki by Netflix's TimeSeries Abstraction in the 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) — applied to the read-path cutover from "read original partition" to "read split partition."

Why bytes match instead of rows match or results match

A new read path can produce semantically equivalent data in byte-different form when:

  • Result-set ordering is non-deterministic (e.g. unsorted secondary index path).
  • Column ordering shifts (different reader serialisation).
  • Tombstone / TTL handling differs (one path filters, the other doesn't).
  • Row-level metadata (write timestamps, tags) is reformatted.

In all of these cases, the data is correct but the bytes differ. The old and new paths must produce byte-identical responses for callers — anything else is a behavioural break for clients that did not opt into the new behaviour.

Byte-level comparison is therefore stricter than typical "shadow validation": it catches not just incorrect rows but also subtle reformatting bugs that would have downstream consequences for byte-sensitive callers (caching, hashing, dedup).
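
A minimal sketch of the gap between row equality and byte equality. It assumes JSON serialisation purely for illustration; the field names `event_time` and `value` are hypothetical, and the source does not describe the actual wire format.

```python
import json

# Hypothetical rows from the old and new read paths: same data,
# but the new path emits its columns in a different order.
old_rows = [{"event_time": 1700000000, "value": "a"}]
new_rows = [{"value": "a", "event_time": 1700000000}]

old_bytes = json.dumps(old_rows).encode("utf-8")
new_bytes = json.dumps(new_rows).encode("utf-8")

rows_match = old_rows == new_rows      # dict equality ignores key order
bytes_match = old_bytes == new_bytes   # the serialised form keeps key order

print(rows_match, bytes_match)  # True False
```

A row-level check would pass this pair; a byte-level check flags it, which is what a byte-sensitive caller (cache key, content hash, dedup) would observe.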

Where it sits in the rollout

Netflix's read-mode rollout has multiple phases — "a phased rollout strategy to safely advance through stages as our confidence in the system grew". The post highlights Comparison as the load-bearing phase:

Read mode:           OFF                 ┐  
                       ↓                 │
                     SHADOW              │
                     (new path runs in   │
                      parallel, compares │  Each phase passes checks
                      bytes, returns OLD │  before advancing
                      to caller)         │
                       ↓                 │
                     COMPARISON          │
                     (sustained shadow,  │
                      bytes-match-or-die)│
                       ↓                 │
                     EXEC                │
                     (new path serves    │
                      traffic; old still │
                      a fallback)        │
                       ↓                 │
                     ON                  ┘

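This is the canonical three-mode rollout (OFF / SHADOW / EXEC) with byte comparison gating advancement; in the diagram above, COMPARISON is the sustained tail of SHADOW and ON is the end state after EXEC.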
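
The gating idea can be sketched as a small state machine. This is an illustration only: the `ReadMode` enum mirrors the stage names in the diagram, while `next_mode`, `min_samples`, and `tolerance` are hypothetical names and thresholds not taken from the post. Only the byte-comparison gate is modelled; whatever checks the other phases pass are left out.

```python
from enum import Enum

class ReadMode(Enum):
    OFF = 0
    SHADOW = 1
    COMPARISON = 2
    EXEC = 3
    ON = 4

def next_mode(mode: ReadMode, bytes_match: int, bytes_differ: int,
              min_samples: int = 10_000, tolerance: float = 0.0) -> ReadMode:
    """Advance one phase, but hold shadow phases until the window is clean."""
    if mode is ReadMode.ON:
        return mode  # already at the end state
    if mode in (ReadMode.SHADOW, ReadMode.COMPARISON):
        total = bytes_match + bytes_differ
        if total < max(min_samples, 1):
            return mode  # window too small to trust (assumed threshold)
        if bytes_differ / total > tolerance:
            return mode  # too many differing reads: stay put
    return ReadMode(mode.value + 1)
```

With the default `tolerance=0.0`, a single differing read in the window holds the dataset in its current phase, which matches the "bytes-match-or-die" reading of the diagram.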

Mechanism

  1. Per dataset, configure read mode = SHADOW.
  2. On each read, the server invokes both paths:
       • Old read path → bytes A
       • New read path → bytes B
  3. Server returns bytes A to the caller. (User-facing behaviour unchanged.)
  4. Server compares A vs B and emits a metric: bytes_match / bytes_differ.
  5. After a sustained window with bytes_match at 100% (or bytes_differ under a set tolerance), the dataset advances to EXEC.
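
The steps above can be sketched roughly as follows. This is not Netflix's implementation: `shadow_read`, the in-process `metrics` counter, and the `shadow_error` metric are assumptions made for illustration, and the two paths run sequentially here for simplicity where the pattern runs them in parallel.

```python
from collections import Counter
from typing import Callable

metrics: Counter = Counter()  # stand-in for a real metrics client

def shadow_read(key: str,
                old_path: Callable[[str], bytes],
                new_path: Callable[[str], bytes]) -> bytes:
    """Serve the old path's bytes; compare the new path's bytes on the side."""
    old_bytes = old_path(key)
    try:
        new_bytes = new_path(key)
    except Exception:
        # Assumption: a failing shadow path must never affect the caller.
        metrics["shadow_error"] += 1
        return old_bytes
    if old_bytes == new_bytes:
        metrics["bytes_match"] += 1
    else:
        metrics["bytes_differ"] += 1
    return old_bytes  # the caller always sees the old path's response

# Usage with two toy read paths over an in-memory store.
store = {"k1": b"abc", "k2": b"xyz"}
old = lambda k: store[k]
new = lambda k: store[k] if k == "k1" else store[k][::-1]  # buggy on k2

r1 = shadow_read("k1", old, new)
r2 = shadow_read("k2", old, new)
```

Returning `old_bytes` on every branch is the property that keeps user-facing behaviour unchanged while the new path is still unproven.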

Why "bytes_match" not "rows_match" not "results_equivalent"

Byte equivalence is the strictest check. Anything weaker permits subtle reformatting bugs to slip through. The trade-off: byte equivalence may flag legitimate non-functional differences (e.g. canonical row ordering implemented differently in the new path) that have to be either ironed out or explicitly tolerated.

Netflix chose the byte standard for dynamic-partition-splitting because the split mechanism's correctness is paramount — "Serving incorrect reads would be disastrous."

Composition with other validation gates

The byte-comparison shadow phase is one of four defence-in-depth gates the post lists:

| Gate | When | Catches |
|---|---|---|
| Pre/post checksum | At splitting time | Splitter logic bugs, lost / duplicated rows |
| Original-as-fallback | At every read | Eventual-consistency, partial-failure, post-COMPLETED bugs |
| Shadow-mode bytes comparison (this pattern) | During rollout | Read-path implementation bugs, subtle reformatting |
| Spark offline verification | Hours after split | Hash-collision-only bugs missed by checksum |

The four gates together produce defence in depth. No single layer is the only check.

Trade-offs

| Pro | Con |
|---|---|
| Strictest possible read-path validation | Bytes-different can flag legitimate reformat differences requiring engineering attention |
| Catches subtle bugs the checksum gate doesn't | Doubles read-path cost during shadow window (both paths run) |
| Composes with phased rollout per dataset | Requires test-mode flag plumbing through the read API |
| User-facing behaviour unchanged during shadow | Shadow window must be long enough to cover all read patterns (analytics, batch, peak hour) |
| Bytes-match metric is monitorable / alertable | Bytes-comparison code is itself a piece of system to maintain |

Sibling patterns

The shared principle: before cutting over to a new path, validate it against the old path on real production traffic without affecting users.

When NOT to use

  • The new and old paths are intentionally different (e.g. new path adds new fields or filters out deprecated ones). Byte-comparison would always flag — must use per-field comparison.
  • Bytes are non-deterministic (e.g. timestamps embedded in responses). Must normalise before comparing.
  • Shadow cost is prohibitive (e.g. each read fans out 1000× downstream and shadow doubles all of it). Use sample-based shadow with statistical comparison.
  • Latency-sensitive read paths — running both paths can blow the SLO during shadow window.
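
Two of the workarounds named in this list, normalising before comparing and sample-based shadowing, might look like the sketch below. Everything here is assumed for illustration: the JSON-object payload, the volatile field name `served_at`, and the helper names `normalise` and `in_shadow_sample`.

```python
import hashlib
import json

VOLATILE_FIELDS = {"served_at"}  # hypothetical non-deterministic field

def normalise(payload: bytes) -> bytes:
    """Drop volatile fields, then re-serialise, before comparing bytes."""
    doc = json.loads(payload)
    for field in VOLATILE_FIELDS:
        doc.pop(field, None)
    # Key order is preserved, so column-order differences still show up.
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")

def in_shadow_sample(key: str, percent: int) -> bool:
    """Deterministically select roughly `percent`% of keys for shadow reads."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100 < percent

a = b'{"value": 1, "served_at": 111}'
b = b'{"value": 1, "served_at": 222}'
same_after_normalise = normalise(a) == normalise(b)
```

Re-serialising weakens the check slightly, since whitespace differences between the two paths are no longer visible; hashing the key keeps the sample stable, so the same keys are shadowed on every read.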

Seen in
