PATTERN Cited by 1 source
Keep original partition as fallback during split¶
Definition¶
The discipline of never deleting the source partition after a successful split — leaving the original wide partition as a passive fallback that the read path can divert to when:
- The split metadata is unavailable (e.g. metadata-table outage).
- A split has been marked COMPLETED but that state has not yet propagated to every reader (eventual consistency).
- A bug is discovered in the split / divert pipeline and an operator wants to roll back to original-only reads.
- A completed split is later invalidated (e.g. partition becomes mutable again — see concepts/immutable-partition).
The trade is storage cost for operational safety. The pattern is named explicitly in Netflix TimeSeries' 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads).
Verbatim from the source¶
"The existing wide partition from the original time slice is never deleted. This helps us in creating safe fallbacks in many different scenarios of partial failures and eventual consistency. The slightly larger storage space we use as a result is worth the operational safety we gain."
The architectural property is load-bearing, not incidental:
- The split table has the same schema as the original, so both can be read by the same `PartitionReader` class.
- The Bloom-filter gate on the read path is a divert mechanism, not a hard cutover: the original is still present and reachable.
- The `wide_row` metadata table records both `pre_split_data` (original location) and `post_split_data` (split location). By retaining both, the system can fall back at read time without losing track of the source.
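As a rough illustration of how these three properties might combine on the read path, here is a minimal Python sketch. It is not the Netflix implementation. The plain set standing in for the Bloom filter, the in-memory dict tables, the `read_partition` function, and the `p1#0`-style split keys are all assumptions made for illustration; only `PartitionReader`, `wide_row`, `pre_split_data`, `post_split_data`, and the COMPLETED status are names taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Tuple[int, str]  # (clustering key, value); hypothetical row shape


@dataclass
class WideRowMetadata:
    pre_split_data: str         # key of the original wide partition
    post_split_data: List[str]  # keys of the split partitions
    status: str                 # e.g. "COMPLETED"


class PartitionReader:
    """One reader class serves both tables because their schemas match."""

    def __init__(self, table: Dict[str, List[Row]]) -> None:
        self._table = table

    def read(self, key: str) -> List[Row]:
        return list(self._table.get(key, []))


def read_partition(
    key: str,
    split_filter: Set[str],  # stand-in for the Bloom filter (assumption)
    wide_row: Dict[str, WideRowMetadata],
    original: PartitionReader,
    split: PartitionReader,
) -> List[Row]:
    # Filter miss: the split is unknown to this server, so read the original.
    if key not in split_filter:
        return original.read(key)
    meta = wide_row.get(key)
    # Missing or incomplete metadata (including a false positive from a real
    # Bloom filter): the original is still there, so fall back to it.
    if meta is None or meta.status != "COMPLETED":
        return original.read(meta.pre_split_data if meta else key)
    rows: List[Row] = []
    for split_key in meta.post_split_data:
        rows.extend(split.read(split_key))
    return sorted(rows)
```

In this sketch every non-happy path ends in `original.read(...)`, which is only safe because the original partition is never deleted.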
Why fallback matters here¶
Three classes of failure that this pattern guards against:
1. Eventual-consistency window¶
The Bloom filter loads partition keys of completed splits periodically — there is a window where a split is COMPLETED but the Bloom filter on a particular server hasn't reloaded yet. During this window, reads on that server route to the original partition. The original must exist for those reads to return correct data.
Even after the Bloom filter has loaded, metadata-table reads go through a read-through cache, so there is also a brief window in which the cache has not yet seen the new metadata. The protection is the same: the original must exist.
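The reload window can be sketched with a toy model, assuming each server holds a local snapshot of the completed-split keys that is refreshed only when `reload_filter` runs. The class name, method names, and the use of a set in place of a real Bloom filter are hypothetical.

```python
from typing import Set


class ServerReadPath:
    """Toy model of one server's routing decision (hypothetical names)."""

    def __init__(self, completed_splits: Set[str]) -> None:
        # Shared collection that the splitter updates when a split completes.
        self._completed_splits = completed_splits
        # Local snapshot; stale until the next periodic reload.
        self._local_filter: Set[str] = set()

    def reload_filter(self) -> None:
        # In a real system this would run on a timer.
        self._local_filter = set(self._completed_splits)

    def route(self, key: str) -> str:
        return "split" if key in self._local_filter else "original"


completed: Set[str] = set()
server = ServerReadPath(completed)
completed.add("p1")            # split marked COMPLETED
before = server.route("p1")    # still routed to the original partition
server.reload_filter()
after = server.route("p1")     # now diverted to the split partitions
```

Between the `completed.add` call and the reload, the server keeps routing to the original, so the read is only correct if the original still exists.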
2. Partial-failure scenarios¶
The split pipeline writes to a separate time-slice table. That table's writes can be:
- Partially written if the splitter crashes mid-write (recoverable from checkpoint, but during recovery reads should fall back).
- Lost in a region failover (depending on Cassandra topology) — the split table may be lagging in a region.
- Affected by a consistency-level mismatch (e.g. split table writes done at LOCAL_QUORUM, original at QUORUM).
In all these cases, falling back to the original partition is a strictly safer choice than failing the read.
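A minimal sketch of that choice, assuming the read path can see a per-key split status and that split-table errors surface as an exception. `SplitTableUnavailable`, `read_with_fallback`, and the `IN_PROGRESS` status are hypothetical names; only COMPLETED appears in the source.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape


class SplitTableUnavailable(Exception):
    """Hypothetical error for a timed-out or lagging split-table read."""


def read_with_fallback(
    key: str,
    split_status: Dict[str, str],
    read_split: Callable[[str], List[Row]],
    read_original: Callable[[str], List[Row]],
) -> Tuple[List[Row], str]:
    """Return the rows plus the name of the location that served them."""
    # A split that is not COMPLETED (e.g. the splitter crashed mid-write and
    # is recovering from a checkpoint) is never served.
    if split_status.get(key) != "COMPLETED":
        return read_original(key), "original"
    try:
        return read_split(key), "split"
    except SplitTableUnavailable:
        # Prefer a slower, correct read over a failed one.
        return read_original(key), "original"
```

Returning the serving location alongside the rows is a design choice of this sketch; it would let a caller count fallbacks and alert when they become frequent.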
3. Bug-tolerant rollback¶
If a bug is discovered in the splitter (e.g. a clustering-column ordering mismatch that produces wrong data whose post-split checksum still matches, defeating the checksum gate), the operator can disable Bloom-filter loading and the system reverts to original-partition-only reads. Without the original retained, this rollback is impossible.
Composition with checksum validation¶
The fallback property and the pre/post checksum gate are complementary defences:
| Defence | What it catches | When it fires |
|---|---|---|
| Pre/post checksum | Splitter logic bugs (lost / duplicated rows) | At splitting time, before split is COMPLETED |
| Original-as-fallback | Eventual-consistency windows, partial-failure, post-COMPLETED bugs | At read time, on every read |
| Spark offline verify | Subtle correctness bugs the checksum hash function might collide on | Hours / days after split |
| Shadow comparison | Read-path implementation bugs | During phased rollout per dataset |
The four together provide defence-in-depth against what the post explicitly calls a "disastrous" failure: serving incorrect reads.
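To make the first row of the table concrete, here is a hedged sketch of a pre/post checksum gate. The source does not describe the checksum algorithm; hashing the sorted rows with SHA-256, the row shape, and the function names are all assumptions.

```python
import hashlib
from typing import Iterable, List, Tuple

Row = Tuple[int, bytes]  # (clustering key, value); hypothetical row shape


def partition_checksum(rows: Iterable[Row]) -> str:
    """Digest over all rows, independent of how they are grouped."""
    h = hashlib.sha256()
    for clustering_key, value in sorted(rows):
        h.update(clustering_key.to_bytes(8, "big", signed=True))
        # Length-prefix the value so adjacent rows cannot blur together.
        h.update(len(value).to_bytes(4, "big"))
        h.update(value)
    return h.hexdigest()


def split_is_valid(original_rows: List[Row],
                   split_partitions: List[List[Row]]) -> bool:
    """True when the split partitions hold exactly the original rows."""
    merged = [row for part in split_partitions for row in part]
    return partition_checksum(original_rows) == partition_checksum(merged)
```

Because this sketch sorts rows before hashing, it catches lost or duplicated rows but not an ordering bug in how rows are laid out. That is the kind of gap the read-time fallback and the later verification stages are there to cover.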
Trade-offs¶
| Pro | Con |
|---|---|
| Bug-tolerant rollback at any time | Permanent storage overhead — original never reclaimed |
| Eventual-consistency-tolerant reads | Storage overhead grows with split count |
| Same-schema split table → fallback uses existing reader code | Long-term, the fallback grows stale (the original is the pre-split shape) |
| Compatible with mutable-partition splits when those ship | Requires read path to know about both pre and post locations |
| Compatible with re-processing failed splits | Fallback returns correct data but reads the full wide partition, so the slow reads the split was meant to fix return |
When to retire the original¶
The post does not specify a retirement policy. Reasonable retirement-window candidates:
- Time-bounded retention. If the original lives in a Time Slice with a retention TTL, it ages out naturally. Until then, fallback is available.
- Confidence-window retention. Keep the original for N weeks after split COMPLETED, then `nodetool` repair-and-drop. The window gives operators time to catch silent bugs.
- Never retire. Storage is cheap; operational safety is not.
The Netflix TimeSeries shape implies the time-bounded retention path: each Time Slice has a retention TTL, so the original eventually ages out without explicit deletion.
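Since the post specifies no retirement policy, the following is purely a hypothetical sketch of the confidence-window candidate: the original becomes eligible for removal only once a configurable window has elapsed since the split was marked COMPLETED. The function name and parameters are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def original_is_retirable(
    split_completed_at: Optional[datetime],
    now: datetime,
    confidence_window: timedelta,
) -> bool:
    """Return True when the original partition could be dropped."""
    # No completed split means the original is the only copy: keep it.
    if split_completed_at is None:
        return False
    return now - split_completed_at >= confidence_window


completed_at = datetime(2026, 1, 1, tzinfo=timezone.utc)
window = timedelta(weeks=4)
too_early = original_is_retirable(completed_at, completed_at + timedelta(weeks=3), window)
eligible = original_is_retirable(completed_at, completed_at + timedelta(weeks=4), window)
```

A "never retire" policy corresponds to never calling this check at all, and the time-bounded path needs no check because the Time Slice TTL removes the data on its own schedule.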
Sibling patterns¶
- patterns/shadow-table-online-schema-change — same shape applied to schema migration: the new and old shapes both exist and the read path can fall back.
- Blue/green deployment — the deprecated colour is retained for fast-rollback during a migration window.
- patterns/shadow-then-reverse-shadow-migration — the source is retained as the canonical store while the target is shadowed and validated.
- CRDT tombstones (concepts/tombstone) — same shape applied at the row level: the deletion marker is retained even after compaction GC.
The shared discipline: the cheapest correctness mechanism in distributed systems is to not throw away the source until you're certain you don't need it.
Seen in¶
- sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home. Netflix TimeSeries Abstraction's dynamic-partition-splitting pipeline retains the original wide partition indefinitely after a successful split. "The slightly larger storage space we use as a result is worth the operational safety we gain." Composes with pre/post checksum, shadow byte comparison, and offline Spark verification as defence-in-depth.
Related¶
- concepts/dynamic-partition-splitting — the broader concept this pattern is part of.
- concepts/wide-partition-problem — the failure being remediated.
- concepts/checksum-validated-data-migration — the complementary correctness gate.
- patterns/dynamic-partition-split-async-pipeline — the pipeline this pattern lives inside.
- patterns/bloom-filter-redirect-to-split-partition — the read-path divert that uses the original as fallback on miss.
- patterns/shadow-mode-bytes-comparison — sibling correctness gate during rollout.
- systems/netflix-timeseries-abstraction — the canonical instance.
- systems/apache-cassandra — the substrate.