PATTERN Cited by 1 source

Keep original partition as fallback during split¶

Definition¶

The discipline of never deleting the source partition after a successful split — leaving the original wide partition as a passive fallback that the read path can divert to when:

The split metadata is unavailable (e.g. metadata-table outage).
A split has been marked COMPLETED but eventual consistency has not yet propagated.
A bug is discovered in the split / divert pipeline and an operator wants to roll back to original-only reads.
A completed split is later invalidated (e.g. partition becomes mutable again — see concepts/immutable-partition).

The trade is storage cost for operational safety. The pattern is named explicitly in Netflix TimeSeries' 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads).

Verbatim from the source¶

"The existing wide partition from the original time slice is never deleted. This helps us in creating safe fallbacks in many different scenarios of partial failures and eventual consistency. The slightly larger storage space we use as a result is worth the operational safety we gain."

The architectural property is load-bearing, not incidental:

The split table has the same schema as the original — both can be read by the same PartitionReader class.
The Bloom-filter gate on the read path is a divert mechanism, not a hard cutover — the original is still present and reachable.
The wide_row metadata table records both pre_split_data (original location) and post_split_data (split location) — by retaining both, the system can fall back at read time without losing track of the source.

Why fallback matters here¶

Three classes of failure that this pattern guards against:

1. Eventual-consistency window¶

The Bloom filter loads partition keys of completed splits periodically — there is a window where a split is COMPLETED but the Bloom filter on a particular server hasn't reloaded yet. During this window, reads on that server route to the original partition. The original must exist for those reads to return correct data.

Even after Bloom-filter loading, the metadata-table read is a read-through cache — there is also a brief window where the cache has not yet seen the metadata. Same protection: original must exist.

2. Partial-failure scenarios¶

The split pipeline writes to a separate time-slice table. That table's writes can be:

Partially written if the splitter crashes mid-write (recoverable from checkpoint, but during recovery reads should fall back).
Lost in a region failover (depending on Cassandra topology) — the split table may be lagging in a region.
Affected by replication-level mismatch (e.g. split table writes done at LOCAL_QUORUM, original at QUORUM).

In all these cases, falling back to the original partition is a strictly safer choice than failing the read.

3. Bug-tolerant rollback¶

If a bug is discovered in the splitter (e.g. a clustering-column ordering mismatch that produces post-split-checksum-equal-but-wrong-data, defeating the checksum gate), the operator can disable Bloom-filter loading and the system reverts to original-partition-only reads. Without the original retained, this rollback is impossible.

Composition with checksum validation¶

The fallback property and the pre/post checksum gate are complementary defences:

Defence	What it catches	When it fires
Pre/post checksum	Splitter logic bugs (lost / duplicated rows)	At splitting time, before split is COMPLETED
Original-as-fallback	Eventual-consistency windows, partial-failure, post-COMPLETED bugs	At read time, on every read
Spark offline verify	Subtle correctness bugs the checksum hash function might collide on	Hours / days after split
Shadow comparison	Read-path implementation bugs	During phased rollout per dataset

The four together produce defence-in-depth for what the post explicitly calls "disastrous" failure: serving incorrect reads.

Trade-offs¶

Pro	Con
Bug-tolerant rollback at any time	Permanent storage overhead — original never reclaimed
Eventual-consistency-tolerant reads	Storage overhead grows with split count
Same-schema split table → fallback uses existing reader code	Long-term, the fallback grows stale (the original is the pre-split shape)
Compatible with mutable-partition splits when those ship	Requires read path to know about both pre and post locations
Compatible with re-processing failed splits	Fallback is latency-correct but partition-shape-wide — slow reads return

When to retire the original¶

The post does not specify a retirement policy. Reasonable retirement-window candidates:

Time-bounded retention. If the original lives in a Time Slice with a retention TTL, it ages out naturally. Until then, fallback is available.
Confidence-window retention. Keep the original for N weeks after split COMPLETED, then nodetool repair-and-drop. The window allows operator to catch silent bugs.
Never retire. Storage is cheap; operational safety is not.

The Netflix TimeSeries shape implies the time-bounded retention path: each Time Slice has a retention TTL, so the original eventually ages out without explicit deletion.

Sibling patterns¶

patterns/shadow-table-online-schema-change — same shape applied to schema migration: the new and old shapes both exist and the read path can fall back.
Blue/green deployment — the deprecated colour is retained for fast-rollback during a migration window.
patterns/shadow-then-reverse-shadow-migration — the source is retained as the canonical store while the target is shadowed and validated.
CRDT tombstones (concepts/tombstone) — same shape applied at the row level: the deletion marker is retained even after compaction GC.

The shared discipline: the cheapest correctness mechanism in distributed systems is to not throw away the source until you're certain you don't need it.

Seen in¶

sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home. Netflix TimeSeries Abstraction's dynamic-partition-splitting pipeline retains the original wide partition indefinitely after a successful split. "The slightly larger storage space we use as a result is worth the operational safety we gain." Composes with pre/post checksum, shadow byte comparison, and offline Spark verification as defence-in-depth.

concepts/dynamic-partition-splitting — the broader concept this pattern is part of.
concepts/wide-partition-problem — the failure being remediated.
concepts/checksum-validated-data-migration — the complementary correctness gate.
patterns/dynamic-partition-split-async-pipeline — the pipeline this pattern lives inside.
patterns/bloom-filter-redirect-to-split-partition — the read-path divert that uses the original as fallback on miss.
patterns/shadow-mode-bytes-comparison — sibling correctness gate during rollout.
systems/netflix-timeseries-abstraction — the canonical instance.
systems/apache-cassandra — the substrate.