PATTERN Cited by 1 source
Bloom-filter redirect to split partition¶
Definition¶
A Bloom-filter-gated read-path divert that decides on every read whether the partition has been split or not — and only on a Bloom-filter hit incurs the cost of a metadata-table lookup that tells the server where to read the split partition from. The pattern is a transparent routing layer that sits between the read API and the underlying PartitionReader: most reads (no split exists) skip the routing layer entirely; only reads on split partitions incur extra work.
Canonicalised on the wiki by Netflix's TimeSeries Abstraction in the 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads).
Read path with the pattern¶
TimeSeries server receives read(ts_id, time_range)
│
▼
Bloom-filter check on (ts_id, time_bucket, event_bucket) ← single-digit microseconds
│
├── miss (probable) → PartitionReader against original table ← fast path, free routing cost
│ (also catches false-misses safely
│ because Bloom filter has no FN)
│
└── hit (possible) → wide_row metadata read-through cache ← fast on cache hit, O(disk) on miss
│
▼
Metadata returns:
pre_split_data: what to read
post_split_data: where to read it from
+ strategy block (target_event_buckets, start_event_bucket)
│
▼
PartitionReader against split table
(same schema → same reader code)
│
▼
Result returned
Bloom-filter check cost: single-digit microseconds or better — "making this diversion practically invisible to the callers."
Why a Bloom filter (and not e.g. a metadata-table-only lookup)¶
Reads on non-split partitions vastly outnumber reads on split partitions. If the read path always consulted the wide_row metadata table, every read would pay the metadata-lookup cost — an extra round trip on the hot path of a system designed for millions of reads per second.
The Bloom filter inverts the cost:
- No split exists → Bloom filter says definitely no → skip metadata lookup entirely. The vast majority of reads.
- Split exists → Bloom filter says probably yes → consult metadata. Bounded by the (much smaller) count of split partitions.
Bloom filters never produce false negatives — when the filter says no, the read can confidently skip the metadata lookup. False positives happen, but they cost only the metadata lookup itself, which returns no entry and the read proceeds against the original.
Why an in-memory Bloom filter, loaded periodically¶
The partition-key set of completed splits changes only when new splits complete. The set:
- Grows over time as more partitions are split.
- Shrinks rarely (only on operational rollback, which re-loads from scratch).
- Doesn't change on the read path — only on the splitter completion path.
This shape fits an in-memory snapshot loaded periodically much better than a live stream:
- Reads see a consistent snapshot until next reload.
- Reload cadence trades freshness against load on the metadata-source.
- Memory footprint scales with split-count, not partition-count.
The post asserts: "due to the compactness of partition keys, and ratio of wide partitions in a given dataset, the filters fit comfortably in each server instance." Memory footprint is monitored as an operational concern but does not require rebalancing.
Composition with the metadata table¶
The Bloom filter answers "is this partition split?" — the metadata table answers "what's the split layout?" The pattern composes the two:
| Layer | Latency | Purpose |
|---|---|---|
| Bloom filter | microseconds | Skip metadata lookup for non-split |
| Read-through cache (in front of metadata) | microseconds on hit | Skip disk lookup for already-resolved splits |
| Metadata table | disk-O(read) on cache miss | Authoritative split layout |
PartitionReader against split table |
normal partition read | Returns actual data |
Each layer protects the next. The vast majority of reads stop at the Bloom-filter layer. Splits hit the cache. Cache misses hit the metadata table. The metadata table hits the actual split partitions. Every layer is fast on its happy path and the architecture is forward-compatible (more layers can be added without rewriting the read API).
Sibling: Bloom filter as pre-storage gate¶
This pattern is a sibling of patterns/bloom-filter-membership-test-before-storage-fetch (Vercel global routing) — same primitive (Bloom filter as cheap pre-filter to skip an authoritative-but-expensive lookup), different consumer:
| Pattern | What's gated | Consequence on hit | Consequence on miss |
|---|---|---|---|
| Bloom-filter pre-storage fetch (Vercel) | Storage fetch | Authoritative storage read | Skip storage entirely |
| Bloom-filter redirect (this pattern) | Metadata-table lookup | Read split partition via metadata | Read original partition (still authoritative) |
The Vercel pattern's miss skips storage; the Netflix pattern's miss reads the original (which is correct — the partition wasn't split). Both leverage the no-FN guarantee of Bloom filters.
Trade-offs¶
| Pro | Con |
|---|---|
| Microsecond routing decision on the hot path | Bloom filter must be loaded into every server's memory |
| Free for non-split reads (vast majority) | Bloom filter freshness lag — newly-completed splits aren't routed until reload |
| Composes with read-through cache for split reads | Extra reload coordination required |
| Memory footprint scales with split count, not data count | False positives still incur metadata lookup |
| Operational-safety friendly: original-partition-fallback works on FN-impossible filter | Bloom-filter sizing must be planned for max split count |
Caveats not disclosed in source¶
- False-positive rate, hash count, and bit-array size not specified.
- Reload cadence and bandwidth cost not characterised.
- What the Bloom filter is keyed on exactly — likely
(time_series_id, time_bucket, event_bucket)from the post's read-divert flow, but not explicitly enumerated.
Seen in¶
- sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads —
Canonical wiki home. Netflix TimeSeries Abstraction loads partition keys of
COMPLETED splits into in-memory Bloom filters periodically; every read consults the
filter; on hit, reads
wide_rowmetadata table (read-through cached) for split layout and dispatches to existingPartitionReaderagainst the split table; on miss / fallback, reads original. Bloom-filter check is "single-digit microseconds or better." Filter size monitored: "The filters fit comfortably in each server instance."
Related¶
- concepts/bloom-filter — the primitive.
- concepts/dynamic-partition-splitting — the broader concept this pattern is the read-path piece of.
- concepts/wide-partition-problem — the failure mode being remediated.
- patterns/bloom-filter-membership-test-before-storage-fetch — sibling Bloom-filter routing pattern (Vercel).
- patterns/dynamic-partition-split-async-pipeline — the pipeline this pattern is the read-path piece of.
- patterns/keep-original-partition-as-fallback-during-split — the operational-safety pattern this composes with.
- systems/netflix-timeseries-abstraction — the canonical instance.