Skip to content

PATTERN Cited by 1 source

Bloom-filter redirect to split partition

Definition

A Bloom-filter-gated read-path divert that decides on every read whether the partition has been split or not — and only on a Bloom-filter hit incurs the cost of a metadata-table lookup that tells the server where to read the split partition from. The pattern is a transparent routing layer that sits between the read API and the underlying PartitionReader: most reads (no split exists) skip the routing layer entirely; only reads on split partitions incur extra work.

Canonicalised on the wiki by Netflix's TimeSeries Abstraction in the 2026-06-03 dynamic-partition-splitting disclosure (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads).

Read path with the pattern

TimeSeries server receives read(ts_id, time_range)
Bloom-filter check on (ts_id, time_bucket, event_bucket)         ← single-digit microseconds
  ├── miss (probable) → PartitionReader against original table   ← fast path, free routing cost
  │                                                                (also catches false-misses safely
  │                                                                 because Bloom filter has no FN)
  └── hit (possible) → wide_row metadata read-through cache       ← fast on cache hit, O(disk) on miss
                       Metadata returns:
                         pre_split_data:  what to read
                         post_split_data: where to read it from
                                          + strategy block (target_event_buckets, start_event_bucket)
                       PartitionReader against split table
                         (same schema → same reader code)
                       Result returned

Bloom-filter check cost: single-digit microseconds or better"making this diversion practically invisible to the callers."

Why a Bloom filter (and not e.g. a metadata-table-only lookup)

Reads on non-split partitions vastly outnumber reads on split partitions. If the read path always consulted the wide_row metadata table, every read would pay the metadata-lookup cost — an extra round trip on the hot path of a system designed for millions of reads per second.

The Bloom filter inverts the cost:

  • No split exists → Bloom filter says definitely no → skip metadata lookup entirely. The vast majority of reads.
  • Split exists → Bloom filter says probably yes → consult metadata. Bounded by the (much smaller) count of split partitions.

Bloom filters never produce false negatives — when the filter says no, the read can confidently skip the metadata lookup. False positives happen, but they cost only the metadata lookup itself, which returns no entry and the read proceeds against the original.

Why an in-memory Bloom filter, loaded periodically

The partition-key set of completed splits changes only when new splits complete. The set:

  • Grows over time as more partitions are split.
  • Shrinks rarely (only on operational rollback, which re-loads from scratch).
  • Doesn't change on the read path — only on the splitter completion path.

This shape fits an in-memory snapshot loaded periodically much better than a live stream:

  • Reads see a consistent snapshot until next reload.
  • Reload cadence trades freshness against load on the metadata-source.
  • Memory footprint scales with split-count, not partition-count.

The post asserts: "due to the compactness of partition keys, and ratio of wide partitions in a given dataset, the filters fit comfortably in each server instance." Memory footprint is monitored as an operational concern but does not require rebalancing.

Composition with the metadata table

The Bloom filter answers "is this partition split?" — the metadata table answers "what's the split layout?" The pattern composes the two:

Layer Latency Purpose
Bloom filter microseconds Skip metadata lookup for non-split
Read-through cache (in front of metadata) microseconds on hit Skip disk lookup for already-resolved splits
Metadata table disk-O(read) on cache miss Authoritative split layout
PartitionReader against split table normal partition read Returns actual data

Each layer protects the next. The vast majority of reads stop at the Bloom-filter layer. Splits hit the cache. Cache misses hit the metadata table. The metadata table hits the actual split partitions. Every layer is fast on its happy path and the architecture is forward-compatible (more layers can be added without rewriting the read API).

Sibling: Bloom filter as pre-storage gate

This pattern is a sibling of patterns/bloom-filter-membership-test-before-storage-fetch (Vercel global routing) — same primitive (Bloom filter as cheap pre-filter to skip an authoritative-but-expensive lookup), different consumer:

Pattern What's gated Consequence on hit Consequence on miss
Bloom-filter pre-storage fetch (Vercel) Storage fetch Authoritative storage read Skip storage entirely
Bloom-filter redirect (this pattern) Metadata-table lookup Read split partition via metadata Read original partition (still authoritative)

The Vercel pattern's miss skips storage; the Netflix pattern's miss reads the original (which is correct — the partition wasn't split). Both leverage the no-FN guarantee of Bloom filters.

Trade-offs

Pro Con
Microsecond routing decision on the hot path Bloom filter must be loaded into every server's memory
Free for non-split reads (vast majority) Bloom filter freshness lag — newly-completed splits aren't routed until reload
Composes with read-through cache for split reads Extra reload coordination required
Memory footprint scales with split count, not data count False positives still incur metadata lookup
Operational-safety friendly: original-partition-fallback works on FN-impossible filter Bloom-filter sizing must be planned for max split count

Caveats not disclosed in source

  • False-positive rate, hash count, and bit-array size not specified.
  • Reload cadence and bandwidth cost not characterised.
  • What the Bloom filter is keyed on exactly — likely (time_series_id, time_bucket, event_bucket) from the post's read-divert flow, but not explicitly enumerated.

Seen in

  • sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloadsCanonical wiki home. Netflix TimeSeries Abstraction loads partition keys of COMPLETED splits into in-memory Bloom filters periodically; every read consults the filter; on hit, reads wide_row metadata table (read-through cached) for split layout and dispatches to existing PartitionReader against the split table; on miss / fallback, reads original. Bloom-filter check is "single-digit microseconds or better." Filter size monitored: "The filters fit comfortably in each server instance."
Last updated · 542 distilled / 1,571 read