CONCEPT Cited by 1 source
Bloom index filter (GPU-native eligibility filter)
Definition
The Bloom index filter is the GPU-native eligibility-filtering primitive used inside Meta's SilverTorch retrieval substrate (Source: sources/2026-05-26-meta-silvertorch-index-as-model-a-new-retrieval-paradigm-for-recommendation-systems). Rather than running an inverted-index service to filter candidates by attributes (language / country / content policy), SilverTorch "replaces that with a Bloom index stored directly inside the model. Each item gets a compact signature when it is published, and at serving time the model can quickly check whether an item matches the request using simple bit operations."
It is the in-model variant of the standard Bloom filter — same false-positive-asymmetric set-membership semantics, but instantiated as a tensor inside the PyTorch retrieval graph rather than as a service-side data structure.
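A minimal sketch of the publish-time signature and serving-time bit check follows. It assumes one 64-bit word per item, three bit positions per attribute derived by double hashing over SHA-256, and attributes encoded as strings such as `"lang:en"`. The source discloses none of these parameters (see Caveats), and `M_BITS`, `K_HASHES`, `bit_positions`, `signature` and `matches` are hypothetical names.

```python
import hashlib

M_BITS = 64    # hypothetical signature width (bits per item)
K_HASHES = 3   # hypothetical number of bit positions per attribute value


def bit_positions(attr: str, m: int = M_BITS, k: int = K_HASHES) -> list[int]:
    """Map one attribute value to k bit positions via double hashing."""
    digest = hashlib.sha256(attr.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "little")
    # Odd stride, so the k positions are distinct when m is a power of two.
    h2 = int.from_bytes(digest[8:16], "little") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def signature(attrs: list[str]) -> int:
    """Publish time: OR every attribute's bits into one fixed-width signature."""
    sig = 0
    for attr in attrs:
        for pos in bit_positions(attr):
            sig |= 1 << pos
    return sig


def matches(item_sig: int, request_mask: int) -> bool:
    """Serving time: possibly eligible iff every bit the request needs is set."""
    return (item_sig & request_mask) == request_mask


item = signature(["lang:en", "country:US", "policy:safe"])
request = signature(["lang:en", "country:US"])
print(matches(item, request))  # True: the request's bits are a subset of the item's
```

The check costs one AND and one comparison per item, no matter how many attributes the request names.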
Why CPU-style inverted indices fail on GPUs
Inverted-index filtering — "efficient on CPUs but harder to run well on GPUs" — fights GPU hardware in a specific, structural way (Source: sources/2026-05-26-meta-silvertorch-index-as-model-a-new-retrieval-paradigm-for-recommendation-systems):
"Recommendation filtering often has to check many item attributes at once, such as language, location, or eligibility rules, and posting lists can also vary dramatically in length across attributes and queries, creating intra-warp load imbalance and warp divergence on GPUs. Threads assigned short lists become inactive early, while the warp remains occupied until the lanes processing the longest lists complete."
The mechanism is variable-length data per thread → warp divergence → wasted GPU cycles. Posting lists are inherently variable-length, so any GPU port that assigns one posting list per thread hits this ceiling.
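To make the divergence cost concrete, here is a deliberately simplified lockstep model. It assumes a 32-lane warp in which each lane walks one posting list and the warp is charged until its longest lane finishes. It ignores memory latency, independent thread scheduling and occupancy effects, and `warp_utilization` is a hypothetical helper, not something from the post.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def warp_utilization(list_lengths: list[int]) -> float:
    """Share of lane-steps doing useful work when lane i walks a list of length list_lengths[i].

    Simplified lockstep model: the warp stays occupied for max(list_lengths) steps,
    and every lane is charged for all of them, idle or not.
    """
    if len(list_lengths) != WARP_SIZE:
        raise ValueError(f"expected {WARP_SIZE} lanes, got {len(list_lengths)}")
    useful = sum(list_lengths)
    charged = WARP_SIZE * max(list_lengths)
    return useful / charged


uniform = [100] * WARP_SIZE               # every posting list the same length
skewed = [10] * (WARP_SIZE - 1) + [1000]  # one lane gets a very long list
print(warp_utilization(uniform))  # 1.0
print(warp_utilization(skewed))   # 0.0409375: about 4% of lane-steps do work
```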
Why Bloom filters are GPU-friendly
The Bloom-filter shape inverts the bad properties:
- Fixed-size signature per item — no variable-length posting lists.
- Bit operations — "This turns filtering into the kind of dense, parallel work GPUs are good at."
- No warp divergence — every thread does the same fixed amount of work.
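The fixed-cost property can be sketched for a wider signature, assuming a hypothetical 256-bit signature stored as four 64-bit words. The containment test accumulates missing bits across all words instead of returning early, so every item costs exactly `WORDS` iterations. On a GPU each item would map to one thread; here a list comprehension stands in for the kernel. `WORDS`, `eligible` and `filter_batch` are illustrative names.

```python
WORDS = 4  # hypothetical: 256-bit signature = four 64-bit words


def eligible(item_words: list[int], request_words: list[int]) -> bool:
    """Containment test with a data-independent trip count (no early exit)."""
    missing = 0
    for w in range(WORDS):
        # Bits the request requires that this item lacks in word w.
        missing |= request_words[w] & ~item_words[w]
    return missing == 0


def filter_batch(items: list[list[int]], request_words: list[int]) -> list[bool]:
    """Apply the same fixed-cost test to every item (one 'thread' per item)."""
    return [eligible(item, request_words) for item in items]


items = [
    [0b1011, 0, 0, 1],     # has every required bit
    [0b0011, 0, 0, 1],     # missing bit 3 of word 0
    [0b1111, 0xFF, 0, 0],  # missing bit 0 of word 3
]
request = [0b1001, 0, 0, 1]
print(filter_batch(items, request))  # [True, False, False]
```

Returning on the first mismatching word would be cheaper on a CPU, but on a GPU it reintroduces a small, data-dependent divergence inside the warp.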
False-positive asymmetry is the price: a Bloom filter can answer "definitely not eligible" but only "possibly eligible", matching standard Bloom semantics. In the recsys-eligibility setting this is acceptable. False positives mean a small fraction of ineligible items survive the filter and get caught downstream; false negatives (which Bloom filters never produce) would silently drop eligible items, which is the unacceptable failure mode.
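The asymmetry can be demonstrated on a plain Bloom filter: every inserted key is always reported present, while absent keys are reported present at roughly the standard rate (1 - e^(-kn/m))^k for n inserted keys, m bits and k hash functions. The parameters below (m = 256, k = 3, n = 20) are arbitrary illustrative choices, not SilverTorch's, which are undisclosed, and the `Bloom` class is a hypothetical helper.

```python
import hashlib
import math


def positions(key: str, m: int, k: int) -> list[int]:
    """k bit positions for `key` via double hashing over SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little") | 1
    return [(h1 + i * h2) % m for i in range(k)]


class Bloom:
    def __init__(self, m: int, k: int):
        self.m, self.k, self.bits = m, k, 0

    def add(self, key: str) -> None:
        for p in positions(key, self.m, self.k):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means only possibly present.
        return all((self.bits >> p) & 1 for p in positions(key, self.m, self.k))


def expected_fp_rate(m: int, k: int, n: int) -> float:
    """Standard approximation of the false-positive rate after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


bf = Bloom(m=256, k=3)
inserted = [f"attr:{i}" for i in range(20)]
for key in inserted:
    bf.add(key)

false_negatives = sum(not bf.might_contain(key) for key in inserted)
probes = [f"absent:{i}" for i in range(10_000)]
observed_fp = sum(bf.might_contain(key) for key in probes) / len(probes)
print(false_negatives)  # 0, by construction
print(observed_fp, expected_fp_rate(256, 3, 20))
```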
In-graph composition with fused Int8 ANN search
The headline structural win, beyond the per-primitive speedup, is that the filter result "is already inside the model, [so] it can flow directly into ANN search without a separate service call." This is what enables the probe-then-filter co-design disclosed in the SilverTorch performance decomposition: "the probe-then-filter co-design cuts filter compute by another 30×."
In a microservice mesh, the analogous pattern would require either (a) the filter service to communicate with the ANN service over RPC for every probe, paying network overhead, or (b) the filter to overproduce a candidate set that ANN later prunes, paying memory overhead. In one PyTorch model graph, both costs disappear: the model picks the most promising clusters first, filters only inside those clusters, then scores only the survivors.
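The probe-then-filter order can be sketched with an IVF-style clustered index, assuming dot-product similarity, plain Python lists in place of tensors, and a single-word signature containment check `(sig & mask) == mask`. SilverTorch's actual cluster layout, probe count and fused Int8 kernels are not described in the post; every name here is hypothetical. The point is the cost shape: items in unprobed clusters are never filtered or scored.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def probe_then_filter(query, request_mask, centroids, clusters, nprobe, top_k):
    """Hypothetical IVF-style flow: probe clusters, filter inside them, score survivors.

    centroids: one vector per cluster
    clusters:  per cluster, a list of (item_id, item_vec, item_sig) tuples
    Returns (top_k list of (score, item_id), number of filter checks performed).
    """
    # 1. Probe: rank clusters by centroid similarity and keep the best nprobe.
    ranked = sorted(range(len(centroids)),
                    key=lambda c: dot(query, centroids[c]), reverse=True)
    probed = ranked[:nprobe]

    # 2. Filter only inside the probed clusters.
    survivors, checks = [], 0
    for c in probed:
        for item_id, vec, sig in clusters[c]:
            checks += 1
            if (sig & request_mask) == request_mask:
                survivors.append((item_id, vec))

    # 3. Score only the survivors and keep the best top_k.
    scored = sorted(((dot(query, vec), item_id) for item_id, vec in survivors),
                    reverse=True)
    return scored[:top_k], checks


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
clusters = [
    [("a", [0.9, 0.1], 0b11), ("b", [0.8, 0.0], 0b01)],
    [("c", [0.1, 0.9], 0b11)],
    [("d", [-0.9, 0.0], 0b11)],  # never probed, so never filtered or scored
]
results, checks = probe_then_filter([1.0, 0.2], 0b11, centroids, clusters,
                                    nprobe=2, top_k=5)
print([item_id for _, item_id in results], checks)  # ['a', 'c'] 3
```

In this toy run, item `d` costs nothing because its cluster was not probed, and `b` is discarded by the bit check before any scoring.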
Performance
Disclosed figures (Source: sources/2026-05-26-meta-silvertorch-index-as-model-a-new-retrieval-paradigm-for-recommendation-systems):
- Bloom index filter: 291–523× faster than the CPU inverted-index baseline.
- Probe-then-filter co-design: additional 30× filter-compute reduction beyond the per-primitive win.
The two compose: the GPU-friendly primitive accounts for the 291–523× headline (more than two orders of magnitude), and the co-design with ANN inside one model graph cuts filter compute by a further 30×.
Relationship to existing wiki Bloom-filter material
- concepts/bloom-filter catalogues the standard Bloom-filter primitive — bit array, k hash functions, false-positive-asymmetric semantics — first canonicalised on the wiki via Vercel's global routing post.
- This page documents the GPU-native, in-model-graph variant. The semantics are the same; the architectural location is different — inside a tensor as part of the retrieval forward pass, rather than alongside the application as a pre-filter.
Caveats
- The post does not disclose the Bloom-filter parameters: how many hash functions, how many bits per item, target false-positive rate, attribute-encoding scheme, or how multi-attribute filters compose (one Bloom per attribute? one combined Bloom?). These are likely detailed in the SIGIR 2026 paper (arXiv:2511.14881).
- The 291–523× range vs CPU inverted index is a wide band; the post does not enumerate which configurations sit where.
- "Stored directly inside the model" implies a fixed-size tensor, but the mechanics for adding signatures of newly published items aren't detailed here. The post's streaming-weight-update section suggests that new signatures land via the same in-place tensor mutation path.