PATTERN
Probabilistic feature store over KV¶
Problem¶
A production recommender system enriches each request with historical user-item features pulled from a key-value feature store (Redis-class, Dynamo-class). At scale the KV store is expensive on multiple axes:
- Network latency: 2–10 ms per lookup + high tail variance, eating directly into the ranker's p99 budget.
- Hosting + maintenance: distributed KV stores with strict SLOs are not cheap; backfill, failover drills, and capacity planning are full-time work.
- Fan-out amplification: scoring 1,000 candidate products per request becomes 1,000 KV lookups, which is either prohibitively slow or forces intrusive optimisations (per-user batch fetch, prioritisation, size limits) that couple model design to infrastructure.
- Composite-key explosion: interaction features like "how many times did user X buy Y and Z together?" require a KV entry per composite key, and every lookup is another round trip.
Solution¶
Replace the external KV feature store with a sketching feature store built on a Bloom filter that lives in the serving process's RAM:
- Encode features as composite keys. Each historical datum — "user U viewed article A", "user U bought X with Y" — is a string like f"{U}^{A}" or f"{U}^{X}^{Y}". Insert it into the Bloom filter as a set-membership entry.
- Maintain an auxiliary possible_features set. For each feature axis, track every value seen (e.g. all article IDs). This set lives in process memory alongside the filter.
- Serve via enumerate-and-probe. When the ranker needs "which articles did user U view?", iterate possible_articles and probe the Bloom filter for each composite key. Return every article the filter says it "might contain" (a minimal code sketch of this flow follows the list).
- Rebuild the filter on a low-frequency batch cadence. Drop the old filter, rebuild from the source event log (S3 of events, the offline feature store) on a daily or multi-hour schedule. Ship the new filter to every serving replica.
- Size the filter for a tolerable false-positive rate. The error rate — how often the filter says "possibly present" when the feature is actually absent — is a directly tunable parameter on the Bloom filter. Higher error rate → smaller filter → more noise in model input → lower AUC. Benchmark the accuracy / size curve on a held-out evaluation set and pick the knee point.
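Concretely, the build / enumerate-and-probe loop looks roughly like the sketch below. This is a minimal illustration, not Zalando's code: the toy hashlib-based BloomFilter, the build() and viewed_articles() helpers, the example event tuples, and the 0.005 error-rate default are all assumptions; the "^"-joined key format follows the convention above, and a production deployment would use a vetted Bloom filter library.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter; stands in for whatever vetted implementation you run in production."""

    def __init__(self, capacity: int, error_rate: float):
        # Standard sizing: m = -n * ln(p) / ln(2)^2 bits, k = (m / n) * ln(2) hash functions.
        # This is the "size the filter for a tolerable false-positive rate" knob.
        self.m = max(8, int(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # k independent-ish hash positions derived from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build(view_events, error_rate=0.005):
    """Batch rebuild: insert one composite key per event and track the enumerable vocabulary."""
    bf = BloomFilter(capacity=max(1, len(view_events)), error_rate=error_rate)
    possible_articles = set()
    for user_id, article_id in view_events:       # e.g. replayed from the event log
        bf.add(f"{user_id}^{article_id}")         # composite key, as described above
        possible_articles.add(article_id)
    return bf, possible_articles


def viewed_articles(bf, possible_articles, user_id):
    """Enumerate-and-probe: every article the filter says the user might have viewed."""
    return sorted(a for a in possible_articles if f"{user_id}^{a}" in bf)


bf, vocab = build([("u1", "a1"), ("u1", "a2"), ("u2", "a1")])
print(viewed_articles(bf, vocab, "u1"))   # ['a1', 'a2'], plus any false positives
```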
Why this wins on the right workload¶
Zalando's click-prediction benchmark (Source: sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures):
- ~30× memory reduction at lossless AUC: 470 MB sketch vs 15 GB KV footprint; AUC 0.7997 vs 0.80. State fits comfortably in serving-process RAM.
- 90% of the AUC-from-history uplift at 0.3% of the memory: a 40 MB sketch (AUC ≈ 0.79) is "just" 330× smaller than the 15 GB KV store and still captures most of the historical-feature win.
- Indistinguishable throughput: all three variants hit 20–22 k predictions/sec/core. The possible_articles iteration is not a measurable overhead at realistic vocabulary sizes.
- Zero network latency, zero network tail. The 2–10 ms KV hop is gone; the tail variance disappears; no more "KV store is the bottleneck on p99" postmortems.
- Hosting cost collapses. A distributed KV fleet is replaced by an in-process data structure that ships with the serving binary. Backfill becomes "rebuild the sketch from the event log."
- Many-lookups-per-request becomes free. Probing 1,000 composite keys against an in-memory Bloom filter takes microseconds in total; the same fan-out over a KV store takes seconds (see the sketch below).
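To make the fan-out point concrete, the per-request probe loop might look like the following, continuing the toy sketch above (the enrich_candidates helper, the feature name, and the candidate IDs are invented for illustration):

```python
def enrich_candidates(bf, user_id, candidate_items):
    """Attach a user-viewed-item membership bit to each candidate before ranking.

    Each probe is a handful of in-process hash lookups, so 1,000 probes cost
    microseconds in total; the same fan-out against a remote KV store would be
    1,000 network round trips.
    """
    return {item: (f"{user_id}^{item}" in bf) for item in candidate_items}


# bf comes from build() in the sketch above; candidate IDs are made up.
features = enrich_candidates(bf, "u1", [f"a{i}" for i in range(1000)])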
Prerequisites (when this pattern fits)¶
The pattern is not a general replacement for KV feature stores. It only fits when all of these hold:
- Feature vocabulary is bounded and enumerable. You know the full set of possible article IDs, category IDs, etc. Free-text / bag-of-words features are disqualified.
- Freshness of minutes to hours is acceptable. See concepts/feature-store-freshness. Recency-sensitive signals (last-click-30-seconds-ago) need the KV store.
- Individual record deletion is not needed, or can be handled by batch rebuild. GDPR right-to-be-forgotten can still be served — just with a rebuild-latency SLO.
- Queries are membership-only. "Did user U view article A?" Yes. "Give me all events from day X" — not supported.
- Model tolerates input noise. Tree ensembles and regularised linear models absorb bit-flip noise gracefully; pathological models that memorise single features may not.
Anti-pattern — when this fails¶
- Real-time engagement features (feed personalisation, news ranking). Freshness requirement is sub-second; sketch rebuild cadence doesn't fit.
- Free-text-derived features (review bag-of-words, search-query n-grams). Feature vocabulary is unbounded; enumerate-and-probe cannot reconstruct the feature vector.
- Count / frequency features. A Bloom filter answers only membership; a Count-Min Sketch would be the right substrate there (unbenchmarked in the source; a toy example follows this list).
- Workloads already latency-bound on CPU, not network. The pattern wins on network-hop elimination; if your bottleneck is already the ranker's feature engineering or model inference, this doesn't help.
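The source leaves count features unbenchmarked, but for contrast with the membership-only Bloom filter, the Count-Min Sketch named above answers "how many times?" rather than "was it seen?". The toy implementation below is purely illustrative (width, depth, and key format are arbitrary choices) and is not part of the benchmarked pattern:

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch; a frequency-oriented counterpart to the Bloom filter."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cols(self, key: str):
        # One column index per row, derived from salted SHA-256 digests.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in enumerate(self._cols(key)):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Never under-estimates: hash collisions only inflate counts.
        return min(self.table[row][col] for row, col in enumerate(self._cols(key)))


cms = CountMinSketch()
for _ in range(7):
    cms.add("u1^a1")              # "user u1 viewed article a1", seven times
print(cms.estimate("u1^a1"))      # >= 7; a Bloom filter could only answer "maybe seen"
```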
Related¶
- concepts/sketching-feature-store — the underlying substrate.
- concepts/feature-store-freshness — the axis that scopes this pattern's applicability.
- concepts/bloom-filter — the canonical sketch.
- concepts/probabilistic-data-structure — the broader family.
- concepts/false-positive-vs-false-negative-asymmetry — why the error mode is tolerable.
- concepts/online-vs-offline-feature-store — the substrate this replaces.
- patterns/bloom-filter-membership-test-before-storage-fetch — the pre-filter pattern; this pattern is its degenerate case where the filter is the store rather than fronting one.
- companies/zalando
Seen in¶
- sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures — benchmark establishing the 30× memory reduction at lossless click-prediction AUC (Zalando, 2021-10-04).