PATTERN

Probabilistic feature store over KV

Problem

A production recommender system enriches each request with historical user-item features pulled from a key-value feature store (Redis-class, Dynamo-class). At scale the KV store is expensive on multiple axes:

  • Network latency: 2–10 ms per lookup plus high tail variance, eating directly into the ranker's p99 budget.
  • Hosting + maintenance: distributed KV stores with strict SLOs are not cheap; backfill, failover drills, and capacity planning are full-time work.
  • Fan-out amplification: scoring 1,000 candidate products per request becomes 1,000 KV lookups, which is either prohibitively slow or forces intrusive optimisations (per-user batch fetch, prioritisation, size limits) that couple model design to infrastructure.
  • Composite-key explosion: interaction features like "how many times did user X buy Y and Z together?" require a KV entry per composite key, and every lookup is another round trip.

Solution

Replace the external KV feature store with a sketching feature store built on a Bloom filter that lives in the serving process's RAM (a minimal sketch and a sizing example follow the steps):

  1. Encode features as composite keys. Each historical datum ("user U viewed article A", "user U bought X with Y") is a string like f"{U}^{A}" or f"{U}^{X}^{Y}". Insert it into the Bloom filter as a set-membership entry.
  2. Maintain an auxiliary possible_features set. For each feature axis, track every value seen (e.g. all article IDs). This set lives in process memory alongside the filter.
  3. Serve via enumerate-and-probe. When the ranker needs "which articles did user U view?", iterate possible_articles and probe the Bloom filter for each composite key. Return every article the filter says it "might contain."
  4. Rebuild the filter on a low-frequency batch cadence. Drop the old filter, rebuild from the source event log (S3 of events, the offline feature store) on a daily or multi-hour schedule. Ship the new filter to every serving replica.
  5. Size the filter for a tolerable false-positive rate. The error rate — how often the filter says "possibly present" when the feature is actually absent — is a directly tunable parameter on the Bloom filter. Higher error rate → smaller filter → more noise in model input → lower AUC. Benchmark the accuracy / size curve on a held-out evaluation set and pick the knee point.
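
A minimal sketch of steps 1–3, with the step-5 sizing rule in the constructor. This assumes a from-scratch Bloom filter (double hashing over a single SHA-256 digest); SketchingFeatureStore, record_view, and viewed_articles are illustrative names, not from the source:

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size bit array with k hash probes: answers 'definitely not
    present' or 'possibly present' with false-positive rate ~fp_rate."""

    def __init__(self, expected_items: int, fp_rate: float):
        # Standard sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln 2 hashes.
        self.m = max(8, int(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, key: str):
        # Double hashing: derive all k probe positions from two 64-bit hashes.
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, key: str) -> None:
        for i in self._indexes(key):
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(key))


class SketchingFeatureStore:
    def __init__(self, expected_items: int, fp_rate: float):
        self.filter = BloomFilter(expected_items, fp_rate)
        self.possible_articles: set[str] = set()  # step 2: auxiliary vocabulary

    def record_view(self, user_id: str, article_id: str) -> None:
        self.filter.add(f"{user_id}^{article_id}")  # step 1: composite key
        self.possible_articles.add(article_id)

    def viewed_articles(self, user_id: str) -> list[str]:
        # Step 3: enumerate-and-probe. Every returned article is one the
        # filter "might contain"; a ~fp_rate fraction of them are noise.
        return [a for a in self.possible_articles if f"{user_id}^{a}" in self.filter]
```

At serving time the ranker calls viewed_articles(user_id) once per request, entirely in process memory; no network hop is involved.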

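On step 5's accuracy/size curve: the size side follows directly from the standard Bloom sizing formula, m = -n·ln(p)/(ln 2)² bits for n keys at false-positive rate p. An illustrative calculation; the 100 M composite-key workload is an assumption, not a source figure:

```python
import math

def bloom_bytes(n_keys: int, fp_rate: float) -> float:
    # m = -n * ln(p) / (ln 2)^2 bits, converted to bytes.
    return -n_keys * math.log(fp_rate) / math.log(2) ** 2 / 8

for p in (0.001, 0.01, 0.05, 0.1):
    print(f"fp_rate={p:<6} -> ~{bloom_bytes(100_000_000, p) / 1e6:.0f} MB")
# fp_rate=0.001  -> ~180 MB
# fp_rate=0.01   -> ~120 MB
# fp_rate=0.05   -> ~78 MB
# fp_rate=0.1    -> ~60 MB
```

The AUC side of the curve cannot be computed; it has to be measured on the held-out evaluation set, as step 5 says.
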
Why this wins on the right workload

Zalando's click-prediction benchmark (Source: sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures) reports:

  • ~30× memory reduction at effectively lossless AUC: 470 MB sketch vs 15 GB KV footprint; AUC 0.7997 vs 0.80. State fits comfortably in serving-process RAM.
  • 90% of the AUC-from-history uplift at 0.3% of the memory: a 40 MB sketch (AUC ≈ 0.79) is "just" 330× smaller than the 15 GB KV store and still captures most of the historical-feature win.
  • Indistinguishable throughput: all three variants hit 20–22 k predictions/sec/core. The possible_articles iteration is not a measurable overhead at realistic vocabulary sizes.
  • Zero network latency, zero network tail. The 2–10 ms KV hop is gone; the tail variance disappears; no more "KV store is the bottleneck on p99" postmortems.
  • Hosting cost collapses. A distributed KV fleet is replaced by an in-process data structure that ships with the serving binary. Backfill becomes "rebuild the sketch from the event log."
  • Many-lookups-per-request becomes free. Probing 1,000 composite keys against an in-memory Bloom filter takes microseconds in total; the same fan-out over a KV store takes seconds.

Prerequisites (when this pattern fits)

The pattern is not a general replacement for KV feature stores. It only fits when all of these hold:

  • Feature vocabulary is bounded and enumerable. You know the full set of possible article IDs, category IDs, etc. Free-text / bag-of-words features are disqualified.
  • Freshness of minutes to hours is acceptable. See concepts/feature-store-freshness. Recency-sensitive signals (last-click-30-seconds-ago) need the KV store.
  • Individual record deletion is not needed, or can be handled by batch rebuild. GDPR right-to-be-forgotten can still be served, just with a rebuild-latency SLO (a minimal rebuild loop is sketched after this list).
  • Queries are membership-only. "Did user U view article A?" Yes. "Give me all events from day X" — not supported.
  • Model tolerates input noise. Tree ensembles and regularised linear models absorb bit-flip noise gracefully; pathological models that memorise single features may not.
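
For the batch-rebuild path (step 4 of the Solution, and the GDPR point above), a hypothetical rebuild loop that reuses the SketchingFeatureStore sketch from the Solution section; the event-tuple shape and the atomic-swap holder are assumptions, not source APIs:

```python
import threading


class ServingState:
    """Holds the live sketch; a rebuild swaps in a fresh one atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self.store = None

    def swap(self, new_store) -> None:
        with self._lock:
            self.store = new_store  # readers cut over; old filter is GC'd


def rebuild(serving: ServingState, events, erased_users=frozenset()) -> None:
    # events: (user_id, article_id) tuples replayed from the source event log
    # (S3 of events, the offline feature store). Erased users are simply
    # omitted, which is how right-to-be-forgotten is served within the
    # rebuild-latency SLO.
    new_store = SketchingFeatureStore(expected_items=100_000_000, fp_rate=0.01)
    for user_id, article_id in events:
        if user_id in erased_users:
            continue
        new_store.record_view(user_id, article_id)
    serving.swap(new_store)
```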

Anti-pattern — when this fails

  • Real-time engagement features (feed personalisation, news ranking). Freshness requirement is sub-second; sketch rebuild cadence doesn't fit.
  • Free-text-derived features (review bag-of-words, search-query n-grams). Feature vocabulary is unbounded; enumerate-and-probe cannot reconstruct the feature vector.
  • Count / frequency features. A Bloom filter answers only membership; a Count-Min Sketch would be the right substrate there (unbenchmarked in the source; see the sketch after this list).
  • Workloads already latency-bound on CPU, not network. The pattern wins on network-hop elimination; if your bottleneck is already the ranker's feature engineering or model inference, this doesn't help.
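
On the count-feature gap: the source leaves this unbenchmarked, but for orientation, a minimal Count-Min Sketch over the same composite-key encoding might look like this (width and depth values are illustrative):

```python
import hashlib


class CountMinSketch:
    def __init__(self, width: int = 1 << 20, depth: int = 4):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        self.depth = depth

    def _indexes(self, key: str):
        # Same double-hashing trick as the Bloom filter, one index per row.
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return ((h1 + i * h2) % self.width for i in range(self.depth))

    def add(self, key: str, count: int = 1) -> None:
        for row, i in zip(self.rows, self._indexes(key)):
            row[i] += count

    def estimate(self, key: str) -> int:
        # Estimates only ever overcount: take the minimum across rows.
        return min(row[i] for row, i in zip(self.rows, self._indexes(key)))
```

Usage mirrors the membership store: add(f"{user}^{item}") per event, estimate(f"{user}^{item}") at serving time.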

Seen in

sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures