PATTERN

Probabilistic feature store over KV

Problem

A production recommender system enriches each request with historical user-item features pulled from a key-value feature store (Redis-class, Dynamo-class). At scale the KV store is expensive on multiple axes:

  • Network latency: 2–10 ms per lookup plus high tail variance, eating directly into the ranker's p99 budget.
  • Hosting + maintenance: distributed KV stores with strict SLOs are not cheap; backfill, failover drills, and capacity planning are full-time work.
  • Fan-out amplification: scoring 1,000 candidate products per request becomes 1,000 KV lookups, which is either prohibitively slow or forces intrusive optimisations (per-user batch fetch, prioritisation, size limits) that couple model design to infrastructure.
  • Composite-key explosion: interaction features like "how many times did user X buy Y and Z together?" require a KV entry per composite key, and every lookup is another round trip.

Solution

Replace the external KV feature store with a sketching feature store built on a Bloom filter that lives in the serving process's RAM (a minimal sketch and a sizing example follow the steps):

  1. Encode features as composite keys. Each historical datum ("user U viewed article A", "user U bought X with Y") is a string like f"{U}^{A}" or f"{U}^{X}^{Y}". Insert it into the Bloom filter as a set-membership entry.
  2. Maintain an auxiliary possible_features set. For each feature axis, track every value seen (e.g. all article IDs). This set lives in process memory alongside the filter.
  3. Serve via enumerate-and-probe. When the ranker needs "which articles did user U view?", iterate possible_articles and probe the Bloom filter for each composite key. Return every article the filter says it "might contain."
  4. Rebuild the filter on a low-frequency batch cadence. Drop the old filter, rebuild from the source event log (S3 of events, the offline feature store) on a daily or multi-hour schedule. Ship the new filter to every serving replica.
  5. Size the filter for a tolerable false-positive rate. The error rate — how often the filter says "possibly present" when the feature is actually absent — is a directly tunable parameter on the Bloom filter. Higher error rate → smaller filter → more noise in model input → lower AUC. Benchmark the accuracy / size curve on a held-out evaluation set and pick the knee point.
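
A minimal sketch of steps 1–3, with the step-5 sizing rule in the constructor. This assumes a from-scratch Bloom filter (double hashing over a single SHA-256 digest); SketchingFeatureStore, record_view, and viewed_articles are illustrative names, not from the source:

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size bit array with k hash probes: answers 'definitely not
    present' or 'possibly present' with false-positive rate ~fp_rate."""

    def __init__(self, expected_items: int, fp_rate: float):
        # Standard sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln 2 hashes.
        self.m = max(8, int(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, key: str):
        # Double hashing: derive all k probe positions from two 64-bit hashes.
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, key: str) -> None:
        for i in self._indexes(key):
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(key))


class SketchingFeatureStore:
    def __init__(self, expected_items: int, fp_rate: float):
        self.filter = BloomFilter(expected_items, fp_rate)
        self.possible_articles: set[str] = set()  # step 2: auxiliary vocabulary

    def record_view(self, user_id: str, article_id: str) -> None:
        self.filter.add(f"{user_id}^{article_id}")  # step 1: composite key
        self.possible_articles.add(article_id)

    def viewed_articles(self, user_id: str) -> list[str]:
        # Step 3: enumerate-and-probe. Every returned article is one the
        # filter "might contain"; a ~fp_rate fraction of them are noise.
        return [a for a in self.possible_articles if f"{user_id}^{a}" in self.filter]
```

At serving time the ranker calls viewed_articles(user_id) once per request, entirely in process memory; no network hop is involved.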

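On step 5's accuracy/size curve: the size side follows directly from the standard Bloom sizing formula, m = -n·ln(p)/(ln 2)² bits for n keys at false-positive rate p. An illustrative calculation; the 100 M composite-key workload is an assumption, not a source figure:

```python
import math

def bloom_bytes(n_keys: int, fp_rate: float) -> float:
    # m = -n * ln(p) / (ln 2)^2 bits, converted to bytes.
    return -n_keys * math.log(fp_rate) / math.log(2) ** 2 / 8

for p in (0.001, 0.01, 0.05, 0.1):
    print(f"fp_rate={p:<6} -> ~{bloom_bytes(100_000_000, p) / 1e6:.0f} MB")
# fp_rate=0.001  -> ~180 MB
# fp_rate=0.01   -> ~120 MB
# fp_rate=0.05   -> ~78 MB
# fp_rate=0.1    -> ~60 MB
```

The AUC side of the curve cannot be computed; it has to be measured on the held-out evaluation set, as step 5 says.
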
Why this wins on the right workload

Zalando's click-prediction benchmark (Source: sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures) reports:

  • ~30× memory reduction at effectively lossless AUC: 470 MB sketch vs 15 GB KV footprint; AUC 0.7997 vs 0.80. State fits comfortably in serving-process RAM.
  • 90% of the AUC-from-history uplift at 0.3% of the memory: a 40 MB sketch (AUC ≈ 0.79) is "just" 330× smaller than the 15 GB KV store and still captures most of the historical-feature win.
  • Indistinguishable throughput: all three variants hit 20–22 k predictions/sec/core. The possible_articles iteration is not a measurable overhead at realistic vocabulary sizes.
  • Zero network latency, zero network tail. The 2–10 ms KV hop is gone; the tail variance disappears; no more "KV store is the bottleneck on p99" postmortems.
  • Hosting cost collapses. A distributed KV fleet is replaced by an in-process data structure that ships with the serving binary. Backfill becomes "rebuild the sketch from the event log."
  • Many-lookups-per-request becomes free. Probing 1,000 composite keys against an in-memory Bloom filter takes microseconds in total; the same fan-out over a KV store takes seconds.

Prerequisites (when this pattern fits)

The pattern is not a general replacement for KV feature stores. It only fits when all of these hold:

  • Feature vocabulary is bounded and enumerable. You know the full set of possible article IDs, category IDs, etc. Free-text / bag-of-words features are disqualified.
  • Freshness of minutes to hours is acceptable. See concepts/feature-store-freshness. Recency-sensitive signals (last-click-30-seconds-ago) need the KV store.
  • Individual record deletion is not needed, or can be handled by batch rebuild. GDPR right-to-be-forgotten can still be served, just with a rebuild-latency SLO (a minimal rebuild loop is sketched after this list).
  • Queries are membership-only. "Did user U view article A?" Yes. "Give me all events from day X" — not supported.
  • Model tolerates input noise. Tree ensembles and regularised linear models absorb bit-flip noise gracefully; pathological models that memorise single features may not.
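
For the batch-rebuild path (step 4 of the Solution, and the GDPR point above), a hypothetical rebuild loop that reuses the SketchingFeatureStore sketch from the Solution section; the event-tuple shape and the atomic-swap holder are assumptions, not source APIs:

```python
import threading


class ServingState:
    """Holds the live sketch; a rebuild swaps in a fresh one atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self.store = None

    def swap(self, new_store) -> None:
        with self._lock:
            self.store = new_store  # readers cut over; old filter is GC'd


def rebuild(serving: ServingState, events, erased_users=frozenset()) -> None:
    # events: (user_id, article_id) tuples replayed from the source event log
    # (S3 of events, the offline feature store). Erased users are simply
    # omitted, which is how right-to-be-forgotten is served within the
    # rebuild-latency SLO.
    new_store = SketchingFeatureStore(expected_items=100_000_000, fp_rate=0.01)
    for user_id, article_id in events:
        if user_id in erased_users:
            continue
        new_store.record_view(user_id, article_id)
    serving.swap(new_store)
```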

Anti-pattern — when this fails

  • Real-time engagement features (feed personalisation, news ranking). Freshness requirement is sub-second; sketch rebuild cadence doesn't fit.
  • Free-text-derived features (review bag-of-words, search-query n-grams). Feature vocabulary is unbounded; enumerate-and-probe cannot reconstruct the feature vector.
  • Count / frequency features. A Bloom filter answers only membership; a Count-Min Sketch would be the right substrate there (unbenchmarked in the source; see the sketch after this list).
  • Workloads already latency-bound on CPU, not network. The pattern wins on network-hop elimination; if your bottleneck is already the ranker's feature engineering or model inference, this doesn't help.
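
On the count-feature gap: the source leaves this unbenchmarked, but for orientation, a minimal Count-Min Sketch over the same composite-key encoding might look like this (width and depth values are illustrative):

```python
import hashlib


class CountMinSketch:
    def __init__(self, width: int = 1 << 20, depth: int = 4):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        self.depth = depth

    def _indexes(self, key: str):
        # Same double-hashing trick as the Bloom filter, one index per row.
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return ((h1 + i * h2) % self.width for i in range(self.depth))

    def add(self, key: str, count: int = 1) -> None:
        for row, i in zip(self.rows, self._indexes(key)):
            row[i] += count

    def estimate(self, key: str) -> int:
        # Estimates only ever overcount: take the minimum across rows.
        return min(row[i] for row, i in zip(self.rows, self._indexes(key)))
```

Usage mirrors the membership store: add(f"{user}^{item}") per event, estimate(f"{user}^{item}") at serving time.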

Seen in

sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures