

Sketching feature store

Definition

A sketching feature store is a machine-learning feature store whose backing storage is a probabilistic data structure, most commonly a Bloom filter, that lives in the serving process's memory rather than in an external key-value database. Features are encoded as composite keys (e.g. f"{user_id}^{article_id}") and inserted into the sketch as set-membership entries. At query time the serving code enumerates the known feature value space, probes the sketch for each candidate, and returns the set of features for which the sketch answers "possibly present".

The substrate trades exactness (false positives are possible) for sublinear space: a ~30× memory reduction at essentially indistinguishable model accuracy on a click-prediction workload, per Zalando's benchmark (Source: sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures).
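
For intuition on where the ~30× comes from, the standard Bloom filter sizing formulas (textbook results, not from the Zalando source) relate item count n, false-positive rate p, bit count m, and hash count k; space scales with n · log(1/p) and is independent of key length, which is the structural win over a KV store holding full composite keys and values:

import math


def bloom_sizing(n: int, p: float) -> tuple:
    # m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


m, k = bloom_sizing(n=100_000_000, p=0.01)
print(m / 8 / 2**20, k)  # ~114 MiB and 7 hashes for 100 M keys at 1% FP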

Mechanism

from __future__ import annotations

from typing import Set


class FeatureStore:
    def __init__(self, store: BloomFilter):
        # Any Bloom filter exposing add() and might_contain() works here
        # (a minimal implementation follows below).
        self.store = store
        # Enumerable feature vocabulary; needed to reconstruct vectors at read time.
        self.possible_articles: Set[int] = set()

    def add(self, user_id: int, article_ids: Set[int]) -> None:
        for article_id in article_ids:
            self.possible_articles.add(article_id)
            # Composite key: one sketch entry per (entity, feature-value) pair.
            composite_key = f'{user_id}^{article_id}'
            self.store.add(composite_key)

    def retrieve_articles(self, user_id: int) -> Set[int]:
        # Probe-then-filter: enumerate the vocabulary, keep every candidate
        # the sketch answers "possibly present" for.
        ret = set()
        for article_id in self.possible_articles:
            composite_key = f'{user_id}^{article_id}'
            if self.store.might_contain(composite_key):
                ret.add(article_id)
        return ret
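
The source post doesn't pin down a particular Bloom filter implementation; below is a minimal, dependency-free sketch of the add/might_contain interface the class above assumes, followed by a usage line. The default sizing and hash count are illustrative, not tuned:

import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 8 * 40 * 1024 * 1024, num_hashes: int = 4):
        # ~40 MB of bits by default; pick num_bits/num_hashes from the
        # sizing formulas above for a target false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Double hashing: derive all probe positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], 'big')
        h2 = int.from_bytes(digest[8:16], 'big') | 1  # odd step spreads probes
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


# Wiring it up; the retrieved set can only ever be a superset of what was added.
store = FeatureStore(BloomFilter())
store.add(user_id=42, article_ids={1001, 1002})
assert store.retrieve_articles(42) >= {1001, 1002}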

Three load-bearing elements:

  1. Composite key encoding. The sketch is a set of (entity, feature-value) pairs, not a map from entity to feature vector. Any higher-arity feature (e.g. "bought X together with Y") encodes as f"{user_id}^{X}^{Y}" and costs one sketch entry (see the example after this list).
  2. Auxiliary possible_features set (possible_articles in the code above). Reconstruction requires iterating all possible values of the feature axis (all article IDs, all categories, etc.). This set lives alongside the sketch in the same process; its size is independent of the number of users but proportional to the feature vocabulary.
  3. Probe-then-filter retrieval. The user's feature vector is reconstructed at read time by enumerating possible_features and emitting each value for which the sketch answers "possibly present." False positives flip features to 1 that should be 0; false negatives never occur.
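
For instance, a pairwise helper (hypothetical; not in the source code) costs one entry per pair. The flip side, per element 2, is that probing pairs back requires enumerating candidate pairs, so the auxiliary set grows quadratically with the vocabulary:

def add_bought_together(store: BloomFilter, user_id: int, x: int, y: int) -> None:
    # Order-normalize so (X, Y) and (Y, X) map to the same composite key.
    lo, hi = sorted((x, y))
    store.add(f'{user_id}^{lo}^{hi}')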

Three substrates of ML feature storage

External KV store (online feature store)
  • Latency: 2–10 ms plus tail latency
  • Freshness: seconds
  • Space: linear (all users × all features)
  • Query flexibility: full (any query, any update)
  • When to use: the default for production feature stores where freshness matters

Offline store (S3/Parquet)
  • Latency: seconds to minutes
  • Freshness: hourly to daily
  • Space: linear, but cheap per byte
  • Query flexibility: full (scans, filters, joins)
  • When to use: training, batch inference, debugging, audit

Sketching feature store (this page)
  • Latency: nanoseconds to microseconds (in-process)
  • Freshness: low-frequency batch only
  • Space: sublinear (~3–0.3% of KV)
  • Query flexibility: set membership only
  • When to use: per-request many-lookup patterns where freshness isn't load-bearing

Benchmarked trade-off shape

Zalando's click-prediction benchmark (5.7 M examples, 2.8 M with historical data, 1.762 B feature values post-extraction) compared three feature variants, each feeding a logistic-regression click classifier:

  • No history (request features only): AUC ≈ 0.70.
  • Uncompressed history (conventional KV feature store): AUC ≈ 0.80, ~15 GB state.
  • Bloom-filter-backed sketch: AUC ≈ 0.7997 at 470 MB (3% of KV); AUC ≈ 0.79 at 40 MB (0.3% of KV, retaining 90% of the AUC uplift).

Throughput for all three variants: 20–22 k predictions/sec/core; iterating possible_articles is not a measurable overhead at this vocabulary size. (Source: sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures.)

Why the noise is harmless

A Bloom-filter false positive flips a binary feature from 0 to 1 when it should have stayed 0. Unlike mislabelled training data, this appears to the model as noise in the input features, a well-studied regime that most ML models, including the benchmark's logistic-regression classifier, absorb gracefully. The asymmetry (concepts/false-positive-vs-false-negative-asymmetry) is load-bearing: Bloom filters have no false negatives, so features that were present stay 1, and the classifier never loses a positive signal.
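
A minimal demonstration of the asymmetry using the classes above, with the filter deliberately undersized to force false positives; the retrieved set can only ever be a superset of the true set:

import random


true_articles = set(random.sample(range(100_000), 50))
fs = FeatureStore(BloomFilter(num_bits=512, num_hashes=4))  # tiny on purpose
fs.add(user_id=7, article_ids=true_articles)
fs.possible_articles.update(range(100_000))  # full vocabulary to probe

retrieved = fs.retrieve_articles(7)
assert true_articles <= retrieved       # no false negatives, ever
extra = retrieved - true_articles       # false positives: 0 -> 1 noise only
# At this sizing roughly 1% of probes come back as spurious positives.
print(f'{len(extra)} spurious features from {len(fs.possible_articles)} probes')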

Where the substrate breaks

  1. Unbounded feature vocabularies. If possible_features is not enumerable (e.g. bag-of-words on free text), reconstruction is not possible — you cannot probe all possible bigrams. Applicability is scoped to feature spaces with a known, bounded vocabulary.
  2. High update frequency. Because the full state must live in every serving node's RAM, every write must be applied on every node — see concepts/feature-store-freshness.
  3. Individual deletion. Sketches do not support deletion by construction; expiry requires a full rebuild. "Remove events older than 30 days" forces a low-frequency batch cadence (see the sketch after this list).
  4. Complex queries. "Give me all events from day X" requires metadata that defeats the space win. The sketch only answers set-membership.
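
A plausible shape for that rebuild cadence (an operational assumption, not described in the source): reconstruct the filter offline from the retained event window, then swap the serving process's reference, so expiry happens by omission:

from typing import Iterable, Tuple


def rebuild_store(events: Iterable[Tuple[int, int]],
                  num_bits: int, num_hashes: int = 4) -> FeatureStore:
    # events: (user_id, article_id) pairs within the retention window,
    # e.g. the last 30 days; anything older is simply never re-inserted.
    fresh = FeatureStore(BloomFilter(num_bits=num_bits, num_hashes=num_hashes))
    for user_id, article_id in events:
        fresh.add(user_id, {article_id})
    return fresh


# store = rebuild_store(last_30_days_events, num_bits=8 * 40 * 1024 * 1024)
# A single reference swap means readers see old or new state, never a mix.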
