
CONCEPT Cited by 1 source

Feature store freshness

Definition

Feature store freshness is the end-to-end latency between a feature-altering event occurring (a click, a purchase, a view) and that event being visible to the feature store's readers (the recommender's serving path). It is a first-class design axis for any production ML system that uses historical features, because recent events carry disproportionate predictive signal — what the user clicked 30 seconds ago is more informative than what they clicked 30 days ago.

Freshness targets are typically seconds for interactive recommender systems; minutes for personalisation that tolerates a session delay; hours or longer only for analytics-style features where recency is not load-bearing.
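Freshness is straightforward to measure: it is the gap between the event timestamp and the moment a reader could first observe the event. A minimal sketch, with the target values being illustrative assumptions matched to the tiers above (not figures from the source):

```python
def freshness_lag_seconds(event_ts: float, read_visible_ts: float) -> float:
    """End-to-end freshness: time from the feature-altering event
    to the moment the feature store's readers can observe it."""
    return read_visible_ts - event_ts

# Illustrative targets per workload tier (assumed values):
TARGETS = {
    "interactive_recommender": 5.0,        # seconds
    "session_personalisation": 300.0,      # minutes
    "analytics_features": 24 * 3600.0,     # hours or longer
}

def within_target(workload: str, event_ts: float, read_visible_ts: float) -> bool:
    """True if the observed lag meets the workload's freshness budget."""
    return freshness_lag_seconds(event_ts, read_visible_ts) <= TARGETS[workload]
```

In practice this lag is usually tracked as a percentile (e.g. p99 of event-to-visibility time) rather than a single number, since the tail is what breaks interactive use cases.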

Why freshness is the load-bearing property

Most feature-store design discussions focus on read-path latency (how fast can we fetch user features at ranking time) and space (how much storage for N users × M features). These are tractable — add a cache, add capacity. Freshness is harder: it is a write-throughput + consistency problem that interacts with every other architectural choice.

"Feature 'freshness', as in how quickly recent events can be reflected to the feature store is very important, as recent events tend to have high informational value." (Source: sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures.)

Substrate fitness on the freshness axis

| Substrate | Update latency | Write fan-out | Deletion |
| --- | --- | --- | --- |
| External KV (Redis, DynamoDB) | Sub-second: partitioned writes, per-key updates | Partitioned: each node absorbs 1/N of traffic | Per-record: direct DELETE |
| Online + offline feature store (concepts/online-vs-offline-feature-store) | Seconds (online); minutes (offline) | Online partitioned; offline append-only | Per-record online; log-structured offline |
| Sketching / in-process (concepts/sketching-feature-store) | Low-frequency batch only | 100% of write traffic on every node | Impossible: full rebuild required |

The in-memory-state freshness cliff

A feature store that lives in serving-process memory (i.e. every node has the full state) cannot shard writes. The insertion mechanism may be incremental — Bloom filters, for instance, support add(key) — but each node must apply every write. For event streams at recommender-system scale (views, clicks in the tens of thousands per second), a single node cannot absorb that write rate alongside its read load.
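The arithmetic of the cliff is worth making explicit. A replicated in-process state forces every node to apply the full event rate, while a partitioned store hashes each key to one of N shards. A toy sketch (the event rate and node count are illustrative, not from the source):

```python
def writes_per_node(events_per_sec: float, nodes: int, partitioned: bool) -> float:
    """Write rate each serving node must absorb.

    Replicated in-process state (e.g. a Bloom filter held by every
    node) means every node applies every write; a partitioned KV
    store spreads writes so each node sees roughly 1/N of them.
    """
    return events_per_sec / nodes if partitioned else events_per_sec

# Illustrative: 50k events/s across 20 serving nodes.
# replicated in-process -> 50,000 writes/s on EVERY node,
#                          competing with that node's read load
# partitioned KV        ->  2,500 writes/s per node
```

Adding nodes does not help the replicated case at all: the per-node write rate is flat in N, which is exactly why sharding the sketch (and thereby rebuilding a distributed database) is the only way out.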

The moment you shard the sketch so each node handles 1/N of writes, you have rebuilt a distributed database, giving up the in-process advantage entirely:

"Theoretically bloom filters could be distributed so that each node only needs to process a shard of the traffic — but at this point one would have converted one's real-time transaction server into a distributed database." (Source: sources/2021-10-04-zalando-space-efficient-machine-learning-feature-stores-using-probabilistic-data-structures.)

This is why the sketching feature store is only viable where a low-frequency batch rebuild of the state (daily, hourly if you're lucky) is acceptable — media recommenders, slowly-varying user profiles, not real-time engagement features.

The deletion / expiry corollary

Freshness has two directions: adding new events (covered above) and removing old events (expiry, GDPR right-to-be-forgotten, correction of incorrect data). Sketches cannot delete individual records — bits are shared across keys by construction — so expiry demands a full rebuild of the state from the source dataset (e.g. an event log in S3), which amortises only at batch cadence.
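The shared-bits problem can be seen in a few lines. Below is a minimal toy Bloom filter (a sketch for illustration, not a production implementation): because unrelated keys can map to overlapping bit positions, clearing one key's bits risks erasing the evidence for other keys, so the only safe deletion is a rebuild from the source dataset.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter illustrating why per-record deletion is unsafe:
    different keys may set the same bits, so clearing a key's bits can
    silently break membership answers for other keys."""

    def __init__(self, m: int = 64, k: int = 3):
        self.bits = [0] * m
        self.m, self.k = m, k

    def _positions(self, key: str):
        # k independent hash positions derived from one digest per index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def contains(self, key: str) -> bool:
        # No false negatives; false positives possible by design.
        return all(self.bits[p] for p in self._positions(key))

    def unsafe_delete(self, key: str) -> None:
        # Clearing these bits also clears them for ANY other key that
        # happens to hash to the same positions -- hence "unsafe".
        for p in self._positions(key):
            self.bits[p] = 0
```

This is also why the clean alternative named in the source — age-partitioned filters that drop whole old partitions — sidesteps per-record deletion entirely rather than solving it.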

Age-partitioned Bloom filters are a research direction that allows coarse-grained expiry (drop the oldest partition), but "there are no mature implementations available" (Source: same).

Freshness as architectural fork

The "freshness matters" / "freshness doesn't matter" distinction partitions ML-feature-store design into two regimes:

  • Freshness matters: external partitioned KV store (conventional online feature store), or a streaming feature engine (Feast, Tecton, Chronon). Pay the network-hop cost (2–10 ms + tail) and the hosting cost; get per-key writes, per-record deletes, complex queries.
  • Freshness doesn't matter: in-process sketching feature store, or cached batch precomputation. Eliminate the network hop and the hosting cost; rebuild the state periodically; accept membership-only queries.
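The fork can be phrased as a one-line decision rule. A hedged sketch, with the comparison (required freshness vs. achievable rebuild cadence) drawn from the regimes above and the function name being a hypothetical helper:

```python
def choose_substrate(required_freshness_s: float,
                     batch_rebuild_interval_s: float) -> str:
    """Illustrative decision rule: if the workload needs features
    fresher than a batch rebuild can deliver, the in-process sketch
    is off the table and a partitioned store is required."""
    if required_freshness_s < batch_rebuild_interval_s:
        return "partitioned KV / streaming feature store"
    return "in-process sketching store (batch rebuild)"

# e.g. a 5 s freshness target vs. an hourly (3600 s) rebuild
# forces the partitioned-store regime.
```

Real decisions also weigh query shape (membership-only vs. complex lookups) and deletion requirements, but freshness is the axis that eliminates options first.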

The Zalando benchmark establishes numerically that the "freshness doesn't matter" regime can be ~30× more memory-efficient at lossless AUC — the question for architects is which regime their workload belongs in, not which substrate is universally better.
