
Proactive cache of batch predictions

Definition

Proactive cache of batch predictions is a serving strategy in which a batch job precomputes and stores predictions for all relevant entities before any user requests them, instead of computing on demand. The serving tier simply reads the precomputed result; only edge cases (user-triggered what-if scenarios, stale data) trigger on-demand recomputation.

The cache is "proactive" because:

  • It is populated before any query arrives, not as a side-effect of user queries.
  • It is invalidated / refreshed on a schedule, not on access.
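A minimal sketch of the pattern, with hypothetical names throughout (`predict`, `ALL_ENTITIES`, and the in-memory `cache` stand in for a real model, entity list, and KV store):

```python
from datetime import datetime, timezone

# Hypothetical stand-ins: a real system would load a model and pull the
# entity universe from a feature store or inventory snapshot.
def predict(entity_id: str) -> float:
    return float(len(entity_id))  # placeholder for real model inference

ALL_ENTITIES = ["merchant-1/article-A", "merchant-1/article-B"]

cache: dict[str, dict] = {}  # stand-in for a KV store (DynamoDB, Redis, ...)

def refresh_cache() -> None:
    """Batch job: precompute predictions for EVERY entity on a schedule,
    before any request arrives — population is proactive, not lazy."""
    run_at = datetime.now(timezone.utc).isoformat()
    for entity in ALL_ENTITIES:
        cache[entity] = {"prediction": predict(entity), "computed_at": run_at}

def serve(entity_id: str) -> dict:
    """Serving tier: a plain key lookup; no model inference on the hot path."""
    return cache[entity_id]

refresh_cache()
print(serve("merchant-1/article-A")["prediction"])
```

Note that `refresh_cache` runs on a scheduler trigger, never as a cache-miss side-effect; that is what distinguishes it from lazy caching.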

Why proactive over on-demand

  • Predictable latency. Serving reads are a KV lookup; latency is bounded by storage, not by model-inference compute.
  • Cost efficiency. Batch compute can be 10× cheaper per prediction than online (spot instances, bulk throughput, no warm-pool overhead).
  • Freshness SLO. A daily refresh schedule bounds staleness: users see at worst day-old data, which is often sufficient for B2B / inventory decisions.
  • Online path only handles deltas. On-demand recomputation is reserved for genuine edge cases (user what-ifs, out-of-schedule inventory changes), keeping online compute light.

Canonical instance (Zalando ZEOS)

systems/zeos-replenishment-recommender:

"Once these settings are established, we proactively cache both the settings and the resulting recommendations on a daily basis. This ensures that our offline batch process consistently delivers up-to-date, dynamic recommendations, taking into account the latest inputs, forecasts, and stock states."

Operational shape:

  • Daily SageMaker Batch Transform runs the optimiser across all merchants × all articles.
  • Output stored in S3; a "report generated" notification fires to downstream consumers.
  • Online path (partner-portal what-if) only fires when a partner changes an inventory setting — otherwise, the cached daily batch answer is authoritative.
What this is not

  • On-demand serving endpoint. No precomputation; every user request invokes model inference.
  • Caching layer in front of an endpoint. Responses from an online endpoint are cached opportunistically, as a side-effect of requests. Proactive precomputation is different: the cache is populated without any request having happened yet.
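The operational shape of the daily batch run can be sketched as below. This is an illustration, not Zalando's code: `object_store` and `notifications` are hypothetical stand-ins for S3 and the downstream event channel, and the key layout is invented.

```python
import json
from datetime import date

# Hypothetical stand-ins for the object store (S3 in the real system)
# and the "report generated" notification channel.
object_store: dict[str, str] = {}
notifications: list[dict] = []

def run_daily_batch(recommendations: dict[str, float]) -> str:
    """One daily run: write the full precomputed recommendation set under
    a dated key, then fire a 'report generated' event for consumers."""
    key = f"recommendations/{date.today().isoformat()}.json"
    object_store[key] = json.dumps(recommendations)
    notifications.append({"event": "report_generated", "key": key})
    return key

key = run_daily_batch({"merchant-1/article-A": 42.0})
print(notifications[-1]["event"])
```

Downstream consumers react to the notification rather than polling, so the cached daily answer becomes authoritative the moment the event fires.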
