Proactive cache of batch predictions¶
Definition¶
Proactive cache of batch predictions is a serving strategy where a batch job precomputes and stores predictions for all relevant entities before any user requests them, instead of computing on demand. The serving tier then simply reads the precomputed result; only genuine edge cases (user-triggered what-ifs, stale inputs) fall back to on-demand recomputation.
The cache is "proactive" because:
- It is populated before any query arrives, not as a side-effect of user queries.
- It is invalidated / refreshed on a schedule, not on access.
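The two properties above can be sketched as a scheduled refresh job. This is a minimal sketch, not the ZEOS implementation: `predict_batch`, `refresh_cache`, and the in-memory `CACHE` dict are hypothetical names, with the dict standing in for a real KV store such as S3 or DynamoDB.

```python
import time

CACHE: dict[str, dict] = {}  # stand-in for the real KV store

def predict_batch(entity_ids: list[str]) -> dict[str, float]:
    """Stand-in for the batch model run (e.g. a daily batch-inference job)."""
    return {eid: 0.5 for eid in entity_ids}  # dummy scores

def refresh_cache(entity_ids: list[str]) -> None:
    """Scheduled job: recompute predictions for ALL entities, then swap them in.

    Note the two proactive properties: it runs before any query arrives,
    and refresh is driven by the schedule, not by cache access.
    """
    predictions = predict_batch(entity_ids)
    as_of = time.time()
    for eid, score in predictions.items():
        CACHE[eid] = {"score": score, "as_of": as_of}

# A scheduler (cron / Airflow / etc.) would invoke this daily.
refresh_cache(["sku-1", "sku-2"])
```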
Why proactive over on-demand¶
- Predictable latency. Serving reads are a KV lookup; latency is bounded by storage, not by model-inference compute.
- Cost efficiency. Batch compute can be 10× cheaper per prediction than online (spot instances, bulk throughput, no warm-pool overhead).
- Freshness SLO. A daily schedule bounds staleness: users see data at most a day old, which is often sufficient for B2B / inventory decisions.
- Online path only handles deltas. On-demand recomputation is reserved for genuine edge cases (user what-ifs, out-of-schedule inventory changes), keeping online compute light.
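The serving-side read implied by these points can be sketched as a KV lookup with an on-demand fallback reserved for the edge cases. A minimal sketch, assuming a hypothetical `serve` function, dict-backed cache, and caller-supplied `recompute` callback:

```python
import time

def serve(entity_id, cache, recompute, max_age_s=86400, now=None):
    """Serving read: pure KV lookup on the fast path; on-demand
    recomputation only when the entry is missing or stale."""
    now = time.time() if now is None else now
    entry = cache.get(entity_id)
    if entry is not None and now - entry["as_of"] <= max_age_s:
        return entry["score"]             # fast path: bounded by storage latency
    score = recompute(entity_id)          # edge case: online inference
    cache[entity_id] = {"score": score, "as_of": now}
    return score
```

Because the batch job repopulates everything daily, the `recompute` branch stays rare, which is what keeps the online compute footprint light.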
Canonical instance (Zalando ZEOS)¶
systems/zeos-replenishment-recommender:
"Once these settings are established, we proactively cache both the settings and the resulting recommendations on a daily basis. This ensures that our offline batch process consistently delivers up-to-date, dynamic recommendations, taking into account the latest inputs, forecasts, and stock states."
Operational shape:
- Daily SageMaker Batch Transform runs the optimiser across all merchants × all articles.
- Output stored in S3; a "report generated" notification fires to downstream consumers.
- Online path (partner-portal what-if) only fires when a partner changes an inventory setting — otherwise, the cached daily batch answer is authoritative.
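The operational shape above reduces to a simple control flow. A hedged sketch only: `run_daily_batch`, `optimise`, `store`, and `notify` are hypothetical stand-ins for the SageMaker Batch Transform run, the S3 output bucket, and the "report generated" notification respectively.

```python
from itertools import product

def run_daily_batch(merchants, articles, optimise, store, notify):
    """Sketch of the daily job: run the optimiser for every
    merchant x article pair, persist all outputs, then fire a
    single notification to downstream consumers."""
    for merchant, article in product(merchants, articles):
        store[(merchant, article)] = optimise(merchant, article)
    notify("report generated")
```

The notification fires once per batch, after all outputs are persisted, so consumers never observe a partially written report.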
Related to but distinct from¶
- On-demand serving endpoint. No precomputation; every user request invokes model inference.
- Caching layer in front of an endpoint. Responses from an online endpoint are cached opportunistically; proactive precomputation is different — the cache is populated without any request having happened yet.
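For contrast with the proactive pattern, an opportunistic (read-through) cache can be sketched as follows; `read_through` and its dict-backed cache are hypothetical names for illustration.

```python
def read_through(entity_id, cache, model):
    """Opportunistic cache: an entry exists only as a side-effect of
    the first request that misses; that first caller pays full
    inference latency."""
    if entity_id not in cache:
        cache[entity_id] = model(entity_id)
    return cache[entity_id]
```

The difference from the proactive pattern is the population trigger: here a request must arrive (and miss) before anything is cached, whereas a proactive cache is already fully populated before the first request.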
Related¶
- concepts/online-vs-offline-feature-store — sibling architectural decision about where freshly computed predictions live.
- patterns/proactive-cache-daily-batch-prediction
- systems/zeos-replenishment-recommender
- companies/zalando