
Proactive cache of batch predictions

Definition

Proactive cache of batch predictions is a serving strategy in which a batch job precomputes and stores predictions for all relevant entities before any user requests them, instead of computing on demand. The serving tier simply reads the precomputed result; only edge cases (user-triggered what-if scenarios, stale data) trigger on-demand recomputation.

The cache is "proactive" because:

  • It is populated before any query arrives, not as a side-effect of user queries.
  • It is invalidated / refreshed on a schedule, not on access.
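A minimal sketch of the pattern, with hypothetical names throughout (`predict`, `ALL_ENTITIES`, and the in-memory `cache` stand in for a real model, entity list, and KV store):

```python
from datetime import datetime, timezone

# Hypothetical stand-ins: a real system would load a model and pull the
# entity universe from a feature store or inventory snapshot.
def predict(entity_id: str) -> float:
    return float(len(entity_id))  # placeholder for real model inference

ALL_ENTITIES = ["merchant-1/article-A", "merchant-1/article-B"]

cache: dict[str, dict] = {}  # stand-in for a KV store (DynamoDB, Redis, ...)

def refresh_cache() -> None:
    """Batch job: precompute predictions for EVERY entity on a schedule,
    before any request arrives — population is proactive, not lazy."""
    run_at = datetime.now(timezone.utc).isoformat()
    for entity in ALL_ENTITIES:
        cache[entity] = {"prediction": predict(entity), "computed_at": run_at}

def serve(entity_id: str) -> dict:
    """Serving tier: a plain key lookup; no model inference on the hot path."""
    return cache[entity_id]

refresh_cache()
print(serve("merchant-1/article-A")["prediction"])
```

Note that `refresh_cache` runs on a scheduler trigger, never as a cache-miss side-effect; that is what distinguishes it from lazy caching.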

Why proactive over on-demand

  • Predictable latency. Serving reads are a KV lookup; latency is bounded by storage, not by model-inference compute.
  • Cost efficiency. Batch compute can be 10× cheaper per prediction than online (spot instances, bulk throughput, no warm-pool overhead).
  • Freshness SLO. A daily refresh schedule bounds staleness: users see at worst day-old data, which is often sufficient for B2B / inventory decisions.
  • Online path only handles deltas. On-demand recomputation is reserved for genuine edge cases (user what-ifs, out-of-schedule inventory changes), keeping online compute light.

Canonical instance (Zalando ZEOS)

systems/zeos-replenishment-recommender:

"Once these settings are established, we proactively cache both the settings and the resulting recommendations on a daily basis. This ensures that our offline batch process consistently delivers up-to-date, dynamic recommendations, taking into account the latest inputs, forecasts, and stock states."

Operational shape:

  • Daily SageMaker Batch Transform runs the optimiser across all merchants × all articles.
  • Output stored in S3; a "report generated" notification fires to downstream consumers.
  • Online path (partner-portal what-if) only fires when a partner changes an inventory setting — otherwise, the cached daily batch answer is authoritative.
What this is not

  • On-demand serving endpoint. No precomputation; every user request invokes model inference.
  • Caching layer in front of an endpoint. Responses from an online endpoint are cached opportunistically, as a side-effect of requests. Proactive precomputation is different: the cache is populated without any request having happened yet.
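The operational shape of the daily batch run can be sketched as below. This is an illustration, not Zalando's code: `object_store` and `notifications` are hypothetical stand-ins for S3 and the downstream event channel, and the key layout is invented.

```python
import json
from datetime import date

# Hypothetical stand-ins for the object store (S3 in the real system)
# and the "report generated" notification channel.
object_store: dict[str, str] = {}
notifications: list[dict] = []

def run_daily_batch(recommendations: dict[str, float]) -> str:
    """One daily run: write the full precomputed recommendation set under
    a dated key, then fire a 'report generated' event for consumers."""
    key = f"recommendations/{date.today().isoformat()}.json"
    object_store[key] = json.dumps(recommendations)
    notifications.append({"event": "report_generated", "key": key})
    return key

key = run_daily_batch({"merchant-1/article-A": 42.0})
print(notifications[-1]["event"])
```

Downstream consumers react to the notification rather than polling, so the cached daily answer becomes authoritative the moment the event fires.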
