PATTERN Cited by 1 source

ML-predicted TTL¶

Pattern¶

Use a lightweight machine-learned model to predict the optimal time-to-live (TTL) for each cached item based on its observable features, rather than applying a fixed TTL or a purely heuristic rule. The model is trained on historical access traces to minimize total cost (memory rental + miss penalty).

Constraints¶

Latency: the prediction must complete within the request path — at billions of requests per second, only nanosecond-scale overhead is acceptable.
Interpretability: operators need to understand what the model is doing, especially when it affects cache hit rates.
Feature availability: the model can only use features available at cache-insertion time (item size, operation type, miss cost), not future access patterns.

Solution shape¶

A shallow decision tree (few splits, translatable to a handful of if-else branches in C++) that considers:

- Page/item size
- Miss cost (latency or I/O cost to re-fetch)
- Operation type (read, scan, point lookup)
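As a rough illustration of how such a tree might look once compiled to branches, here is a minimal sketch. The struct, the enum, the function name `PredictTtlSeconds`, and every threshold and TTL value are hypothetical and chosen only for illustration; a real tree would be learned from access traces and is not described in the source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operation types, mirroring the three listed above.
enum class OpType { kRead, kScan, kPointLookup };

// Hypothetical feature set: only values known at cache-insertion time.
struct InsertFeatures {
  std::uint32_t size_bytes;  // page/item size
  double miss_cost_us;       // cost to re-fetch on a miss (illustrative unit)
  OpType op;                 // operation that brought the item in
};

// A three-split tree flattened into if-else branches.
// All thresholds and returned TTLs are made-up placeholders.
inline std::uint32_t PredictTtlSeconds(const InsertFeatures& f) {
  assert(f.miss_cost_us >= 0.0);
  if (f.op == OpType::kScan) {
    return 1;  // assumption for illustration: scanned pages get a short TTL
  }
  if (f.miss_cost_us < 100.0) {
    return 5;  // cheap to re-fetch: little value in holding memory
  }
  if (f.size_bytes > 64 * 1024) {
    return 30;  // expensive miss, but a large page is costly to keep
  }
  return 300;  // expensive miss, small page: keep it longest
}
```

The whole prediction is a few comparisons on values already in hand at insertion, with no allocation or table lookup, which is the property the latency constraint asks for.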

The tree predicts the optimal TTL that minimizes expected cost under the ski-rental cost model.
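For intuition about the cost model, the sketch below shows the classic deterministic ski-rental rule, not the learned predictor: keep the item until the memory rent paid equals the cost of one miss. The function names, parameters, and units are assumptions made for this example.

```cpp
#include <cassert>

// Classic ski-rental break-even TTL: the time at which accumulated memory
// rent equals one miss penalty. Names and units are hypothetical.
inline double BreakEvenTtlSeconds(double miss_cost,
                                  double size_bytes,
                                  double rent_per_byte_second) {
  assert(size_bytes > 0.0 && rent_per_byte_second > 0.0);
  return miss_cost / (size_bytes * rent_per_byte_second);
}

// Cost paid for one inter-access gap under a given TTL:
//   gap <= ttl: the item is still cached, so only rent for the gap is paid;
//   gap >  ttl: rent is paid until expiry, then the next access is a miss.
inline double GapCost(double gap_s, double ttl_s, double miss_cost,
                      double size_bytes, double rent_per_byte_second) {
  const double rent_rate = size_bytes * rent_per_byte_second;
  if (gap_s <= ttl_s) {
    return gap_s * rent_rate;
  }
  return ttl_s * rent_rate + miss_cost;
}
```

With the break-even TTL, the worst case is paying rent equal to one miss and then missing anyway, which is twice the cost of evicting immediately. A predictor that guesses the next gap well can improve on that bound, which is the motivation for learning the TTL per item.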

Trade-offs¶

Sacrifices some accuracy relative to deeper models in exchange for near-zero inference overhead
Requires periodic retraining if workload characteristics shift
On public traces without application-level features, a simpler per-page historical-best-TTL lookup is used as a fallback
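One way such a per-page fallback could work is sketched below: replay each page's observed inter-access gaps against a few candidate TTLs, keep the cheapest, and store it in a lookup table. This is an assumed construction, not the source's implementation; `TraceCost`, `BestHistoricalTtl`, `TtlTable`, `LookupTtl`, and the candidate-grid approach are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Total cost (rent + miss penalties) of applying one TTL to a page's
// observed inter-access gaps. rent_per_second already includes page size.
inline double TraceCost(const std::vector<double>& gaps_s, double ttl_s,
                        double rent_per_second, double miss_cost) {
  double total = 0.0;
  for (double gap : gaps_s) {
    total += (gap <= ttl_s) ? gap * rent_per_second
                            : ttl_s * rent_per_second + miss_cost;
  }
  return total;
}

// Return the candidate TTL with the lowest historical cost for this page.
// Ties keep the earliest candidate.
inline double BestHistoricalTtl(const std::vector<double>& gaps_s,
                                const std::vector<double>& candidates_s,
                                double rent_per_second, double miss_cost) {
  assert(!candidates_s.empty());
  double best_ttl = candidates_s.front();
  double best_cost = std::numeric_limits<double>::infinity();
  for (double ttl : candidates_s) {
    const double cost = TraceCost(gaps_s, ttl, rent_per_second, miss_cost);
    if (cost < best_cost) {
      best_cost = cost;
      best_ttl = ttl;
    }
  }
  return best_ttl;
}

// Hypothetical lookup table: page id -> best historical TTL in seconds.
using TtlTable = std::unordered_map<std::uint64_t, double>;

// Pages with no history fall back to a caller-supplied default TTL.
inline double LookupTtl(const TtlTable& table, std::uint64_t page_id,
                        double default_ttl_s) {
  const auto it = table.find(page_id);
  return it == table.end() ? default_ttl_s : it->second;
}
```

For example, with gaps of 1, 1, 1, and 100 seconds, a rent of 1 per second, and a miss cost of 10, the candidates 0, 2, and 200 seconds cost 40, 15, and 103 respectively, so the table would store 2 seconds for that page.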

Production evidence¶

Deployed in Spanner: the decision-tree approach combined with ski-rental theory reduced memory use by 15.5% and total cost of ownership (TCO) by 5%, at the price of 5.5% more misses (cheap ones). (Source: sources/2026-06-25-google-optimizing-cloud-economics-with-linear-elastic-caching)

Seen in¶

sources/2026-06-25-google-optimizing-cloud-economics-with-linear-elastic-caching — shallow decision tree predicting per-page TTL in Spanner at billions of RPS