Skip to content

GOOGLE

Read original ↗

Optimizing cloud economics with linear elastic caching

Summary

Google Research describes linear elastic caching — a cost-aware caching strategy deployed in Spanner that replaces fixed-size caches with TTL-based admission/eviction guided by the ski-rental problem from online algorithms theory. Instead of evicting based on recency (LRU) or frequency, the system assigns per-page TTLs predicted by a lightweight decision tree that considers page size, miss cost, and operation type. Production deployment across Spanner's infrastructure reduced memory usage by 15.5% with only a 5.5% increase in cache misses and ~5% reduction in total cost of ownership (TCO), with the additional misses concentrated on cheap-to-fetch pages (0.5% I/O cost impact).

Key Takeaways

  1. Elastic caching reframes the problem: rather than "what to evict from a full cache," the question becomes "how long should this page stay?" — turning cache sizing into a per-page TTL assignment problem.

  2. Ski-rental problem as theoretical foundation: the decision "should I keep paying rent (memory cost per time unit) for this cached page, or let it go (pay the miss cost next time)?" maps directly to the classical ski-rental problem, which has a known optimal competitive ratio of 2 (the breakeven strategy).

  3. Cost-awareness concentrates misses on cheap pages: because the TTL prediction accounts for miss cost (the expense of re-fetching from storage), the small increase in cache misses is not uniformly distributed — it's concentrated on data that is cheap to fetch, yielding negligible I/O cost increase (0.5%) despite 5.5% more misses.

  4. Lightweight ML model for billion-RPS systems: Spanner handles billions of requests per second, so the TTL prediction model must be ultra-lightweight. A shallow decision tree (translatable to a few lines of C++) was chosen over more complex models — interpretability and nanosecond-scale overhead trump marginal accuracy gains.

  5. Production numbers at Spanner scale:

  6. Memory usage: −15.5%
  7. Cache misses: +5.5%
  8. TCO: −~5%
  9. I/O cost impact: +0.5% (negligible)

  10. GDSF as the fixed-cache baseline: the Greedy Dual Size Frequency (GDSF) algorithm — a generalization of LRU that accounts for variable page sizes — serves as the comparison baseline for fixed-cache-size approaches.

  11. Four elastic variants tested: breakeven ski-rental, randomized ski-rental, each with and without ML-learned per-page TTLs. The ML-augmented variants train on the first half of traces and predict best TTLs per page for the second half.

  12. Public trace validation: results are not Spanner-specific — experiments on publicly available CMU cache traces confirm the approach generalizes across workloads.

Systems & Concepts Extracted

Operational Numbers

Metric Change Notes
Memory usage −15.5% vs fixed-size cache
Cache misses +5.5% concentrated on low-cost pages
TCO −~5% memory savings > miss penalty
I/O cost +0.5% cost-awareness shields real I/O
Request rate billions/sec Spanner production scale
Model shallow decision tree few lines of C++, interpretable

Caveats

  • The article appears to be the "testing & results" section of a larger piece; theoretical derivations are referenced but not reproduced here.
  • Public-trace experiments couldn't use application-level features (no decision tree), so the ML-augmented results are only fully validated in the Spanner production context.
  • The 5% TCO reduction is "approximately" — exact number may vary by workload mix.

Source

Last updated · 559 distilled / 1,651 read