
CONCEPT Cited by 1 source

Google asset-caching internal heuristics

Definition

Google's Web Rendering Service does not obey HTTP Cache-Control headers when deciding whether to re-fetch a page's sub-resources (CSS, JavaScript, images, API responses, fonts). Instead, WRS applies its own internal freshness heuristics, separate from the page's declared caching policy.

Canonical verbatim: "Google speeds up webpage rendering by caching assets, which is useful for pages sharing resources and for repeated renderings of the same page. Instead of using HTTP Cache-Control headers, Google's Web Rendering Service employs its own internal heuristics to determine when cached assets are still fresh and when they need to be downloaded again."

(Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process.)

Why WRS decides caching itself

WRS renders pages for indexing at web scale, which creates two capacity pressures:

  1. Shared resources across pages. Many pages on a site share CSS / JS bundles, fonts, images. Honouring per-response Cache-Control would force WRS to re-evaluate freshness per asset per render — expensive at WRS scale.
  2. Repeated renders of the same page. WRS may re-render the same URL periodically as freshness signals change. Caching assets across those renders is a significant cost saving.

Running the cache-freshness decision on internal heuristics lets WRS amortise asset downloads across renders without trusting any individual page's Cache-Control declarations, which may be incorrect, too aggressive, too conservative, or tuned for consumers such as browsers and CDNs whose caching needs differ from Google's.

What this means in practice

  • Cache-Control: no-store on a JS file does not guarantee a fresh fetch on every render. WRS may still use its cached copy.
  • Cache-Control: max-age=31536000 on a CSS bundle does not mean Google treats it as immutable for a year. WRS's heuristic may re-fetch sooner than the declared max-age.
  • Content-addressed asset URLs (fingerprinted filenames, app.abc123.js) still work correctly. The URL change is a strong signal that Google's heuristic is very likely to honour — a new URL is a new asset.
  • Cache-busting query parameters (?v=123) may not reliably bust the cache. Per the source post, the heuristic is opaque; behaviour is not guaranteed.
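
The bullets above can be illustrated with a toy model. The sketch below is not Google's implementation: the class name `ToyRenderCache`, the fixed one-hour TTL, the `fake_fetch` origin, and the choice to key on the exact URL string are all assumptions made for illustration, since the real WRS heuristic is undisclosed. It shows only the general shape of a cache that never consults Cache-Control: a repeated URL can be served from cache even when the origin says `no-store`, while a changed URL is always a miss.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical fetcher signature: url -> (body, response headers).
Fetcher = Callable[[str], Tuple[bytes, Dict[str, str]]]


@dataclass
class CachedAsset:
    body: bytes
    fetched_at: float


class ToyRenderCache:
    """Toy URL-keyed asset cache that never consults Cache-Control.

    ASSUMPTION: the fixed TTL is invented for illustration only.
    The real WRS freshness rules are undisclosed.
    """

    def __init__(self, fetch: Fetcher, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, CachedAsset] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry.fetched_at < self._ttl:
            return entry.body  # served from cache, whatever the headers said
        body, _headers = self._fetch(url)  # headers are deliberately ignored
        self._entries[url] = CachedAsset(body, now)
        return body


# Usage: a fake origin that always answers "Cache-Control: no-store".
calls = []


def fake_fetch(url: str) -> Tuple[bytes, Dict[str, str]]:
    calls.append(url)
    return b"body of " + url.encode(), {"Cache-Control": "no-store"}


cache = ToyRenderCache(fake_fetch)
cache.get("/static/app.abc123.js")
cache.get("/static/app.abc123.js")   # cache hit despite no-store
cache.get("/static/app.def456.js")   # new URL, so a new fetch
print(len(calls))  # 2
```

The example deliberately uses fingerprinted paths rather than `?v=` query strings: how the real cache treats query parameters is not documented, so the toy model makes no claim about them.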

Interaction with sub-resource changes

A page that depends on a recently-changed JS file may render against an old cached version of that file from Google's perspective — producing stale rendered output relative to what users see. The canonical mitigation:

  • Fingerprint / content-hash asset URLs so a change produces a new URL (e.g. app.abc123.js → app.def456.js). Google's heuristic is far less likely to have a stale copy of a fresh URL than to have a stale copy of an unchanged URL.
  • Prefer server-rendered HTML for SEO-critical content so stale-JS-execution is not a load-bearing path for the indexable DOM.
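
As a minimal sketch of the fingerprinting mitigation, the helper below derives a content-hashed filename from an asset's bytes. The function name `fingerprinted_name`, the choice of SHA-256, and the 8-character digest length are assumptions for illustration; most bundlers offer an equivalent built-in option, and that is usually the better tool in a real build.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension.

    Identical bytes always map to the same name; any change to the
    bytes yields a different name, and therefore a different URL.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


# Usage: two versions of the same logical asset get distinct names.
old_name = fingerprinted_name("static/app.js", b"console.log('v1');")
new_name = fingerprinted_name("static/app.js", b"console.log('v2');")
print(old_name != new_name)  # True
```

The HTML that references the asset must be regenerated with the new name on each build; otherwise pages keep pointing at the old URL and the fingerprint has no effect.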

What WRS does not do

  • Honour Cache-Control headers for asset-cache freshness decisions at render time.
  • Expose its heuristic to site operators — the precise rules are not disclosed; only that they exist and are independent of the page's declared policy.
  • Differentiate, in any documented way, between Cache-Control directives such as no-cache, private, and must-revalidate. From a site builder's standpoint, the effect of these directives on WRS is opaque.

Scope — not a general HTTP-caching critique

This is specifically about the rendering-stage asset cache inside WRS. It is not a statement about CDN caches, browser HTTP caches, service-worker caches, or any other caching layer in the stack; those follow their own documented rules (for CDNs and browser HTTP caches, generally the conventional Cache-Control semantics). The anomaly is confined to Googlebot's render pipeline.

