CONCEPT Cited by 1 source

Crawl budget impact of JS complexity

Definition

Crawl budget is Google's shorthand for the aggregate per-site capacity allocation: how many pages Googlebot will fetch and render from a given site in a given time window. Crawl budget doesn't scale linearly with site size; a 1,000,000-page site does not get 1,000 times the budget of a 1,000-page site.

JS complexity impact is the observation that JavaScript-heavy pages cost materially more per render than static HTML pages (full Chromium session, all sub-resources fetched, JS execution, async-work settlement, rendered-DOM emission). Per Google's own docs: "for large sites (10,000+ unique and frequently changing pages), this can impact the site's crawl budget."

The product of the two: on 10,000+ page sites, heavy client-side JS reduces the fraction of the site's URLs that get crawled / rendered / indexed in a given time window. The per-page rendering success rate stays at 100%, but fewer pages fit in the budget.

(Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process.)

The structural equation

  pages_crawled_and_rendered_per_day
    = crawl_budget_seconds_per_day / per_page_render_cost_seconds

  • Crawl budget is set by Google per site based on site health, popularity, update frequency, server latency, and other signals.
  • Per-page render cost scales with:
      • JS bundle size
      • Number of sub-resources
      • Async-work wait (API calls, dynamic imports, streaming)
      • DOM complexity after JS execution

Tuning either lever changes the throughput equation.
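The throughput equation can be sketched directly; the budget and cost numbers below are illustrative assumptions, not measurements from the source:

```typescript
// pages crawled per day = budget seconds per day / per-page render cost.
function pagesPerDay(budgetSecondsPerDay: number, renderCostSeconds: number): number {
  return Math.floor(budgetSecondsPerDay / renderCostSeconds);
}

// Same hypothetical daily budget, two per-page costs:
const budget = 3600; // assume Google spends ~1 hour/day of crawl capacity on the site
console.log(pagesPerDay(budget, 0.5)); // lightweight static HTML → 7200 pages/day
console.log(pagesPerDay(budget, 5));   // JS-heavy client render  → 720 pages/day
```

A 10× increase in per-page render cost translates directly into a 10× drop in daily coverage, which is invisible on a small site but decisive on a 100,000-page one.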

What the empirical study shows

Vercel + MERJ measured on nextjs.org (a modestly sized Next.js App Router application):

  • JS complexity does not correlate with rendering success rate. 100% of pages rendered, across both minimal-JS and heavily dynamic CSR pages.
  • JS complexity does not correlate with rendering delay at nextjs.org scale. p50 / p75 / p90 look similar across per-page JS-complexity buckets.
  • The crawl-budget impact doesn't show up on a site this size. nextjs.org is below the 10,000+ frequently-changing-pages threshold; the impact is a Google-disclosed rule, not a symptom observed on nextjs.org.

The study quotes Google's large-site-managing-crawl-budget guide to acknowledge that the budget impact is real at scale, even if not visible on the study's site.

Who this affects

  • E-commerce catalogues with 100,000+ product pages, each heavily client-rendered (personalised pricing, availability, recommendations).
  • Classified / listing sites with millions of frequently changing pages.
  • News / publishing platforms with decade-deep archives plus heavy article-page JS (comments widgets, interactive charts, video players).
  • Location / geo sites where each city / zip / region is its own URL with dynamic content.

What to do about it

Per the source post and Google's own guidance:

  1. Use SSG / ISR / SSR for SEO-critical content. The initial HTML already carries the content; Google still renders the page, but with far less work per page.
  2. Code-split aggressively. Only load what a given route needs. Google still runs the JS, but smaller bundles = faster render = more pages in the budget.
  3. Avoid JS-only navigation for in-site links. Use real <a href> anchors; the link-discovery regex works on them directly, and the per-page-render cost drops.
  4. Keep sitemap fresh with <lastmod>. Signals which pages are worth spending budget on; short-circuits link-graph traversal.
  5. Minimise blocked resources in robots.txt. A blocked JS file Google can't fetch may force a render retry (more budget spent on one page) or produce a broken rendered DOM.
  6. Consider traffic-aware pre-rendering where build-time pre-rendering of every page is impractical: pre-render the hot URLs, ISR the cold ones.
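Point 4 can be sketched as a minimal sitemap generator that emits <lastmod> per URL; the URLs, dates, and function names below are illustrative assumptions, not from the source:

```typescript
// Build a sitemaps.org-conformant XML document where each <url> entry
// carries a <lastmod> date, so Google can prioritise recently changed pages.
type SitemapEntry = { loc: string; lastmod: string }; // lastmod as ISO date

function sitemapXml(entries: SitemapEntry[]): string {
  const urls = entries
    .map(e => `  <url><loc>${e.loc}</loc><lastmod>${e.lastmod}</lastmod></url>`)
    .join("\n");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${urls}\n</urlset>`
  );
}

console.log(sitemapXml([
  { loc: "https://example.com/products/widget-a", lastmod: "2024-08-01" },
  { loc: "https://example.com/products/widget-b", lastmod: "2024-07-15" },
]));
```

Keeping <lastmod> honest matters: dates that update on every deploy, regardless of content changes, stop being a useful prioritisation signal.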

Why it's a crawl-budget phenomenon and not a rendering-delay phenomenon

A page's individual rendering delay is p50 = 10 s regardless of JS complexity (at nextjs.org scale). But a site's aggregate rendering throughput depends on per-page cost, and Google allocates capacity per site, not per page. So a JS-heavy site doesn't see individual pages take longer; it sees fewer pages get crawled per day. The symptom:

  • Some pages never get rendered within the time window between crawl cycles.
  • Some pages show stale indexed content.
  • New pages take longer to appear in search.

This is a different failure mode than "rendering queue delay" (an individual-page latency), and shows up only on sites big enough that the budget divides across many pages.
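The aggregate failure mode can be made concrete by converting per-page cost into the length of a full recrawl cycle; the site size and budget below are illustrative assumptions, not measurements from the source:

```typescript
// With a fixed per-site budget, raising per-page render cost stretches the
// time needed to revisit every URL, so some pages go stale between cycles.
function daysToFullRecrawl(
  totalPages: number,
  budgetSecondsPerDay: number,
  renderCostSeconds: number
): number {
  const pagesPerDay = budgetSecondsPerDay / renderCostSeconds;
  return Math.ceil(totalPages / pagesPerDay);
}

const catalogue = 100_000; // a hypothetical large e-commerce catalogue
console.log(daysToFullRecrawl(catalogue, 3600, 0.5)); // static-ish pages → 14 days
console.log(daysToFullRecrawl(catalogue, 3600, 5));   // JS-heavy pages  → 139 days
```

At 139 days per cycle, a product page edited today may show stale indexed content for months, and new URLs at the bottom of the queue take correspondingly longer to appear in search.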
