VERCEL 2024-08-01 Tier 3

Vercel — How Google Handles JavaScript Throughout the Indexing Process

Summary

Vercel + MERJ's joint empirical study (published 2024-08-01) of Googlebot's rendering behaviour on nextjs.org (with supplemental data from monogram.io and basement.io) over April 2024. Over 100,000 Googlebot fetches were analysed; over 37,000 rendered HTML pages were matched into server-beacon pairs. Methodology: custom Vercel Edge Middleware identifies search-engine-bot requests in the request path, injects a lightweight beacon JavaScript library into the HTML response, and the beacon fires on render completion, POSTing a timestamp + request identifier + URL to a long-running beacon server. Pairing the server access log's initial request with the beacon's render-completion report yields a per-page rendering-delay measurement.

The post debunks four long-standing SEO myths — Google can't render client-side JS; Google treats JS pages differently; the rendering queue delays indexing; JS-heavy sites have slower page discovery — with concrete distributional data and declared findings. Canonical disclosures: the rendering-delay distribution (p50 = 10 s, p75 = 26 s, p90 ≈ 3 h, p95 ≈ 6 h, p99 ≈ 18 h); 100 % of indexable HTML pages fully rendered; React Server Component streaming does not impair rendering; noindex in the initial HTML response is enforced before render so client-side removal is SEO-ineffective; Google discovers links inside non-rendered JSON payloads by regex over the HTML body, not just <a href>s in the rendered DOM; query-string URLs render slower than their path-only counterparts (p75 = 31 min vs 22 s).

The post closes with a rendering-strategy comparison table (SSG / ISR / SSR / CSR against six Google-facing capabilities: crawl efficiency, discovery, rendering completeness, rendering time, link-structure evaluation, indexing) and eight actionable recommendations anchored on Google's crawl-budget docs, React error boundaries, server-rendered SEO tags, critical-resource unblocking in robots.txt, HTML anchors over JS navigation, sitemaps with <lastmod>, Google Search Console URL Inspection, and Core Web Vitals as a ranking input.

Key takeaways

  • Google's rendering is universal, not selective. "Google now attempts to render all HTML pages, not just a subset." Of the 100,000+ Googlebot fetches on nextjs.org (excluding status-code errors and non-indexable pages), 100 % of HTML pages resulted in full-page renders, including pages with complex JS interactions. Every content-type-appropriate page passes through the Web Rendering Service; there is no pre-render triage that keeps static HTML on the fast path. Canonical concepts/universal-rendering instance. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)

  • Rendering-delay distribution (p50 = 10 s; p99 ≈ 18 h). Across 37,000+ server-beacon pairs, the distribution of time between Googlebot's initial crawl and its rendering completion was:

Percentile Delay
p25 ≤ 4 s
p50 10 s
p75 26 s
p90 ~3 h
p95 ~6 h
p99 ~18 h

This is the canonical public empirical distribution for Google's Web Rendering Service queue delay (2024). The long tail is real (p90 in hours, not minutes) but median is tens of seconds, not days as the SEO folklore had it. Canonical concepts/rendering-delay-distribution instance. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)
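The percentile table above can be recovered mechanically from the paired dataset. A minimal sketch, assuming a hypothetical record shape (`RenderPair` and the nearest-rank `percentile` helper are illustrative names, not from the study):

```typescript
interface RenderPair {
  requestId: string;
  crawlTimeMs: number;   // from the server access log
  renderTimeMs: number;  // from the beacon POST
}

// Nearest-rank percentile over a pre-sorted array of delays.
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(
    sortedMs.length - 1,
    Math.ceil((p / 100) * sortedMs.length) - 1,
  );
  return sortedMs[Math.max(0, idx)];
}

function delayDistribution(pairs: RenderPair[]): Record<string, number> {
  const delays = pairs
    .map((p) => p.renderTimeMs - p.crawlTimeMs)
    .sort((a, b) => a - b);
  return {
    p25: percentile(delays, 25),
    p50: percentile(delays, 50),
    p75: percentile(delays, 75),
    p90: percentile(delays, 90),
  };
}
```

The study's actual aggregation method (interpolated vs nearest-rank percentiles) is not disclosed; at 37,000+ pairs the difference is negligible.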

  • Query-string URLs render slower than path-only URLs. Same 37,000-pair dataset, sliced by URL shape:
URL type p50 p75 p90
All URLs 10 s 26 s ~3 h
Without query string 10 s 22 s ~2.5 h
With query string 13 s 31 min ~8.5 h

The median gap is small but the tail is dramatic: p75 rendering time for ?ref=...-style URLs is 31 minutes vs 22 seconds for path-only URLs. Google appears to de-prioritise rendering for parameterised URLs that likely return the same content as the canonical path-only URL. Canonical Google's rendering-prioritisation heuristics signal. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)
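The URL-shape slice is a straightforward partition of the paired delays. A sketch under the assumption that delays are keyed by full URL (function and shapes are illustrative):

```typescript
// Partition per-URL rendering delays by whether the URL carries a
// query string (the post's ?ref=...-style case).
function sliceByQueryString(
  delaysByUrl: Map<string, number>,
): { withQuery: number[]; without: number[] } {
  const withQuery: number[] = [];
  const without: number[] = [];
  for (const [url, delay] of delaysByUrl) {
    // URL.search is non-empty ("?ref=x") exactly when a query string exists.
    (new URL(url).search ? withQuery : without).push(delay);
  }
  return { withQuery, without };
}
```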

  • Client-side removal of noindex is SEO-ineffective. "Pages with noindex meta tags in the initial HTML response were not rendered, regardless of JS content. Client-side removal of noindex tags is not effective for SEO purposes." Googlebot applies noindex at parse time on the initial HTML body; if the tag is present at byte-arrival, the page is not enqueued for rendering, and any JS that would have removed it never runs. Canonical concepts/client-side-removal-of-noindex-ineffective instance — sharpens the existing concepts/noindex-meta-tag coverage with an explicit timing claim from Google-side data. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)
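The implied pipeline order can be sketched as a pre-render gate: scan the initial HTML bytes for a robots noindex meta tag before enqueueing for rendering. The regex and function are illustrative readings of the post's claim, not Google's implementation:

```typescript
// Matches <meta name="robots" content="...noindex..."> in name-then-content
// attribute order; a real scanner would handle both orders and googlebot-
// specific meta names.
const NOINDEX =
  /<meta[^>]+name=["']robots["'][^>]+content=["'][^"']*noindex[^"']*["'][^>]*>/i;

function shouldEnqueueForRender(initialHtml: string): boolean {
  // If noindex is present at byte-arrival, the page never reaches the
  // renderer — so JS that would later remove the tag never executes.
  return !NOINDEX.test(initialHtml);
}
```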

  • Googlebot discovers links inside non-rendered JSON payloads. Vercel added a JSON object similar to a React Server Component payload to /showcase, containing links to previously undiscovered pages — and Google discovered and crawled them. "In both initial and rendered HTML, Google processes content by identifying strings that look like URLs, using the current host and port as a base for relative URLs." Link discovery is a regex over the full response body (rendered DOM + raw JS / JSON payloads), not just <a href> elements. Caveat: Google did not discover an encoded URL (https%3A%2F%2F...) in the payload — the regex is strict about URL shape, no URL-decoding pre-pass. Canonical concepts/link-discovery-vs-link-value-assessment + new patterns/link-in-non-rendered-json-payload-discovery instance. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)
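String-level link discovery of this kind can be illustrated with a URL-shaped regex over the raw body. The pattern below is an assumption (Google's actual regex is not disclosed); it shows why a plain URL or quoted path matches while a percent-encoded URL does not, since no URL-decoding pre-pass runs:

```typescript
// Absolute http(s) URLs, or quoted root-relative paths.
const URL_PATTERN =
  /https?:\/\/[^\s"'<>\\]+|(?<=["'])\/[a-zA-Z0-9_\-./]+(?=["'])/g;

function discoverLinks(body: string, base: string): string[] {
  const found = new Set<string>();
  for (const match of body.match(URL_PATTERN) ?? []) {
    // Relative paths resolve against the current host, per the post.
    found.add(match.startsWith("/") ? new URL(match, base).href : match);
  }
  return [...found];
}
```

An encoded `https%3A%2F%2F...` string contains no literal `://` or `/`, so neither alternative fires — consistent with the post's encoded-URL negative result.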

  • Link discovery and link value assessment are separate stages. "Google differentiates between link discovery and link value assessment. The evaluation of a link's value for site architecture and crawl prioritization occurs after full-page rendering." Discovery happens in the initial-HTML regex pass; value assessment (PageRank-style weighting, crawl prioritisation, navigational importance) happens after rendering. Implication: CSR pages are not disadvantaged in discovery (links on the page get found either way) but are disadvantaged in value assessment (value is set after render, so rendering delay propagates to crawl-prioritisation delay). Canonical concepts/link-discovery-vs-link-value-assessment instance. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)

  • Streamed React Server Component content is fully rendered. "Streamed content via RSCs was also fully rendered, confirming that streaming does not adversely impact SEO." Incrementally flushed HTML chunks from <Suspense> boundaries on the Next.js App Router on nextjs.org were fully captured by Googlebot. Extends concepts/streaming-ssr with the Google-successfully-indexes-streaming evidence — removes a specific adoption friction for Streaming SSR. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)

  • Stateless rendering — no cookies, no state, no click. "Each page render occurs in a fresh browser session, without retaining cookies or state from previous renders. Google will generally not click on items on the page, such as tabbed content or cookie banners." The Web Rendering Service renders each page as if it were a first-time unauthenticated visitor with no JavaScript-driven interaction. Hidden content behind click / tab / accordion / cookie-banner gates is invisible to indexing unless it is present in the initial DOM after rendering. Canonical concepts/stateless-rendering instance. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)

  • 304 Not Modified is rendered using the 200's content; other 3xx/4xx/5xx are not rendered. The rendering pipeline maps status codes explicitly: 200 → full render; 304 → render the cached 200 body; 3xx/4xx/5xx → no render. Canonical disclosure that error and redirect responses skip the render queue entirely (the status-code triage happens pre-render, not post-render). (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)
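The status-code triage reads naturally as a small decision function. This is one interpretation of the post's mapping, not a disclosed Google implementation:

```typescript
type RenderDecision = "render" | "render-cached-200" | "skip";

function renderDecision(status: number): RenderDecision {
  if (status === 200) return "render";
  if (status === 304) return "render-cached-200"; // reuse the prior 200 body
  return "skip"; // 3xx / 4xx / 5xx never enter the render queue
}
```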

  • Sitemap with <lastmod> collapses rendering-strategy discovery differences. "Having an updated sitemap.xml significantly reduces, if not eliminates, the time-to-discovery differences between different rendering patterns." An up-to-date sitemap feeds Google a URL list directly; it bypasses link-graph traversal, and a <lastmod> bump is a strong re-crawl signal. Rendering strategy only matters for discovery if you're relying on Google finding new URLs via link graph. Extends concepts/sitemap with the rendering-strategy-neutrality property. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)
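The sitemap recommendation amounts to emitting a URL list whose `<lastmod>` values track content changes. A minimal generator sketch (the `Page` shape is hypothetical; the output follows the sitemaps.org 0.9 schema):

```typescript
interface Page {
  path: string;
  lastModified: Date;
}

function buildSitemap(origin: string, pages: Page[]): string {
  const urls = pages
    .map(
      (p) =>
        `  <url>\n` +
        `    <loc>${origin}${p.path}</loc>\n` +
        // A bumped <lastmod> is the strong re-crawl signal the post describes.
        `    <lastmod>${p.lastModified.toISOString()}</lastmod>\n` +
        `  </url>`,
    )
    .join("\n");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${urls}\n</urlset>`
  );
}
```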

  • CSR imposes a crawl-budget tax at scale. "While Google can effectively render JS-heavy pages, the process is more resource-intensive compared to static HTML, both for you and Google. For large sites (10,000+ unique and frequently changing pages), this can impact the site's crawl budget." JS complexity doesn't change rendering success rate (100 % at nextjs.org scale) but does change rendering cost per page, and Google's per-site crawl budget divides across all discovered URLs. Canonical concepts/crawl-budget-impact-of-js-complexity instance. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)

  • Rendering-strategy crawl-efficiency comparison table. The post closes with an explicit 5-column × 6-row table mapping rendering strategies (SSG / ISR / SSR / CSR) to Google-facing capabilities (crawl efficiency, discovery, rendering completeness, rendering time, link-structure evaluation, indexing). CSR is consistently worst; SSG / ISR consistently best; SSR is strong across the board, though rendering completeness can fail in edge cases (blocked resources, upstream errors). Canonical concepts/rendering-strategy-crawl-efficiency-tradeoff table. (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process)

Methodology — the Vercel+MERJ Web Rendering Monitor beacon loop

The measurement architecture is itself a canonicalised pattern:

  1. Vercel Edge Middleware runs on every request to nextjs.org. It inspects the User-Agent + source IP / ASN + verified-bot signature to identify Googlebot (and other search / AI crawlers).
  2. For bot requests, the middleware injects the MERJ Web Rendering Monitor lightweight JS library into the HTML response alongside a unique request identifier logged in server access logs.
  3. When the bot finishes rendering, the WRM library fires on-render-complete and POSTs to a long-running beacon server with: page URL, request identifier, timestamp of rendering completion.
  4. Beacon logs + server access logs are joined on request identifier → per-request (crawl-time, render-complete-time) pair. Delta is the rendering delay.
  5. Aggregation produces the distribution, URL-type slices, and the 100 %-rendering-rate claim.

Two canonicalised patterns from this:

  • patterns/edge-middleware-bot-beacon-injection — bot-only JS instrumentation via an edge-layer request interceptor. Generalises beyond this study: any site on a CDN with programmable edge logic can measure its own search-engine-facing rendering behaviour with the same three-part (detect bot → inject beacon → match on request ID) shape.
  • patterns/server-beacon-pairing-for-render-measurement — request-ID pairing of server log + client-side post-render beacon to recover a rendering-delay distribution even though the rendering client (the bot) is outside your control.
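The pairing pattern is an inner join on the request identifier; the delta is the rendering delay. A sketch with assumed log shapes (the study's actual schemas are not published):

```typescript
interface AccessLogEntry {
  requestId: string;
  url: string;
  crawlTimeMs: number; // bot's initial fetch, from the server log
}

interface BeaconEntry {
  requestId: string;
  renderTimeMs: number; // render completion, from the beacon POST
}

function renderingDelays(
  accessLog: AccessLogEntry[],
  beacons: BeaconEntry[],
): Map<string, number> {
  const byId = new Map(beacons.map((b) => [b.requestId, b.renderTimeMs]));
  const delays = new Map<string, number>();
  for (const entry of accessLog) {
    const rendered = byId.get(entry.requestId);
    // Requests with no matching beacon (never rendered, or beacon blocked)
    // simply drop out of the paired dataset.
    if (rendered !== undefined) {
      delays.set(entry.url, rendered - entry.crawlTimeMs);
    }
  }
  return delays;
}
```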

Operational numbers (extracted verbatim or computed)

  • 100,000+ Googlebot fetches analysed (nextjs.org, April 2024).
  • 37,000+ rendered-HTML pages with server-beacon pairs.
  • 100 % of indexable HTML pages rendered.
  • Rendering delay: p25 ≤ 4 s, p50 = 10 s, p75 = 26 s, p90 ≈ 3 h, p95 ≈ 6 h, p99 ≈ 18 h.
  • Query-string URL rendering: p75 ≈ 31 min vs 22 s path-only.
  • 1 month observation window (2024-04-01 to 2024-04-30).
  • 3 sites measured (nextjs.org primary; monogram.io, basement.io supplemental).
  • 10,000+ pages is Google's own stated crawl-budget-impact threshold (cited from the Google docs, not Vercel-measured).
  • Every 100 ms of load time saved ≈ 8 % uptick in website conversion (Deloitte study cited; performance framing).

Caveats

  • Single-site methodology, Vercel-hosted substrate. 100 % rendering rate, p50 = 10 s delay, and the query-string-URL slowdown were all measured on nextjs.org on Vercel infrastructure. A static-HTML site on a random shared host with blocked resources could easily see a different distribution. Vercel's optimised edge + image CDN + good Cache-Control headers might land nextjs.org on the fast end of Google's rendering queue. Generalisability is asserted, not measured.
  • Single-crawler focus. "We are still gathering data about other search engines, including AI providers like OpenAI and Anthropic." OpenAI / Anthropic / Perplexity / other AI-training crawlers are explicitly out of scope. Claims about Googlebot do not transfer to GPTBot / ChatGPT-User / Perplexity / others; AI-crawler rendering behaviour on CSR / streamed content is an open question.
  • Vendor stake. Vercel has a commercial interest in framing Next.js / the Next.js App Router / RSC streaming / ISR as SEO-safe (all four are Vercel's products). MERJ has a commercial interest in positioning itself as the SEO-and-data-engineering consultancy that runs this kind of study. The empirical data (100 %, p50 = 10 s) is the load-bearing contribution the vendor framing rides on — the methodology is reproducible, but the framing is self-interested.
  • p90 / p95 / p99 tails are large and unexplained. p90 ≈ 3 hours, p99 ≈ 18 hours. The post asserts "these were the exception and not the rule" but doesn't dig into what makes a URL land in the tail (is it the CSR pages? query-string URLs? specific sections? specific response payload shapes?). /showcase vs /docs shows shorter median times for /docs — evidence that update frequency feeds into prioritisation — but the precise heuristic is not disclosed.
  • noindex tag enforcement timing not fully specified. The post says Google enforces noindex on the initial HTML body before rendering, but doesn't describe exactly when the check happens (stream start? end? after a specific byte offset?). If Google stream-scans the HTML for <meta name="robots" content="noindex"> before the full body arrives, a late-body noindex tag might or might not be enforced — edge case unaddressed.
  • No numbers on the rendering-strategy comparison table. The SSG/ISR/SSR/CSR comparison is asserted with qualitative labels (Excellent / Very Good / Robust / Poor); no measured per-strategy p50/p99 rendering delays from the same 37,000-pair dataset. Understandable (nextjs.org is a mix of strategies; per-page strategy classification requires a different dataset pivot) but it leaves the comparison at assertion altitude.
  • Status-code coverage is taxonomic, not distributional. "200 rendered; 304 rendered from 200's content; 3xx/4xx/5xx not rendered" is a mapping claim without disclosed counts. What fraction of total Googlebot fetches on nextjs.org were 3xx/4xx/5xx? Unknown.
  • PerplexityBot / ChatGPT-User data explicitly collected but not disclosed. "We are still gathering data about other search engines, including AI providers like OpenAI and Anthropic, and hope to talk more about our findings in the future." Follow-up post(s) promised but not published (2024-08-01 cutoff at time of this wiki ingest).
  • URL-regex strictness is mentioned but not quantified. Google did not discover https%3A%2F%2Fwebsite.com in the RSC-like payload. How strict is the regex? Does it miss URL-join-with-path patterns (this.baseURL + this.segment)? The post doesn't say; only the encoded-URL negative is given.
  • No cross-crawler comparison. Post measures Googlebot exclusively — no Bingbot / DuckDuckBot / YandexBot data even though they're caught by the same edge middleware.

Cross-source continuity

  • First Vercel ingest on the wiki — opens the Vercel corpus.
  • Extends the existing systems/vercel-edge-functions page (prior context: the PlanetScale serverless driver launch target for Edge runtime) with the bot-beacon-injection use case at a very different altitude.
  • Extends concepts/noindex-meta-tag with the Google-side timing claim: noindex is enforced pre-render on the initial HTML body, so client-side removal is ineffective. The 2026-04-17 Cloudflare noindex canonicalisation was about the AI-training-crawler insufficiency; this post is about Google-side enforcement timing. Complementary; no contradiction.
  • Extends concepts/streaming-ssr with the Google-successfully-indexes-streaming RSC evidence. The Confluence canonicalisation was about FCP / TTI wins; this post is about crawler-compatibility.
  • Extends concepts/incremental-static-regeneration with the Google-renders-ISR evidence (implicit — ISR pages on nextjs.org contribute to the 100 % rendering rate).
  • Extends systems/nextjs with the empirical Google-rendering evidence at nextjs.org scale. Prior Next.js coverage was about Cloudflare-vs-Vercel CPU profiling (2025-10-14) and Cloudflare's vinext clean reimplementation (2026-02-24); this post is the third altitude: Google-facing SEO behaviour at scale.
  • Sibling to Googlebot-as-declared-crawler coverage from the 2025-08-04 Cloudflare stealth-Perplexity post — that post canonicalised Googlebot as the transparency norm; this post is the measurement of Googlebot's actual rendering behaviour at scale.
  • Sibling to concepts/traffic-aware-prerendering — Cloudflare's vinext TPR addresses the build-time explosion of pre-rendered pages by only pre-rendering URLs with real traffic. Vercel's post is complementary: Google's indexing pipeline is the downstream consumer whose rendering-queue behaviour determines whether the TPR-skipped URLs (rendered on-demand via ISR) take an SEO hit. Short answer from this post: no — 100 % of indexable HTML pages render, and ISR pages are in the dataset.
  • No existing-claim contradictions — strictly additive.

Scope disposition

Tier-3 on-scope decisively. Vercel is Tier-3 (stricter content filter — PR / product-launch noise higher than Tier-1/2 blogs). This post is a 100,000-fetch empirical study of Google's indexing pipeline with concrete distributional disclosures (the rendering-delay percentiles alone are the first public-wiki canonical instance of that distribution), a reproducible measurement methodology (open-source beacon library linked), and a rendering-strategy comparison that bears on every SSR / CSR / ISR / SSG design decision in the corpus. Architecture density ~85 % of the body (methodology + numbers + rendering-pipeline evolution + myth-debunking + strategy comparison + recommendations). Far above the "<20 % architecture content" skip threshold. Vendor stake acknowledged but measurement is the load-bearing contribution — the framing rides on the data, not the other way round.
