PATTERN Cited by 1 source
Link in non-rendered JSON payload for discovery¶
Pattern¶
Because Googlebot's link discovery works via regex over the initial HTML response
body (URL-shaped strings), you can seed URL discovery by
embedding link targets in a non-rendered JSON payload inside
the initial HTML response — even if the rendered DOM has no
corresponding <a href> elements pointing to those URLs.
The JSON payload:
- is in the initial HTML body (server-rendered, before client-side JS runs),
- is not rendered to the DOM (the JSON might feed a React Server Component, a Redux preloaded state, or a data blob the page never surfaces as visible links),
- contains well-formed, non-URL-encoded URL strings in a form the regex will match (https://example.com/foo, or /foo as absolute-path-relative).
Googlebot's crawl-stage regex pass finds these strings, enqueues them for crawl, and the new URLs enter the discovery flow.
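The pass described above can be approximated with a minimal Python sketch. Googlebot's actual expression is not public, so the URL_LIKE pattern, the discover_urls helper, and the sample page are all assumptions made for illustration; the sketch only shows the general shape of "match URL-shaped strings, then resolve relative ones against the current page".

```python
import re
from urllib.parse import urljoin

# Assumed pattern (not Google's): absolute http(s) URLs, or root-relative
# paths that start immediately after a quote character.
URL_LIKE = re.compile(r"""https?://[^\s"'<>\\]+|(?<=["'])/[^\s"'<>\\/][^\s"'<>\\]*""")

def discover_urls(html: str, page_url: str) -> list[str]:
    """Return unique URL-shaped strings in html, resolved against page_url."""
    found: list[str] = []
    for match in URL_LIKE.finditer(html):
        # urljoin leaves absolute URLs alone and resolves "/path" against the page's host.
        absolute = urljoin(page_url, match.group(0))
        if absolute not in found:
            found.append(absolute)
    return found

# A CSR-style shell: no <a href> elements, only a JSON blob in the initial HTML.
html = """<html><body><div id="root"></div>
<script type="application/json">
{"items": [{"url": "https://example.com/foo"}, {"url": "/showcase/new-page"}]}
</script></body></html>"""

print(discover_urls(html, "https://example.com/showcase"))
# ['https://example.com/foo', 'https://example.com/showcase/new-page']
```

The sketch never parses the JSON or the HTML; it treats the body as text, which is why markup and JSON delimiters around a URL do not matter to it.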
Empirical confirmation¶
Vercel + MERJ (2024-08-01) added a JSON object similar to a
React Server Component payload to /showcase on nextjs.org,
containing links to new, previously-undiscovered pages.
Googlebot discovered and crawled them.
"Google can discover links in non-rendered JavaScript payloads on the page, such as those in React Server Components or similar structures... In both initial and rendered HTML, Google processes content by identifying strings that look like URLs, using the current host and port as a base for relative URLs."
(Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process.)
Constraints on the payload¶
- URL strings must be in plain form. https%3A%2F%2Fexample.com%2Ffoo is not discovered: Google's regex does not URL-decode before matching.
- URL joins are not evaluated. {"base": "https://example.com", "path": "/foo"} → Google finds https://example.com as a URL but does not compute https://example.com/foo. The payload must contain the full URL as a single string if you want the full URL discovered.
- Relative paths work, absolutised against the current page's host + port. {"url": "/foo/bar"} is discovered as https://current-host/foo/bar.
- JSON delimiters don't disrupt detection. A URL string between " quotes in a JSON blob is still found, because the regex is tokenising the body, not parsing JSON.
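One way to guard against the first constraint is to lint a payload before it ships. The following is a rough sketch, not an established tool: find_encoded_urls, iter_strings, and the sample payload are hypothetical names, and the check only catches percent-encoded http(s) URLs. Split base/path fields, as in the "split" entry, would need a schema-specific check of their own.

```python
import json
import re
from urllib.parse import unquote

# Matches the percent-encoded form of "http://" or "https://".
ENCODED_SCHEME = re.compile(r"https?%3A%2F%2F", re.IGNORECASE)

def iter_strings(node):
    """Yield every string value in a decoded JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)

def find_encoded_urls(payload_json: str) -> list[tuple[str, str]]:
    """Return (encoded, decoded) pairs for percent-encoded URLs in the payload."""
    payload = json.loads(payload_json)
    return [(s, unquote(s)) for s in iter_strings(payload) if ENCODED_SCHEME.search(s)]

payload_json = json.dumps({
    "plain": "https://example.com/foo",
    "encoded": "https%3A%2F%2Fexample.com%2Fbar",
    "split": {"base": "https://example.com", "path": "/baz"},  # not flagged by this lint
})

for encoded, decoded in find_encoded_urls(payload_json):
    print(f"not discoverable as written: {encoded} (emit {decoded} instead)")
```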
When to use¶
- CSR / SPA shells where the server can emit the URL list in a <script type="application/json"> blob for crawl-time discovery, even if the user-facing links only materialise after JS runs.
- Large-catalog sites where a curated subset of URLs should be discovered directly from the product page rather than exclusively via sitemap or link graph.
- RSC / app-router pages where the Server Component payload already contains structured-data URL references — discovery happens "for free."
- Third-party integrations (analytics, embed, config bundles) whose JSON payloads reference URLs you want Google to find.
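For the CSR / SPA case, the server-side step might look like the sketch below. The function name discovery_script_tag, the "links" key, and the element id are illustrative assumptions; any server framework could emit the same string into the initial HTML.

```python
import json

def discovery_script_tag(urls: list[str], element_id: str = "discovery-links") -> str:
    """Serialise urls into an inert JSON <script> block for the initial HTML."""
    # json.dumps leaves "/" unescaped, so URLs stay in plain, matchable form.
    body = json.dumps({"links": urls})
    # Escape "<" so a stray "</script>" inside a value cannot close the tag early.
    body = body.replace("<", "\\u003c")
    return f'<script type="application/json" id="{element_id}">{body}</script>'

print(discovery_script_tag(["https://example.com/products/1", "/products/2"]))
# <script type="application/json" id="discovery-links">{"links": ["https://example.com/products/1", "/products/2"]}</script>
```

Some JSON serialisers escape forward slashes as \/ by default; if the URLs must stay in plain form, it is worth checking the emitted bytes rather than assuming the serialiser's behaviour.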
When not to use¶
- When you want Google not to discover the URLs. The regex is a firehose — any URL-shaped string in the body is fair game. If you have admin / draft / staging URLs in a JSON payload on a public page, they're discoverable.
- When the URLs are encoded / indirect. URL joins, base64-encoded URLs, URLs split across multiple JSON fields: all are invisible to discovery.
- When a sitemap would do. An up-to-date sitemap.xml is higher-signal, supports <lastmod>, and doesn't require page-level cooperation. A non-rendered JSON payload is complementary, not a sitemap replacement.
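Because any URL-shaped string in the body is fair game, it can help to audit public pages for URLs that should not be discoverable. The sketch below is one possible check under stated assumptions: URL_LIKE is a guess at a URL-shaped pattern (not Google's), and SENSITIVE is a hypothetical list of path or host segments that a given site would need to adapt.

```python
import re

# Assumed URL-shaped pattern: absolute http(s) URLs or quoted root-relative paths.
URL_LIKE = re.compile(r"""https?://[^\s"'<>\\]+|(?<=["'])/[^\s"'<>\\/][^\s"'<>\\]*""")
# Hypothetical markers for URLs that should stay out of public pages.
SENSITIVE = re.compile(r"(^|[/.])(admin|drafts?|staging)([/.]|$)", re.IGNORECASE)

def find_exposed_urls(html: str) -> list[str]:
    """Return URL-shaped strings in html that contain a sensitive segment."""
    return [m.group(0) for m in URL_LIKE.finditer(html) if SENSITIVE.search(m.group(0))]

html = (
    '<script type="application/json">'
    '{"edit": "/admin/posts/42", "preview": "https://staging.example.com/p/42", '
    '"public": "/posts/42"}</script>'
)

print(find_exposed_urls(html))
# ['/admin/posts/42', 'https://staging.example.com/p/42']
```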
Discovery vs value assessment implications¶
Discovery happens pre-render. Value assessment happens post-render. A URL discovered in a non-rendered JSON payload gets:
- Crawled quickly (discovery is fast — it's a regex pass).
- Assigned a link-value score only after the hosting page is rendered, because the link graph that the Web Rendering Service (WRS) builds over the rendered DOM is what feeds PageRank-style assessment.
For pure-discovery use cases (make sure the crawler knows this
URL exists), the pattern is effective. For architectural-weight
use cases (this URL is centrally important in our site
structure), real <a href> links in the rendered DOM are
stronger.
Trade-offs vs sitemap¶
- Sitemap: structured, large-capacity, supports <lastmod>, declared canonical URL list. Best for broad discovery.
- Non-rendered JSON payload: contextual. URLs appear where they semantically belong, alongside the page that references them. Useful when the URL-context relationship is important for prioritisation.
They compose well: sitemap for exhaustive URL enumeration, non-rendered JSON for contextual reinforcement on pages that would normally have CSR-hidden link relationships.
Seen in¶
- sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process: canonical wiki instance. Vercel + MERJ's deliberate experimental confirmation via a custom JSON payload on /showcase, and the observation that encoded URLs are not discovered.
Related¶
- concepts/link-discovery-vs-link-value-assessment — the upstream concept this pattern exploits.
- systems/googlebot — the pipeline operator.
- concepts/sitemap — the canonical alternative for URL discovery.
- concepts/rendering-queue — the stage this pattern specifically skips (discovery is pre-queue, value assessment is post-queue).