PATTERN Cited by 1 source

Link in non-rendered JSON payload for discovery

Pattern

Googlebot discovers links by running a regex over the initial HTML response body and collecting URL-shaped strings. You can therefore seed URL discovery by embedding link targets in a non-rendered JSON payload inside that initial response, even if the rendered DOM has no corresponding <a href> elements pointing to those URLs.

The JSON payload:

  • is in the initial HTML body (server-rendered, before client-side JS runs),
  • is not rendered to the DOM (the JSON might feed a React Server Component, a Redux preloaded state, or a data blob the page never surfaces as visible links),
  • contains well-formed, non-URL-encoded URL strings in a form the regex will match (an absolute URL such as https://example.com/foo, or an absolute-path reference such as /foo).

Googlebot's crawl-stage regex pass finds these strings, enqueues them for crawl, and the new URLs enter the discovery flow.
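
The discovery step described above can be approximated with a short Python sketch. Google's actual regex is not public, so the URL_LIKE pattern, the discover_urls helper, and the sample page are all assumptions made for illustration. The point is only that the scan runs over raw text and never parses the JSON or builds a DOM.

```python
import re
from urllib.parse import urljoin

# Illustrative pattern only (NOT Google's real regex): absolute http(s) URLs,
# or absolute paths that sit directly between double quotes.
URL_LIKE = re.compile(r'https?://[^\s"\'<>\\]+|(?<=")/[^\s"\'<>\\]*(?=")')


def discover_urls(html_body: str, page_url: str) -> list[str]:
    """Return URL-shaped strings from the raw body, resolved against page_url."""
    found: list[str] = []
    for match in URL_LIKE.finditer(html_body):
        url = urljoin(page_url, match.group(0))
        if url not in found:  # keep first-seen order, drop duplicates
            found.append(url)
    return found


# Hypothetical CSR shell: no <a href> anywhere, only a JSON blob.
html = (
    '<html><body><div id="root"></div>\n'
    '<script type="application/json">'
    '{"items":[{"url":"https://example.com/foo"},{"url":"/bar"}]}'
    "</script>\n"
    "</body></html>"
)

urls = discover_urls(html, "https://example.com/showcase")
print(urls)
```

Both URLs are picked up even though the markup contains no anchor elements, which is the behaviour the pattern relies on.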

Empirical confirmation

Vercel + MERJ (2024-08-01) added a JSON object similar to a React Server Component payload to /showcase on nextjs.org, containing links to new, previously-undiscovered pages. Googlebot discovered and crawled them.

"Google can discover links in non-rendered JavaScript payloads on the page, such as those in React Server Components or similar structures... In both initial and rendered HTML, Google processes content by identifying strings that look like URLs, using the current host and port as a base for relative URLs."

(Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process.)

Constraints on the payload

  • URL strings must be in plain form. https%3A%2F%2Fexample.com%2Ffoo is not discovered — Google's regex does not URL-decode before matching.
  • URL joins are not evaluated. {"base": "https://example.com", "path": "/foo"} → Google finds https://example.com as a URL but does not compute https://example.com/foo. The payload must contain the full URL as a single string if you want the full URL discovered.
  • Relative paths work, absolutised against the current page's host + port. {"url": "/foo/bar"} is discovered as https://current-host/foo/bar.
  • JSON delimiters don't disrupt detection. A URL string between " quotes in a JSON blob is still found — the regex is tokenising the body, not parsing JSON.
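
The constraints above can be checked against a toy matcher. This is a sketch under stated assumptions: URL_LIKE is an illustrative stand-in for the undisclosed crawl-stage regex, and the hosts shop.example.com and cdn.example.net are hypothetical. Note that in this sketch the "/foo" field of the split payload is itself treated as a relative path and resolved against the page host; what never appears is the joined URL on the base host.

```python
import json
import re
from urllib.parse import urljoin

# Illustrative stand-in for the crawl-stage regex (an assumption).
URL_LIKE = re.compile(r'https?://[^\s"\'<>\\]+|(?<=")/[^\s"\'<>\\]*(?=")')


def discover(body: str, page_url: str) -> list[str]:
    """Find URL-shaped strings in raw text and resolve them against page_url."""
    return [urljoin(page_url, m.group(0)) for m in URL_LIKE.finditer(body)]


PAGE = "https://shop.example.com:8443/catalog"  # hypothetical page URL

# 1. Percent-encoded URL: nothing URL-shaped survives, so nothing is found.
encoded = discover(json.dumps({"url": "https%3A%2F%2Fexample.com%2Ffoo"}), PAGE)

# 2. Base and path in separate fields: the join is never computed.
split = discover(json.dumps({"base": "https://cdn.example.net", "path": "/foo"}), PAGE)

# 3. Relative path: resolved against the current page's host and port.
relative = discover(json.dumps({"url": "/foo/bar"}), PAGE)

print(encoded)
print(split)
print(relative)
```

A matcher of this kind works on characters, not on JSON structure, which is why the surrounding quotes and braces do not interfere and why field-level joins are invisible to it.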

When to use

  • CSR / SPA shells where the server can emit the URL list in a <script type="application/json"> blob for crawl-time discovery, even if the user-facing links only materialise after JS runs.
  • Large-catalog sites where a curated subset of URLs should be discovered directly from the product page rather than exclusively via sitemap or link graph.
  • RSC / app-router pages where the Server Component payload already contains structured-data URL references — discovery happens "for free."
  • Third-party integrations (analytics, embed, config bundles) whose JSON payloads reference URLs you want Google to find.
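
For the CSR / SPA case, a server-side emitter might look like the following minimal sketch. The function name render_shell, the element id discovery-links, and the /app.js bundle path are hypothetical. Python's json.dumps leaves "/" and ":" unescaped, so the URLs stay in the plain form that discovery needs; only "<" is escaped, so that a string value cannot close the script element early.

```python
import json


def render_shell(discovery_urls: list[str]) -> str:
    """Render a CSR shell whose only link targets live in a JSON blob."""
    payload = json.dumps({"links": discovery_urls})
    # Escape "<" so a value containing "</script>" cannot end the tag early.
    # Slashes and colons are untouched, so URLs remain in plain form.
    payload = payload.replace("<", "\\u003c")
    return (
        "<!doctype html>\n"
        '<html><body><div id="root"></div>\n'
        f'<script type="application/json" id="discovery-links">{payload}</script>\n'
        '<script src="/app.js"></script>\n'
        "</body></html>"
    )


html = render_shell(["https://example.com/products/widget", "/products/gadget"])
print(html)
```

The same list can feed the client-side router after hydration, so the payload does not have to be maintained separately from the links users eventually see.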

When not to use

  • When you want Google not to discover the URLs. The regex is a firehose — any URL-shaped string in the body is fair game. If you have admin / draft / staging URLs in a JSON payload on a public page, they're discoverable.
  • When the URLs are encoded / indirect. URL joins, base64-encoded URLs, and URLs split across multiple JSON fields are all invisible to discovery.
  • When a sitemap would do. An up-to-date sitemap.xml is higher-signal, supports <lastmod>, and doesn't require page-level cooperation. Non-rendered JSON payload is complementary, not a sitemap replacement.
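
Because any URL-shaped string is fair game, it can be worth scanning outgoing HTML for URLs that should stay private before a page ships. The sketch below is one possible pre-publish check, not an established tool: the blocklists, the leaked_urls helper, and the sample payload are assumptions, and URL_LIKE is again only an illustrative stand-in for the real regex.

```python
import re
from urllib.parse import urlsplit

# Illustrative stand-in for the crawl-stage regex (an assumption).
URL_LIKE = re.compile(r'https?://[^\s"\'<>\\]+|(?<=")/[^\s"\'<>\\]*(?=")')

# Hypothetical policy: hosts and path prefixes that must not be discoverable.
BLOCKED_HOSTS = {"staging.example.com"}
BLOCKED_PATH_PREFIXES = ("/admin", "/draft")


def leaked_urls(html_body: str) -> list[str]:
    """Return URL-shaped strings in the body that match the blocklists."""
    leaks = []
    for match in URL_LIKE.finditer(html_body):
        parts = urlsplit(match.group(0))
        if parts.hostname in BLOCKED_HOSTS or parts.path.startswith(BLOCKED_PATH_PREFIXES):
            leaks.append(match.group(0))
    return leaks


body = (
    '<script type="application/json">'
    '{"next":"/products/2","edit":"/admin/products/2",'
    '"preview":"https://staging.example.com/preview/2"}'
    "</script>"
)

leaks = leaked_urls(body)
print(leaks)
```

A check like this could run in CI against rendered responses; anything it reports should be removed from the payload, since encoding it only hides it from this particular matcher.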

Discovery vs value assessment implications

Discovery happens pre-render. Value assessment happens post-render. A URL discovered in a non-rendered JSON payload is:

  • Crawled quickly (discovery is fast — it's a regex pass).
  • Assigned a link-value score only after the hosting page is rendered, because the link graph that Google's Web Rendering Service (WRS) builds over the rendered DOM is what feeds PageRank-style assessment.

For pure-discovery use cases (make sure the crawler knows this URL exists), the pattern is effective. For architectural-weight use cases (this URL is centrally important in our site structure), real <a href> links in the rendered DOM are stronger.

Trade-offs vs sitemap

  • Sitemap: structured, large-capacity, supports <lastmod>, declared canonical URL list. Best for broad discovery.
  • Non-rendered JSON payload: contextual — URLs appear where they semantically belong, alongside the page that references them. Useful when the URL-context relationship is important for prioritisation.

They compose well: sitemap for exhaustive URL enumeration, non-rendered JSON for contextual reinforcement on pages that would normally have CSR-hidden link relationships.
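
One way to compose the two is to drive both from the same catalog: every URL goes into the sitemap with its <lastmod>, while each page's JSON payload carries only the URLs that belong to it contextually. The sketch below assumes a hypothetical catalog of (URL, date) pairs and helper names build_sitemap and build_page_payload; the namespace is the standard sitemap protocol one.

```python
import json
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Serialise (absolute URL, lastmod date) pairs as a sitemap <urlset>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def build_page_payload(related: list[str]) -> str:
    """JSON blob for one page: only the URLs that page references."""
    return json.dumps({"related": related})


# Hypothetical catalog shared by both outputs.
catalog = [
    ("https://example.com/products/widget", "2024-08-01"),
    ("https://example.com/products/gadget", "2024-07-15"),
]

sitemap_xml = build_sitemap(catalog)
page_payload = build_page_payload([loc for loc, _ in catalog[:1]])
print(sitemap_xml)
print(page_payload)
```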
