PATTERN Cited by 1 source
Server-beacon pairing for render measurement¶
Pattern¶
To measure the end-to-end rendering delay of a client (browser, crawler) that you don't control, inject a unique request identifier into both:
- the server access log when the initial HTML response is emitted, and
- a beacon JavaScript library injected into that HTML response, which POSTs the identifier to a beacon server after render completion.
Later, join the two data streams on the request identifier
to recover per-request (crawl_time, render_complete_time)
pairs. Compute render_delay = render_complete_time - crawl_time.
Aggregate across many paired events to recover the full rendering-delay distribution (p50, p75, p99), slice by URL shape / URL prefix / response size, and compute the rendering success rate as "fraction of access-log entries with a matched beacon within a reasonable window."
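The join and aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the source: the record shapes (`reqId`, `crawlTime`, `renderCompleteTime`, all in epoch milliseconds), the function names, and the nearest-rank percentile method are hypothetical.

```javascript
// Hypothetical record shapes (assumed for illustration):
//   access log: { reqId, crawlTime }            crawlTime in epoch ms
//   beacon log: { reqId, renderCompleteTime }   stamped in epoch ms

// Join the two streams on reqId and compute render_delay per matched pair.
function pairRenderDelays(accessLog, beaconLog) {
  const crawlTimes = new Map(accessLog.map((r) => [r.reqId, r.crawlTime]));
  const delays = [];
  for (const b of beaconLog) {
    const crawlTime = crawlTimes.get(b.reqId);
    if (crawlTime !== undefined) {
      delays.push(b.renderCompleteTime - crawlTime);
    }
  }
  return delays;
}

// Nearest-rank percentile over an unsorted array of numbers.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Fraction of access-log entries that have a matched beacon.
function renderSuccessRate(accessLog, beaconLog) {
  if (accessLog.length === 0) return NaN;
  const beaconIds = new Set(beaconLog.map((b) => b.reqId));
  const matched = accessLog.filter((r) => beaconIds.has(r.reqId)).length;
  return matched / accessLog.length;
}

const accessLog = [
  { reqId: 'a', crawlTime: 1000 },
  { reqId: 'b', crawlTime: 2000 },
  { reqId: 'c', crawlTime: 3000 },
  { reqId: 'd', crawlTime: 4000 },
];
const beaconLog = [
  { reqId: 'a', renderCompleteTime: 5000 },
  { reqId: 'b', renderCompleteTime: 12000 },
  { reqId: 'c', renderCompleteTime: 29000 },
];

const delays = pairRenderDelays(accessLog, beaconLog);
console.log(delays, percentile(delays, 50), renderSuccessRate(accessLog, beaconLog));
```

A production version would also bound the match window (the "reasonable window" above) and slice the delays by URL prefix or response size before computing percentiles; both are omitted here for brevity.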
Why this architecture¶
The measurement problem: you know when your server answered a crawl request, but the crawler's rendering happens inside the crawler's own infrastructure, outside your control. You need some signal from inside the crawler's sandbox to correlate with the server's crawl-time record.
The solution: inject JS that runs inside the crawler's headless browser, and fire a POST back to a server you do control, carrying the correlation key. The injected JS borrows the crawler's browser's own outbound HTTP capability.
Correlation-key design¶
edge middleware injects into HTML response:
<script>
(function() {
  var REQUEST_ID = "crawl-abc123def"; // generated at edge
  window.addEventListener('load', function() {
    // fire once the load event signals the page has finished loading
    fetch('https://beacon.example.com/wrm', {
      method: 'POST',
      body: JSON.stringify({
        req_id: REQUEST_ID,
        url: window.location.href,
        // client-side render-complete time; stamping on the beacon server is safer
        t: Date.now()
      }),
      keepalive: true
    });
  });
})();
</script>
server access log line for the same request:
2024-04-17T10:32:55Z GET /docs/foo 200 req_id=crawl-abc123def
user_agent=Googlebot/2.1 ...
Request ID is the join key. One record per request in each log; pairing is 1:1.
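The edge-side half of this design can be sketched as below: one generated ID is written to the access log and embedded in the injected script. This is an illustrative sketch only. The function names (`buildBeaconScript`, `injectBeacon`, `tagResponse`), the `request` object shape, the use of a UUID behind the `crawl-` prefix, and the "insert before `</body>`" placement are all assumptions, not details from the source.

```javascript
const { randomUUID } = require('node:crypto');

// Build the inline beacon script for one response.
// JSON.stringify produces quoted JS string literals; the ID is a generated
// UUID and the URL is fixed config, so neither contains HTML-significant text.
function buildBeaconScript(requestId, beaconUrl) {
  const id = JSON.stringify(requestId);
  const url = JSON.stringify(beaconUrl);
  return [
    '<script>',
    '(function () {',
    `  var REQUEST_ID = ${id};`,
    "  window.addEventListener('load', function () {",
    `    fetch(${url}, {`,
    "      method: 'POST',",
    '      body: JSON.stringify({ req_id: REQUEST_ID, url: window.location.href }),',
    '      keepalive: true',
    '    });',
    '  });',
    '})();',
    '</script>',
  ].join('\n');
}

// Insert the script before </body>, or append it if no closing tag is found.
function injectBeacon(html, requestId, beaconUrl) {
  const script = buildBeaconScript(requestId, beaconUrl);
  const idx = html.lastIndexOf('</body>');
  if (idx === -1) return html + script;
  return html.slice(0, idx) + script + '\n' + html.slice(idx);
}

// Tag one request: the same ID goes into the access-log line and the HTML.
function tagResponse(html, request, beaconUrl) {
  const requestId = `crawl-${randomUUID()}`;
  const logLine = [
    new Date().toISOString(),
    request.method,
    request.path,
    request.status,
    `req_id=${requestId}`,
    `user_agent=${request.userAgent}`,
  ].join(' ');
  return { requestId, logLine, html: injectBeacon(html, requestId, beaconUrl) };
}

const tagged = tagResponse(
  '<html><body><h1>Docs</h1></body></html>',
  { method: 'GET', path: '/docs/foo', status: 200, userAgent: 'Googlebot/2.1' },
  'https://beacon.example.com/wrm'
);
console.log(tagged.logLine);
```

Generating the ID once and passing it to both sinks is what keeps the pairing 1:1; generating it separately for the log and for the HTML would break the join.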
Canonical instance¶
Vercel + MERJ, 2024-08-01, nextjs.org, April 2024:
- 100,000+ access-log entries for Googlebot requests.
- 37,000+ beacon-matched pairs recovered (pairing yields less than 1:1 because non-indexable URLs, errored responses, and post-render beacon failures don't complete the loop — all explainable).
- 100 % of indexable HTML pages (200/304, no noindex) paired: the claimed rendering-success rate.
- Distribution recovered: p25 ≤ 4 s, p50 = 10 s, p75 = 26 s, p90 ≈ 3 h, p95 ≈ 6 h, p99 ≈ 18 h. See concepts/rendering-delay-distribution.
(Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process.)
Where the timestamp is measured¶
Preferred: server-side at beacon-POST arrival. Avoids clock skew between the crawler's rendering sandbox and the measurer's server-side infrastructure. Delay is measured as a local time-difference on equipment the measurer controls.
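A minimal sketch of server-side stamping, assuming a JSON beacon body with a `req_id` field as in the snippet above. The function names (`recordBeacon`, `renderDelayMs`) and the stored record shape are hypothetical; the point is only that the render-complete time comes from the receiving server's clock, not from the payload.

```javascript
// Turn a raw beacon POST body into a stored record.
// The arrival time comes from the server clock (passed in as `now` so the
// sketch is testable). Any client-supplied `t` is kept only as a diagnostic
// and is never used to compute the delay.
function recordBeacon(rawBody, now = Date.now()) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return null; // malformed body: drop it
  }
  if (!payload || typeof payload.req_id !== 'string') return null;
  return {
    reqId: payload.req_id,
    url: typeof payload.url === 'string' ? payload.url : null,
    renderCompleteTime: now, // server-side arrival time
    clientReportedTime: typeof payload.t === 'number' ? payload.t : null,
  };
}

// The delay is then a difference of two timestamps taken on the measurer's side.
function renderDelayMs(crawlTime, beaconRecord) {
  return beaconRecord.renderCompleteTime - crawlTime;
}

const crawlTime = Date.parse('2024-04-17T10:32:55Z');
const arrival = Date.parse('2024-04-17T10:33:05Z');
const record = recordBeacon(
  JSON.stringify({ req_id: 'crawl-abc123def', url: 'https://example.com/docs/foo', t: 0 }),
  arrival
);
console.log(renderDelayMs(crawlTime, record)); // 10000 (ms)
```

Note that the deliberately wrong client clock (`t: 0`) has no effect on the computed delay.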
"The timestamp of the rendering completion (this is calculated using the JavaScript Library request reception time on the server)." (Source: sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process.)
Pairing failure modes¶
Every measurement loses some pairs; the loss shape tells you about the crawler:
- Access-log entry with no beacon: crawl happened, render didn't complete / beacon didn't fire. Could be:
- Non-indexable status (3xx/4xx/5xx → no render).
- Render failed (fraction of pages the crawler couldn't render).
- Bot's JS sandbox blocks outbound HTTP (AI-training crawlers may).
- Beacon POST was lost / timed out / rate-limited.
- Beacon entry with no access-log match: should not happen under normal operation, because the beacon JS is only injected when edge middleware has tagged the request. If it does happen, suspect access-log sampling or retention gaps.
The ratio of matched:unmatched pairs is itself a signal — at
nextjs.org it told Vercel the rendering-success rate was
effectively 100 % for indexable pages.
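The failure-mode breakdown above can be sketched as a classification pass over both logs. The record fields (`status`, `noindex`), the bucket names, and the function name are assumptions for illustration; "indexable" follows the definition used in the canonical instance (200/304, no noindex).

```javascript
// Hypothetical record shapes:
//   access: { reqId, status, noindex }   beacon: { reqId }
function classifyPairing(accessLog, beaconLog) {
  const beaconIds = new Set(beaconLog.map((b) => b.reqId));
  const accessIds = new Set(accessLog.map((a) => a.reqId));
  const counts = {
    matched: 0,               // access-log entry with a beacon
    nonIndexableUnmatched: 0, // no render expected (3xx/4xx/5xx or noindex)
    indexableUnmatched: 0,    // render failed, outbound HTTP blocked, or beacon lost
    orphanBeacons: 0,         // beacon with no access-log entry
  };
  let indexableTotal = 0;
  let indexableMatched = 0;

  for (const a of accessLog) {
    const indexable = (a.status === 200 || a.status === 304) && !a.noindex;
    const matched = beaconIds.has(a.reqId);
    if (matched) counts.matched += 1;
    if (indexable) {
      indexableTotal += 1;
      if (matched) indexableMatched += 1;
      else counts.indexableUnmatched += 1;
    } else if (!matched) {
      counts.nonIndexableUnmatched += 1;
    }
  }
  for (const b of beaconLog) {
    // Orphans point at access-log sampling or retention gaps.
    if (!accessIds.has(b.reqId)) counts.orphanBeacons += 1;
  }
  const indexableSuccessRate =
    indexableTotal === 0 ? NaN : indexableMatched / indexableTotal;
  return { ...counts, indexableSuccessRate };
}

const report = classifyPairing(
  [
    { reqId: 'a', status: 200, noindex: false },
    { reqId: 'b', status: 404, noindex: false },
    { reqId: 'c', status: 200, noindex: false },
    { reqId: 'd', status: 304, noindex: false },
  ],
  [{ reqId: 'a' }, { reqId: 'd' }, { reqId: 'z' }]
);
console.log(report);
```

The sketch cannot tell a failed render from a lost beacon; both land in `indexableUnmatched`, which is why beacon-server health has to be tracked separately.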
Trade-offs¶
- Post-render-event timing precision. window.onload fires once the document and its subresources have loaded, but the precise definition of "render complete" is fuzzy (is it after main-thread idle? after all async work has settled? after all network activity is done?). Fine for a distribution; less fine for millisecond-precision latency.
- Long-tail renders complete long after the request-ID lookup window. At p99 ≈ 18 h, your access-log index needs to retain enough history for the pairing to succeed. This is a pipeline design cost.
- Beacon server must be high-availability. A beacon server outage looks like a rendering-success drop until you disentangle the failure mode.
- Request-ID must be unique but opaque. Don't encode PII or sensitive routing info; it's in the HTML response body and the client-side JS, both visible to the crawler.
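The retention trade-off can be illustrated with a small in-memory index of crawls still awaiting a beacon. This is a sketch under assumptions: the class name `PendingCrawls`, the 48-hour retention value (chosen only to leave headroom over the p99 ≈ 18 h figure), and the requirement that entries are added in crawl-time order are all hypothetical. A real pipeline would more likely use a durable store.

```javascript
// In-memory index of crawl records awaiting a beacon.
// Assumes entries are added in non-decreasing crawlTime order.
class PendingCrawls {
  constructor(retentionMs) {
    this.retentionMs = retentionMs;
    this.pending = new Map(); // reqId -> crawlTime (epoch ms), insertion-ordered
  }

  add(reqId, crawlTime) {
    this.pending.set(reqId, crawlTime);
  }

  // Returns the render delay in ms, or null if the ID is unknown or evicted.
  match(reqId, arrivalTime) {
    const crawlTime = this.pending.get(reqId);
    if (crawlTime === undefined) return null;
    this.pending.delete(reqId);
    return arrivalTime - crawlTime;
  }

  // Drop entries older than the retention window; returns how many were dropped.
  evict(now) {
    let dropped = 0;
    for (const [reqId, crawlTime] of this.pending) {
      if (now - crawlTime <= this.retentionMs) break; // the rest are newer
      this.pending.delete(reqId);
      dropped += 1;
    }
    return dropped;
  }
}

const HOUR = 60 * 60 * 1000;
const index = new PendingCrawls(48 * HOUR); // assumed retention window
index.add('fast', 0);
index.add('slow', 0);
index.add('never', 0);

const fastDelay = index.match('fast', 10 * 1000);  // beacon after 10 s
const slowDelay = index.match('slow', 18 * HOUR);  // long-tail beacon
const droppedEarly = index.evict(24 * HOUR);       // still inside the window
const droppedLate = index.evict(49 * HOUR);        // 'never' ages out
const tooLate = index.match('never', 50 * HOUR);   // beacon after eviction
console.log(fastDelay, slowDelay, droppedEarly, droppedLate, tooLate);
```

A retention window shorter than the long tail would silently turn slow renders into apparent rendering failures, which is the cost the bullet above describes.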
Adjacent patterns¶
- CSP / content-security-policy compatibility. Injected beacon JS needs to be permitted by the site's CSP. Usually requires a CSP nonce or an allow-list entry. Edge middleware can set both in the same request.
- Keepalive / sendBeacon. fetch(..., {keepalive: true}) or navigator.sendBeacon(): both allow the POST to complete even if the rendering session terminates immediately after.
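Both adjacent concerns can be sketched together. The server-side helper builds a per-response nonce and a matching policy; the client-side helper prefers `navigator.sendBeacon()` and falls back to `fetch` with `keepalive`. The function names, the `'self'` entries in the policy, and the `env` parameter (present only so the sketch can be exercised outside a browser) are assumptions. The inline script needs the nonce in `script-src`, and the beacon POST needs the beacon origin in `connect-src`.

```javascript
const { randomBytes } = require('node:crypto');

// Server side: one nonce per response, used in both the header and the tag.
function cspForBeacon(beaconOrigin) {
  const nonce = randomBytes(16).toString('base64');
  const header =
    `script-src 'self' 'nonce-${nonce}'; connect-src 'self' ${beaconOrigin}`;
  return { nonce, header, scriptOpenTag: `<script nonce="${nonce}">` };
}

// Client side: prefer sendBeacon, fall back to fetch with keepalive.
// `env` defaults to the global object (window in a browser).
function sendRenderBeacon(url, payload, env = globalThis) {
  const body = JSON.stringify(payload);
  if (env.navigator && typeof env.navigator.sendBeacon === 'function') {
    // sendBeacon returns false if the browser could not queue the request.
    if (env.navigator.sendBeacon(url, body)) return 'sendBeacon';
  }
  env.fetch(url, { method: 'POST', body, keepalive: true }).catch(() => {});
  return 'fetch';
}

const csp = cspForBeacon('https://beacon.example.com');
console.log(csp.header);

// Exercise the client helper with fakes standing in for browser globals.
const calls = [];
const withBeacon = {
  navigator: { sendBeacon: (url, body) => { calls.push({ url, body }); return true; } },
  fetch: () => Promise.resolve(),
};
const withoutBeacon = {
  fetch: (url, init) => { calls.push({ url, body: init.body }); return Promise.resolve(); },
};
const first = sendRenderBeacon('https://beacon.example.com/wrm', { req_id: 'crawl-abc123def' }, withBeacon);
const second = sendRenderBeacon('https://beacon.example.com/wrm', { req_id: 'crawl-abc123def' }, withoutBeacon);
console.log(first, second);
```

When `sendBeacon` is given a string body, browsers send it as text/plain, so a beacon server built this way should parse the body as JSON without relying on the Content-Type header.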
Seen in¶
- sources/2024-08-01-vercel-how-google-handles-javascript-throughout-the-indexing-process — canonical wiki instance. The pairing methodology drives the 37,000-pair render-delay dataset and the rendering-success claim.
Related¶
- patterns/edge-middleware-bot-beacon-injection — the upstream injection pattern.
- systems/merj-web-rendering-monitor — the specific beacon library in the canonical instance.
- concepts/rendering-delay-distribution — the output distribution.
- systems/googlebot — the measured crawler.