Zalando Observability SDK (Browser)¶
What it is¶
@zalando/observability-sdk-browser is Zalando's
client-side OpenTelemetry SDK — a
thin wrapper over upstream OTel browser packages, carefully
trimmed to minimal bundle size and integrated with Zalando's
web framework (Rendering
Engine) so frontend developers can instrument custom
client-side operations
(Source: sources/2024-07-28-zalando-opentelemetry-for-javascript-observability-at-zalando).
Why it exists¶
Pre-2023, the only client-side observability tool Zalando had was Sentry error tracking, which could tell the team that an error happened but not why. The concrete motivating example was web-checkout WAF false positives: some customer checkout requests were blocked by the application firewall at Skipper as suspected bots; with tracing spans only on the server, there was no way to connect a button click in the browser to a missing request at the proxy, and no way to count or diagnose affected customers.
Design constraints that differ from server-side¶
The post catalogues three peculiarities that force the browser SDK to look different from the Node.js sibling (systems/zalando-observability-sdk-node):
1. Bundle size¶
"On the web, every byte counts." The team trialled an
off-the-shelf telemetry package that added ~400 KB to the
page; this was unacceptable. They cherry-picked upstream
OTel packages — loading only the ones needed in the critical
page-load path, deferring the rest — and achieved
~30 KB added total, a >13× reduction. Non-critical
packages are loaded asynchronously after the critical path
completes. Network requests for telemetry use sendBeacon()
to run at the lowest network priority.
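The two loading and transport ideas above can be sketched roughly as follows. This is an illustrative outline, not the SDK's code: the endpoint path `/api/telemetry`, the payload shape, and the helper names `sendTelemetry`, `deferNonCritical`, and `loadExtras` are all assumptions. Only `navigator.sendBeacon()` and the `load` event are standard browser APIs.

```javascript
// Sketch only: the endpoint path and payload shape are assumptions, not Zalando's.
const TELEMETRY_URL = "/api/telemetry"; // hypothetical collector ingress path

// Queue a telemetry payload with navigator.sendBeacon(), which hands the
// request to the browser for background delivery.
// `nav` is injectable so the function can be exercised outside a browser.
function sendTelemetry(payload, nav = globalThis.navigator) {
  if (!nav || typeof nav.sendBeacon !== "function") {
    return false; // no Beacon API available: drop rather than block the page
  }
  return nav.sendBeacon(TELEMETRY_URL, JSON.stringify(payload));
}

// Load non-critical instrumentation only after the page has finished loading.
// `loadExtras` is a hypothetical callback, e.g. () => import("./extras.js").
function deferNonCritical(loadExtras, win = globalThis) {
  if (win.document && win.document.readyState !== "complete") {
    win.addEventListener("load", () => loadExtras(), { once: true });
    return "deferred";
  }
  loadExtras();
  return "loaded";
}
```

Keeping the deferred packages behind a dynamic `import()` means a bundler can split them into a separate chunk, so they do not count against the critical-path byte budget.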
2. Telemetry transport via public internet + edge proxy¶
Server-side telemetry collectors are usually deployed in the same cluster as the source service, so no public-internet routing is needed. Browsers, by definition, speak over the public internet; the collector needs a publicly addressable endpoint. Zalando routes frontend telemetry through Skipper (the edge proxy) to an internal collector, with rate-limits configured as endpoint protection (patterns/edge-proxy-as-telemetry-collector-ingress). A custom template is shipped for other Zalando apps that want to deploy the same proxy topology.
3. GDPR consent¶
Data exported from a customer's browser must be explicitly consented to under GDPR. The SDK gates every export call on this consent — no consent, no data leaves the browser (concepts/gdpr-consent-gated-telemetry).
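One way to picture this gating is an exporter wrapper that checks consent on every export call. The sketch below is an assumption about shape, not the SDK's implementation: `ConsentGatedExporter` and the `hasConsent` callback are hypothetical names, and the local `ExportResultCode` constant only mirrors the result codes used by OpenTelemetry JS exporters.

```javascript
// Result codes mirroring OpenTelemetry JS's ExportResultCode (SUCCESS = 0, FAILED = 1).
const ExportResultCode = { SUCCESS: 0, FAILED: 1 };

// Hypothetical wrapper: forwards batches to `inner` only while consent is given.
class ConsentGatedExporter {
  constructor(inner, hasConsent) {
    this.inner = inner;           // the real exporter (e.g. one that posts to the collector)
    this.hasConsent = hasConsent; // assumed callback returning true/false
  }

  export(items, resultCallback) {
    // Consent is re-checked on every call, so a withdrawal takes effect immediately.
    if (!this.hasConsent()) {
      // Drop the batch and report success so the caller does not retry it.
      resultCallback({ code: ExportResultCode.SUCCESS });
      return;
    }
    this.inner.export(items, resultCallback);
  }

  shutdown() {
    return this.inner.shutdown();
  }
}
```

Placing the check at the export boundary keeps instrumentation code unaware of consent state: spans can still be created, but nothing leaves the browser without consent.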
No async-context propagation¶
The browser has no native async-context primitive (the
TC39 AsyncContext proposal
is still in progress), so the OTel JS browser packages rely
on Zone.js,
which monkey-patches global functions (setTimeout,
Promise, etc.) in the customer's browser. Zalando
"are not big fans of this, especially when done in the
customer's browser" and opted out of context propagation
on the client, instead passing span objects manually
through function parameters — the same approach they took
on the server side for migration compatibility.
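Manual span passing can be illustrated with a small sketch. The `startSpan` helper below is a stand-in written for this example and is not the SDK's API. It exists only to show a parent span travelling as an ordinary function argument instead of being read from ambient context; the function and operation names are hypothetical.

```javascript
// Minimal stand-in for a tracer; the SDK's real span type is not shown in the source.
const finished = [];
function startSpan(name, parent = null) {
  return {
    name,
    parentName: parent ? parent.name : null,
    tags: {},
    addTags(tags) { Object.assign(this.tags, tags); },
    finish() { finished.push(this); },
  };
}

// The parent span is an explicit parameter; nothing is looked up globally.
function applyFilter(products, filter, parentSpan) {
  const span = startSpan("apply_filter", parentSpan);
  try {
    return products.filter((p) => p.includes(filter));
  } finally {
    span.finish();
  }
}

function onFilterClick(products, filter) {
  const span = startSpan("filter_click");
  try {
    return applyFilter(products, filter, span); // hand the span down by hand
  } catch (err) {
    span.addTags({ error: true });
    throw err;
  } finally {
    span.finish();
  }
}
```

The cost of this approach is an extra parameter on every traced function; the benefit is that no global functions in the customer's browser need to be patched.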
Instrumented signals¶
Tracing¶
- Page load: initial document request and subresource fetches.
- Entity resolution: Rendering Engine page composition.
- AJAX requests: fetch() / XMLHttpRequest.
- User-interaction custom spans: via the framework's props.tools.observability.traceAs("op_name") API, exposed inside every Rendering Engine renderer. Example from the post:
const span = traceAs("fetch_filtered_products");
span.addTags({ href: window.location.href });
serviceClient.get(`/search?q=${filter}`)
.then(res => { /* process */ })
.catch(err => { span.addTags({ error: true }); })
.finally(() => { span.finish(); });
Core Web Vitals (metrics)¶
- FCP (First Contentful Paint)
- LCP (Largest Contentful Paint)
- INP (Interaction to Next Paint)
- CLS (Cumulative Layout Shift)
Each metric uses custom histogram buckets, because the OTel
JS defaults
[0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]
are designed for server latencies in milliseconds, not for
web-vital metrics that yield a single value per page load,
with ranges like 0-2000 ms (LCP) or 0-1 (CLS). The buckets
are declared via OTel's view + custom-aggregation API
(concepts/custom-histogram-buckets).
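To see why the defaults fit poorly, the sketch below buckets a few CLS samples against both boundary sets. It is a plain re-implementation of explicit-bucket counting for illustration, not the OTel API. The `CLS_BOUNDARIES` values and the sample scores are assumptions chosen for the example and are not the buckets Zalando uses.

```javascript
// OTel JS default boundaries, as listed above.
const DEFAULT_BOUNDARIES = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000];
// Illustrative CLS boundaries (an assumption, not the values from the post).
const CLS_BOUNDARIES = [0, 0.05, 0.1, 0.15, 0.25, 0.5, 1];

// Index of the first bucket whose upper bound is >= value;
// values above the last boundary fall into the overflow bucket.
function bucketIndex(boundaries, value) {
  const i = boundaries.findIndex((b) => value <= b);
  return i === -1 ? boundaries.length : i;
}

// Count samples per bucket; there is one more bucket than there are boundaries.
function histogram(boundaries, values) {
  const counts = new Array(boundaries.length + 1).fill(0);
  for (const v of values) counts[bucketIndex(boundaries, v)] += 1;
  return counts;
}

const clsSamples = [0.02, 0.12, 0.3]; // hypothetical CLS scores from three page loads
const withDefaults = histogram(DEFAULT_BOUNDARIES, clsSamples);
const withCustom = histogram(CLS_BOUNDARIES, clsSamples);

console.log(withDefaults[1]); // all three samples share the (0, 5] bucket
console.log(withCustom);      // the samples spread across separate buckets
```

With the default boundaries every CLS score collapses into a single bucket, so the distribution carries no information; boundaries scaled to the metric's range keep the three samples distinguishable.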
Alternative evaluated¶
The post explicitly calls out Grafana Faro as a "great package to check out if you are starting from scratch" — implying the reason Zalando didn't adopt it was timing / pre-existing decision, not a quality judgement.
Seen in¶
- sources/2024-07-28-zalando-opentelemetry-for-javascript-observability-at-zalando — canonical disclosure.
Related¶
- systems/opentelemetry — upstream.
- systems/zalando-observability-api — shared API package.
- systems/zalando-observability-sdk-node — server sibling.
- systems/zalando-rendering-engine — web framework integration.
- systems/skipper-proxy — telemetry-ingress proxy.
- systems/grafana-faro — alternative evaluated.
- concepts/client-side-observability.
- concepts/core-web-vitals.
- concepts/custom-histogram-buckets.
- concepts/send-beacon-telemetry-transport.
- concepts/gdpr-consent-gated-telemetry.
- patterns/bundle-size-budget-for-telemetry.
- companies/zalando.