PATTERN Cited by 2 sources
Observability SDK as company-specific OpenTelemetry wrapper¶
Problem¶
Adopting OpenTelemetry fleet-wide means every service team has to configure exporters, register auto-instrumentations, pick semantic conventions, and decide which critical metrics to expose. Multiplied across hundreds of services, this is mundane-but-load-bearing platform work that either (a) gets skipped on half the fleet, so observability is uneven, or (b) gets done inconsistently, so dashboards and alerts can't be shared across services.
Solution¶
The platform team ships a thin internal package that wraps upstream OTel core packages and pre-configures everything a service owner would otherwise have to do by hand. Service owners import the wrapper, call one start method, and get a fleet-standard observability baseline without reading any OTel documentation.
Zalando's canonical instantiation (Source: sources/2024-07-28-zalando-opentelemetry-for-javascript-observability-at-zalando):
What the wrapper encapsulates¶
From the Zalando Node.js SDK:
- Auto-configuration from platform environment variables — Kubernetes env vars set by the platform for every deployed application are parsed by the SDK so no constructor argument is needed in normal cases.
- Curated auto-instrumentation set — HTTP by default, Express.js behind a boolean flag; other upstream instrumentations are deliberately not wired so the wrapper controls fleet-wide blast radius of any new OTel instrumentation.
- Default critical metrics — CPU, memory, GC, event-loop lag for Node.js; Core Web Vitals (FCP, LCP, INP, CLS) for the browser.
- Endpoint + auth pre-configured — the wrapper knows the platform's telemetry backend URL (Lightstep at Zalando) and the auth envelope, so service owners don't ship credentials.
- Single proxy for upstream dependencies — applications only import the wrapper package; upstream OTel packages are transitive dependencies. This gives the platform team a single upgrade choke-point for OTel version bumps and for security fixes.
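The auto-configuration and curated-instrumentation bullets above can be sketched as a single pure function. This is a minimal illustration, not Zalando's implementation: the environment variable names (`PLATFORM_APP_NAME`, `PLATFORM_ENVIRONMENT`, `PLATFORM_TELEMETRY_URL`), the default URL, and the `resolveConfig` helper are all hypothetical. It shows only how a wrapper might resolve its settings before handing them to the upstream OTel packages.

```typescript
// All names below are hypothetical; a real platform defines its own env vars.
type Env = Record<string, string | undefined>;

interface SdkConfig {
  serviceName: string;
  environment: string;
  exporterUrl: string;
  instrumentations: string[];
}

interface SdkOverrides {
  serviceName?: string;
  enableExpress?: boolean;
}

// Assumed placeholder for the platform's telemetry backend URL.
const DEFAULT_EXPORTER_URL = "https://telemetry.example.internal/v1/traces";

function resolveConfig(env: Env, overrides: SdkOverrides = {}): SdkConfig {
  // Explicit overrides win; otherwise fall back to platform-provided env vars.
  const serviceName = overrides.serviceName ?? env.PLATFORM_APP_NAME;
  if (!serviceName) {
    throw new Error("service name missing: set PLATFORM_APP_NAME or pass serviceName");
  }

  // HTTP is always on; Express is opt-in behind a boolean flag.
  // Nothing else is wired, so the wrapper controls what reaches the fleet.
  const instrumentations = ["http"];
  if (overrides.enableExpress) {
    instrumentations.push("express");
  }

  return {
    serviceName,
    environment: env.PLATFORM_ENVIRONMENT ?? "unknown",
    exporterUrl: env.PLATFORM_TELEMETRY_URL ?? DEFAULT_EXPORTER_URL,
    instrumentations,
  };
}
```

In a Node.js service the wrapper's start method would typically call something like `resolveConfig(process.env)`, so that the normal case needs no arguments. Taking the environment as a parameter keeps the resolution logic easy to unit-test.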
When to use this pattern¶
- You have a fleet of tens of services or more that should share observability conventions.
- Your platform has stable environment-variable conventions (Kubernetes labels, service-name env, region, env-name, etc.) the wrapper can read.
- You want to roll out a new OTel auto-instrumentation, a new default metric, or a new exporter config across the fleet without every service owner touching their code.
- You want to compliance-gate certain features (e.g. sampling rates, attribute redaction for PII) centrally.
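Central compliance gating, the last bullet above, can be illustrated with two small policy functions that a wrapper might apply before any telemetry leaves the process. This is a sketch under stated assumptions: the redacted attribute keys, the sampling ceiling, and the function names are hypothetical and are not taken from the Zalando SDK.

```typescript
type AttributeValue = string | number | boolean;
type Attributes = Record<string, AttributeValue>;

// Hypothetical policy values; a real fleet would derive these from its compliance rules.
const REDACTED_KEYS: readonly string[] = ["user.email", "customer.id"];
const MAX_SAMPLING_RATIO = 0.1;

// Returns a copy of the attributes with policy-listed keys masked.
function redactAttributes(attrs: Attributes): Attributes {
  const out: Attributes = {};
  for (const key of Object.keys(attrs)) {
    out[key] = REDACTED_KEYS.indexOf(key) >= 0 ? "[REDACTED]" : attrs[key];
  }
  return out;
}

// Clamps a service-requested sampling ratio into the range [0, MAX_SAMPLING_RATIO].
function clampSamplingRatio(requested: number): number {
  if (isNaN(requested)) {
    return MAX_SAMPLING_RATIO;
  }
  return Math.min(Math.max(requested, 0), MAX_SAMPLING_RATIO);
}
```

Because both functions live in the wrapper, changing the policy is a wrapper release, not a change in every service repository.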
When not to use this pattern¶
- One-service startup: just use OTel directly; the wrapper adds maintenance cost with no shared-convention payoff.
- You have no platform-owning team: wrappers without a team accumulate tech debt fast as OTel evolves.
- You need per-service flexibility more than fleet consistency — the wrapper's value proposition is reducing per-service choice.
Failure modes¶
- Wrapper version skew across fleet — if service owners pin old versions, the fleet-wide upgrade choke-point value evaporates. Avoiding this requires a forcing function (e.g. automated version-bump PRs).
- Wrapper becomes its own surface area — as the team adds Zalando-specific APIs (e.g. traceAs in the browser SDK), the wrapper drifts from being a thin shim to being a proprietary framework. This is not inherently bad, but it changes the maintenance model.
- Upstream OTel breaking changes surface late — because applications depend on the wrapper rather than directly on OTel, the platform team carries the burden of absorbing upstream semver-major changes before flipping the wrapper's default.
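A forcing function against version skew usually starts with knowing which services lag behind. The sketch below assumes the platform team can list the wrapper version each service pins. The `ServicePin` shape, the service names, and the `findLaggingServices` helper are hypothetical, and only plain `MAJOR.MINOR.PATCH` versions are handled.

```typescript
interface ServicePin {
  service: string;
  wrapperVersion: string; // plain "MAJOR.MINOR.PATCH"; pre-release tags are out of scope here
}

function parseVersion(v: string): number[] {
  if (!/^\d+\.\d+\.\d+$/.test(v)) {
    throw new Error(`unsupported version string: ${v}`);
  }
  return v.split(".").map(Number);
}

// True when version a is strictly older than version b.
function isOlder(a: string, b: string): boolean {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) {
      return pa[i] < pb[i];
    }
  }
  return false;
}

// Names of services pinning a wrapper version below the fleet minimum.
function findLaggingServices(pins: ServicePin[], minimum: string): string[] {
  return pins
    .filter((p) => isOlder(p.wrapperVersion, minimum))
    .map((p) => p.service);
}
```

The resulting list could feed automated bump PRs or a dashboard. Comparing the parts numerically rather than as strings matters here, since "2.10.0" is newer than "2.9.0".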
Sibling patterns¶
- patterns/standardize-observability-sdk-per-language — the per-language scaling axis: ship a wrapper for Java, Node.js, browser, Python, Go so every runtime gets fleet-standard observability.
- patterns/bundle-size-budget-for-telemetry — the browser-SDK-specific additional constraint that forces cherry-picked packages and sendBeacon() export.
- patterns/edge-proxy-as-telemetry-collector-ingress — the browser-SDK-specific ingress-topology pattern.
Seen in¶
- sources/2024-07-28-zalando-opentelemetry-for-javascript-observability-at-zalando — canonical architecture disclosure for Zalando's Node.js + browser + API SDK trio.
- sources/2024-07-24-zalando-nodejs-and-the-tale-of-worker-threads — the worker-threads incident post names the same Node.js SDK and reports 53 Node.js applications instrumented by 2024-07, which serves as an adoption proxy.