

Observability SDK as company-specific OpenTelemetry wrapper

Problem

Adopting OpenTelemetry fleet-wide means every service team has to configure exporters, register auto-instrumentations, pick semantic conventions, and decide which critical metrics to expose. Multiplied across hundreds of services, this is mundane-but-load-bearing platform work that either (a) gets skipped on half the fleet, so observability is uneven, or (b) gets done inconsistently, so dashboards and alerts can't be shared across services.

Solution

The platform team ships a thin internal package that wraps the upstream OTel core packages and pre-configures everything a service owner would otherwise have to set up by hand. Service owners import the wrapper, call one start method, and get a fleet-standard observability baseline without reading any OTel documentation.

Zalando's canonical instantiation (Source: sources/2024-07-28-zalando-opentelemetry-for-javascript-observability-at-zalando):

import { SDK } from "@zalando/observability-sdk-node";
new SDK().start();

What the wrapper encapsulates

From the Zalando Node.js SDK (a sketch of how these pieces might fit together follows the list):

  • Auto-configuration from platform environment variables — the SDK parses the Kubernetes env vars the platform sets for every deployed application, so no constructor arguments are needed in the normal case.
  • Curated auto-instrumentation set — HTTP by default, Express.js behind a boolean flag; other upstream instrumentations are deliberately not wired so the wrapper controls fleet-wide blast radius of any new OTel instrumentation.
  • Default critical metrics — CPU, memory, GC, event-loop lag for Node.js; Core Web Vitals (FCP, LCP, INP, CLS) for the browser.
  • Endpoint + auth pre-configured — the wrapper knows the platform's telemetry backend URL (Lightstep at Zalando) and the auth envelope, so service owners don't ship credentials.
  • Single proxy for upstream dependencies — applications only import the wrapper package; upstream OTel packages are transitive dependencies. This gives the platform team a single upgrade choke-point for OTel version bumps and for security fixes.
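
A minimal sketch of what the inside of such a wrapper might look like, assuming a hypothetical @acme/observability-sdk-node package and illustrative env-var names (APP_NAME, TELEMETRY_ENDPOINT, TELEMETRY_TOKEN); it composes upstream OTel JS packages, whose exact exports vary between OTel JS versions, and is not Zalando's actual implementation:

import { NodeSDK } from "@opentelemetry/sdk-node";
import { Resource } from "@opentelemetry/resources";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
import { ExpressInstrumentation } from "@opentelemetry/instrumentation-express";

export interface SDKOptions {
  // Express is opt-in so the platform team controls the fleet-wide
  // blast radius of each auto-instrumentation.
  enableExpress?: boolean;
}

export class SDK {
  private readonly sdk: NodeSDK;

  constructor(options: SDKOptions = {}) {
    // Auto-configuration: parse platform-injected Kubernetes env vars
    // (names are illustrative) so no constructor arguments are needed.
    const serviceName = process.env.APP_NAME ?? "unknown-service";
    const endpoint =
      process.env.TELEMETRY_ENDPOINT ?? "http://otel-collector:4318/v1/traces";
    const token = process.env.TELEMETRY_TOKEN; // platform-issued credential

    this.sdk = new NodeSDK({
      resource: new Resource({ "service.name": serviceName }),
      // Endpoint + auth pre-configured: service owners never see the
      // backend URL or ship credentials.
      traceExporter: new OTLPTraceExporter({
        url: endpoint,
        headers: token ? { Authorization: `Bearer ${token}` } : {},
      }),
      // Curated auto-instrumentation set: HTTP always on, Express.js
      // only behind the boolean flag.
      instrumentations: [
        new HttpInstrumentation(),
        ...(options.enableExpress ? [new ExpressInstrumentation()] : []),
      ],
    });
    // The default critical metrics (CPU, memory, GC, event-loop lag)
    // would also be registered here, e.g. via @opentelemetry/host-metrics.
  }

  start(): void {
    this.sdk.start();
  }
}

A service owner then writes the same two lines as in the Zalando example, optionally opting in to Express: new SDK({ enableExpress: true }).start().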

When to use this pattern

  • You have a fleet of tens of services or more that should share observability conventions.
  • Your platform has stable environment-variable conventions (Kubernetes labels, a service-name env var, region, environment name, etc.) that the wrapper can read.
  • You want to roll out a new OTel auto-instrumentation, a new default metric, or a new exporter config across the fleet without every service owner touching their code.
  • You want to compliance-gate certain features (e.g. sampling rates, attribute redaction for PII) centrally (see the sketch after this list).
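
A sketch of how the last bullet might be enforced centrally: the wrapper owns the sampler and redacts PII attributes in a decorator around the exporter, so a single wrapper release changes both fleet-wide. The env-var name TELEMETRY_SAMPLE_RATE and the attribute-key list are assumptions for illustration, not part of the Zalando SDK:

import {
  ReadableSpan,
  SpanExporter,
  TraceIdRatioBasedSampler,
} from "@opentelemetry/sdk-trace-base";
import { ExportResult } from "@opentelemetry/core";

// Fleet-wide sampling rate: defaulted centrally, overridable per environment
// through the platform, never per service in code.
export const fleetSampler = new TraceIdRatioBasedSampler(
  Number(process.env.TELEMETRY_SAMPLE_RATE ?? "0.1"),
);

// Decorator exporter that strips known-PII attribute keys before anything
// leaves the process; the key list ships with the wrapper, not the service.
export class RedactingExporter implements SpanExporter {
  constructor(
    private readonly inner: SpanExporter,
    private readonly piiKeys: readonly string[] = ["enduser.id", "http.url"],
  ) {}

  export(spans: ReadableSpan[], done: (result: ExportResult) => void): void {
    for (const span of spans) {
      for (const key of this.piiKeys) {
        if (key in span.attributes) span.attributes[key] = "[REDACTED]";
      }
    }
    this.inner.export(spans, done);
  }

  shutdown(): Promise<void> {
    return this.inner.shutdown();
  }
}

The wrapper would pass fleetSampler as the NodeSDK sampler and wrap its OTLP exporter in a RedactingExporter, so neither knob is reachable from service code.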

When not to use this pattern

  • One-service startup: just use OTel directly; the wrapper adds maintenance cost with no shared-convention payoff.
  • You have no owning platform team: a wrapper without an owner accumulates tech debt fast as OTel evolves.
  • You need per-service flexibility more than fleet consistency — the wrapper's value proposition is reducing per-service choice.

Failure modes

  • Wrapper version skew across the fleet — if service owners pin old versions, the fleet-wide upgrade choke-point value evaporates. This requires a forcing function (e.g. automated PRs that bump the wrapper version).
  • Wrapper becomes its own surface area — as the team adds Zalando-specific APIs (e.g. traceAs in the browser SDK), the wrapper drifts from a thin shim into a proprietary framework. Not inherently bad, but it changes the maintenance model.
  • Upstream OTel breaking changes surface late — because applications depend on the wrapper rather than on OTel directly, the platform team carries the burden of absorbing upstream semver-major changes before flipping the wrapper's defaults.

