PATTERN Cited by 1 source

Manual span passing over async context

Problem

OpenTelemetry's JavaScript SDKs offer two ways to propagate a span across async function boundaries:

  1. Implicit context propagation via tracer.startActiveSpan(name, async () => { ... }) on Node.js (backed by AsyncLocalStorage) or Zone.js in the browser (which monkey-patches global functions like setTimeout, Promise, and fetch).
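  2. Explicit manual span passing: tracer.startSpan(name, {}, context) returns a span object that callers pass through function parameters, add tags to, and call finish() on. (addTags() and finish() are the OpenTracing method names; on a raw OTel span the equivalent calls are setAttributes() and end().)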
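
To make the difference concrete, here is a minimal sketch of both styles in Node.js. It uses Node's built-in AsyncLocalStorage directly rather than the OTel SDK, and the span objects are hypothetical plain stand-ins, so it illustrates the mechanism only and is not how either SDK is implemented.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Hypothetical stand-in for a span; not the OTel Span type.
const makeSpan = (name) => ({ name });

// Style 1: implicit propagation. The "current span" lives in ambient storage.
const ambient = new AsyncLocalStorage();

async function implicitCallee() {
  await Promise.resolve(); // cross an async boundary
  return ambient.getStore().name; // looked up, never passed in
}

// Style 2: explicit passing. The span travels as an ordinary parameter.
async function explicitCallee(span) {
  await Promise.resolve(); // cross an async boundary
  return span.name;
}

async function main() {
  const viaAmbient = await ambient.run(makeSpan("implicit"), () => implicitCallee());
  const viaParameter = await explicitCallee(makeSpan("explicit"));
  return { viaAmbient, viaParameter };
}

const result = main();
```

In the first style the callee's signature says nothing about tracing; in the second the dependency is visible in the parameter list, which is the property the rest of this page relies on.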

Option 1 is ergonomic but has two costs:

  • On the server the context-API ergonomics are very different from OpenTracing's span-passing style. Migrating a large existing OpenTracing codebase to the active-span paradigm is mechanical but pervasive churn — every function that takes a span would stop taking a span and start relying on ambient context.
  • In the browser there is no native async-context primitive (the TC39 AsyncContext proposal is still at Stage 2 as of 2024), so the implementation depends on Zone.js, which monkey-patches global functions in the customer's browser and therefore touches every setTimeout, Promise, and fetch call running in the page. This is unappealing for production code running on customer devices (see concepts/zone-js-monkey-patching).

Solution

Opt out of OTel's implicit context API on both server and browser, and use explicit manual span passing everywhere — spans are returned from a startSpan() call, passed through function parameters, tagged with addTags(), and closed with finish() at the call-site's natural completion point.
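
The lifecycle described above can be sketched as follows. The tracer here is a hypothetical in-memory stand-in that only mimics the span-passing surface named in the text (startSpan, addTags, finish, plus an OpenTracing-style childOf option); the function names loadCart and handleRequest are likewise invented for illustration.

```javascript
// Hypothetical in-memory tracer, for illustration only.
const finished = [];
const tracer = {
  startSpan(name, options = {}) {
    const span = {
      name,
      parent: options.childOf ? options.childOf.name : null,
      tags: {},
      addTags(tags) { Object.assign(span.tags, tags); return span; },
      finish() { finished.push(span); },
    };
    return span;
  },
};

// The callee receives its parent span as an ordinary argument.
async function loadCart(parentSpan, userId) {
  const span = tracer.startSpan("load-cart", { childOf: parentSpan });
  span.addTags({ "user.id": userId });
  try {
    await Promise.resolve(); // stand-in for real I/O
    return ["item-1"];
  } finally {
    span.finish(); // closed at the natural completion point
  }
}

async function handleRequest() {
  const span = tracer.startSpan("handle-request");
  try {
    return await loadCart(span, "u-42"); // parent passed explicitly
  } finally {
    span.finish();
  }
}

const done = handleRequest();
```

The try/finally pairs are a design choice in this sketch, not something the source prescribes; they keep each finish() next to the startSpan() it belongs to.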

Migration compatibility on the server (from OpenTracing):

// 1. Starting a span in OpenTracing
const span = tracer.startSpan("name");
await callOtherFunction(span);

// 2. OpenTelemetry with active span (rejected)
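await tracer.startActiveSpan("name", async (span) => {
  try {
    // No span argument: the callee reads the active span from ambient context.
    await callOtherFunction();
  } finally {
    span.end(); // startActiveSpan does not end the span for you
  }
});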

// 3. OpenTelemetry with manual context (chosen)
const context = getContextFromSomewhere();
const span = tracer.startSpan("name", {}, context);
await callOtherFunction(span);

Approach 3 lets existing OpenTracing call-sites keep their span-passing signatures unchanged during the OTel migration.
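
One way to keep legacy method names working as well as legacy signatures is a thin adapter around the new span object. The source does not say how Zalando bridged the two method sets, so the following is an assumption-laden sketch: toLegacySpan and fakeOtelSpan are hypothetical names, and the fake only imitates the setAttributes() and end() methods of an OTel span so the example runs without the SDK.

```javascript
// Wraps any object with OTel-style setAttributes() and end(), and exposes the
// OpenTracing-style addTags() and finish() that legacy call-sites use.
function toLegacySpan(otelSpan) {
  return {
    addTags(tags) { otelSpan.setAttributes(tags); return this; },
    finish() { otelSpan.end(); },
    unwrap() { return otelSpan; }, // escape hatch for code that needs the raw span
  };
}

// Minimal fake with the same two methods (hypothetical, illustration only).
function fakeOtelSpan() {
  return {
    attributes: {},
    ended: false,
    setAttributes(attrs) { Object.assign(this.attributes, attrs); return this; },
    end() { this.ended = true; },
  };
}

// A legacy call-site: its signature and body are untouched by the migration.
function callOtherFunction(span) {
  span.addTags({ step: "other" });
}

const otel = fakeOtelSpan();
const span = toLegacySpan(otel);
callOtherFunction(span);
span.finish();
```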

Tradeoffs

What you give up

  • Less ergonomic per call-site — every function that produces or consumes a span has to accept/return it as a parameter. No implicit "current span" lookup.
  • Easier to drop spans accidentally — forgetting to pass the span to a callee silently disconnects the trace; there's no ambient context to fall back to.
  • No built-in OTel integration helpers — tools that expect the active-span API (e.g. certain instrumentations that read the current context to create child spans automatically) won't work without adaptation.

What you get

  • Minimal migration cost from an existing OpenTracing codebase: call-site signatures stay as they are.
  • No monkey-patching in production browser code — the SDK's behaviour is confined to the SDK, not spread across every global async primitive.
  • Explicit span lifetime — reading the code shows exactly which span a given operation is under; no hidden ambient state.

When to use this pattern

  • You have a large existing tracing codebase (OpenTracing or otherwise) with span-passing conventions and want to migrate to OTel without rewriting every call-site.
  • You run browser-side tracing and don't want Zone.js's global monkey-patching in customer browsers.
  • You value explicit-over-implicit control-flow visibility in code review.

When not to use this pattern

  • Greenfield service with no legacy tracing code and no browser-side monkey-patching concern — the active-span API is more ergonomic.
  • Team is unfamiliar with distributed tracing — the active-span API is easier to teach.
  • You rely heavily on third-party OTel auto-instrumentations that assume the active-span API as their context discovery mechanism.

Failure modes

  • Silent parenthood loss — pass a span to one function, forget to pass it to the third one down the stack, and spans below become orphans. Easy to miss in code review.
  • Span object lifetime bugs: forgetting finish() or calling it twice. startActiveSpan's callback scope at least gives one obvious place to end the span, although OTel JS still requires an explicit span.end() there.
  • Error-path span passing — catch blocks that log an error without having access to the span can't tag the span with error: true. Explicit passing has to include error paths too.
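
The second and third failure modes can be reduced with a small wrapper that owns the span's lifetime. The sketch below is hypothetical (withSpan and fakeTracer are invented names, not part of OpenTracing, OTel, or the Zalando SDK) and does nothing about silent parenthood loss, which still depends on callers passing the span along.

```javascript
// Hypothetical helper: runs fn with a span, tags failures, and guarantees the
// underlying finish() is called exactly once.
async function withSpan(tracer, name, parentSpan, fn) {
  const span = tracer.startSpan(name, parentSpan ? { childOf: parentSpan } : {});
  let finished = false;
  const guarded = {
    raw: span, // pass this as the parent when starting child spans
    addTags: (tags) => span.addTags(tags),
    finish: () => {
      if (finished) return; // a second finish() becomes a no-op
      finished = true;
      span.finish();
    },
  };
  try {
    return await fn(guarded);
  } catch (err) {
    guarded.addTags({ error: true }); // the error path always has the span
    throw err;
  } finally {
    guarded.finish();
  }
}

// In-memory stand-in tracer that records what happened to each span.
const records = [];
const fakeTracer = {
  startSpan(name) {
    const record = { name, tags: {}, finishCalls: 0 };
    records.push(record);
    return {
      addTags(tags) { Object.assign(record.tags, tags); },
      finish() { record.finishCalls += 1; },
    };
  },
};

const outcome = (async () => {
  const failed = await withSpan(fakeTracer, "charge", null, async () => {
    throw new Error("declined");
  }).catch((err) => err.message);
  const ok = await withSpan(fakeTracer, "render", null, async (span) => {
    span.finish(); // callee finishes early; the helper's own finish() is ignored
    return "ok";
  });
  return { failed, ok };
})();
```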

Zalando's specific choice

Source: sources/2024-07-28-zalando-opentelemetry-for-javascript-observability-at-zalando:

  • On the server: migration compat from their large existing OpenTracing codebase — "We ended up not using context as it was easy to migrate from OpenTracing that way."
  • In the browser: rejected Zone.js explicitly — "We are not big fans of this, especially when done in the customer's browser, and hence opted out of context on the client side as well, resorting back to manual passing of span objects."

The systems/zalando-observability-sdk-browser's framework-exposed traceAs() API (patterns/framework-exposed-tracing-api-for-renderer-developers) is built on this manual-span-passing convention — the span returned from traceAs() is what the renderer developer passes around.
