OpenTelemetry¶
OpenTelemetry (OTel; opentelemetry.io) is the open standard for instrumenting applications with distributed traces, metrics, and logs. It is the instrumentation-side complement to an observability backend like Honeycomb.
Why it shows up on the wiki¶
OTel is cited in the Fly.io corpus as the single most important observability investment Fly.io made, with an explicit reversal of prior skepticism from one author and independent corroboration from a second.
From Thomas Ptacek's 2025-03-27 post on tkdb:
"Most of that is down to OpenTelemetry and Honeycomb. From the moment a request hits our API server through the moment tkdb responds to it, oTel context propagation gives us a single narrative about what's happening. I was a skeptic about oTel. It's really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an '80% of the value of tracing, we can get from logs and metrics' person. But I was wrong." (Source: sources/2025-03-27-flyio-operationalizing-macaroons.)
From JP Phillips's 2025-02-12 exit interview:
"Without oTel, it'd be a disaster trying to troubleshoot the system. I'd have ragequit trying." (Source: sources/2025-02-12-flyio-the-exit-interview-jp-phillips.)
Load-bearing property: context propagation¶
The specific OTel feature Fly.io repeatedly names is context propagation — a trace ID and span context that travels with a request across process, service, and network boundaries, so that every span emitted by every service on the request path can be stitched into a single trace tree.
Fly.io's stack has at least these spans per request:
- Primary API (entry point, user-facing).
- tkdb client library (verification / sign / revoke).
- tkdb server (Noise handshake, SQLite query, response).
Without propagation, each service would produce its own orphan logs — diagnosing a verification failure would require hand-correlation by timestamp. With propagation, the whole lineage is one trace in Honeycomb.
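The mechanism can be illustrated with a minimal, stdlib-only sketch. It is not the OpenTelemetry SDK and not Fly.io's code: the helper names (`start_span`, `inject`, `COLLECTED`) and the three stand-in services are hypothetical, and real HTTP calls are replaced by plain function calls that pass a header dict. The one real detail is the header itself: OTel's default propagator uses the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags`.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this span
    parent_id: Optional[str]  # span_id of the caller, None at the root


# Hypothetical stand-in for a tracing backend such as Honeycomb.
COLLECTED = []


def start_span(name, headers):
    """Start a span, continuing the trace found in `headers` if present."""
    parent = headers.get("traceparent")
    if parent:
        # W3C format: version-traceid-parentid-flags
        _version, trace_id, parent_id, _flags = parent.split("-")
    else:
        # No incoming context: this span is the root of a new trace.
        trace_id, parent_id = secrets.token_hex(16), None
    span = Span(name, trace_id, secrets.token_hex(8), parent_id)
    COLLECTED.append(span)
    return span


def inject(span):
    """Build outgoing headers that carry this span's context downstream."""
    return {"traceparent": f"00-{span.trace_id}-{span.span_id}-01"}


def tkdb_server(headers):
    start_span("tkdb server", headers)
    return "ok"


def tkdb_client(headers):
    span = start_span("tkdb client library", headers)
    return tkdb_server(inject(span))


def api_request():
    span = start_span("primary API", {})
    return tkdb_client(inject(span))


if __name__ == "__main__":
    api_request()
    for s in COLLECTED:
        print(s.trace_id, s.parent_id, "->", s.span_id, s.name)
```

Every printed row carries the same trace ID, and each `parent_id` points at the caller's `span_id`, which is what lets a backend rebuild the request as one tree. Dropping the `inject(...)` call at either hop would start a fresh trace ID there, reproducing the orphaned-telemetry case described above.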
Trade-offs Fly.io names¶
- "Really, really expensive" — both in ingestion cost and infrastructure.
- "Cruds up our code" — instrumenting every call site is invasive.
- Counterweight: "worth the money to pay someone else to manage tracing data" (JP).
- Net judgment: "I was wrong" (Ptacek). The 20% that tracing adds over logs and metrics is load-bearing, not a case of diminishing returns.
Seen in¶
- sources/2025-03-27-flyio-operationalizing-macaroons — canonical wiki instance; Ptacek's "I was wrong" retraction.
- sources/2025-02-12-flyio-the-exit-interview-jp-phillips — JP Phillips's "I'd have ragequit" — engineering-side corroboration.
Related¶
- systems/honeycomb — the Fly.io-chosen OTel backend.
- concepts/context-propagation-otel — the specific feature that's the wiki takeaway.
- companies/flyio.