
SYSTEM Cited by 5 sources

OpenTelemetry

OpenTelemetry (OTel; opentelemetry.io) is the open standard for instrumenting applications with distributed traces, metrics, and logs. It is the instrumentation-side complement to an observability backend like Honeycomb.

Why it shows up on the wiki

OTel is cited in the Fly.io corpus as the single most important observability investment Fly.io made, backed by reversals of prior skepticism from two different authors.

From Thomas Ptacek's 2025-03-27 post on tkdb:

"Most of that is down to OpenTelemetry and Honeycomb. From the moment a request hits our API server through the moment tkdb responds to it, oTel context propagation gives us a single narrative about what's happening. I was a skeptic about oTel. It's really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an '80% of the value of tracing, we can get from logs and metrics' person. But I was wrong." (Source: sources/2025-03-27-flyio-operationalizing-macaroons.)

From JP Phillips's 2025-02-12 exit interview:

"Without oTel, it'd be a disaster trying to troubleshoot the system. I'd have ragequit trying." (Source: sources/2025-02-12-flyio-the-exit-interview-jp-phillips.)

Load-bearing property: context propagation

The specific OTel feature Fly.io repeatedly names is context propagation — a trace ID and span context that travels with a request across process, service, and network boundaries, so that every span emitted by every service on the request path can be stitched into a single trace tree.
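As a minimal sketch of what "travels with a request" means on the wire, the Python below hand-rolls the W3C Trace Context `traceparent` header (`00-<trace-id>-<parent-span-id>-<flags>`), the format OTel propagates by default. It assumes an HTTP-style header dict as the carrier, handles only version `00`, and uses exact-case header keys. `SpanContext`, `inject`, `extract`, and `child_of` are hypothetical names, not the OTel SDK's propagator API, and nothing here describes how Fly.io actually carries context over its tkdb channel.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# W3C traceparent, version 00: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>"
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str         # 32 lowercase hex chars, shared by every span in the trace
    span_id: str          # 16 lowercase hex chars, unique to one span
    sampled: bool = True  # the "sampled" bit of the trace flags


def new_root_context() -> SpanContext:
    """Entry point (e.g. the API server): start a brand-new trace."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """Downstream work: keep the trace ID, mint a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Caller side: write the current span context into outgoing headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Callee side: recover the caller's span context, or None if absent/invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid per the spec
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


if __name__ == "__main__":
    api = new_root_context()             # span on the primary API
    outgoing: Dict[str, str] = {}
    inject(api, outgoing)                # request crosses a process/network boundary
    remote_parent = extract(outgoing)    # received by the next service
    assert remote_parent is not None
    tkdb = child_of(remote_parent)       # that service's own span
    print(outgoing["traceparent"], tkdb.trace_id == api.trace_id)
```

The design point is that a child keeps the caller's trace ID and mints only a new span ID; that shared trace ID is what lets a backend group spans emitted by separate services into one trace.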

Fly.io's request path emits spans from at least these components:

  • Primary API (entry point, user-facing).
  • tkdb client library (verification / sign / revoke).
  • tkdb server (Noise handshake, SQLite query, response).

Without propagation, each service would produce its own orphan logs — diagnosing a verification failure would require hand-correlation by timestamp. With propagation, the whole lineage is one trace in Honeycomb.
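To make the stitching step concrete, here is a hedged sketch of how a backend could assemble spans from the three components listed above into one tree, using only each span's trace ID, span ID, and parent span ID. The `Span` type, service labels, span names, and short IDs are illustrative assumptions, not Fly.io's real span names or Honeycomb's implementation; in this sketch, spans whose parent never arrived are simply omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    service: str
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace


def build_trace_tree(spans: List[Span], trace_id: str) -> List[str]:
    """Return one indented line per span, depth-first from the root(s)."""
    children: Dict[str, List[Span]] = defaultdict(list)
    roots: List[Span] = []
    for span in spans:
        if span.trace_id != trace_id:
            continue  # spans from other traces are ignored
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)

    lines: List[str] = []

    def walk(span: Span, depth: int) -> None:
        lines.append("  " * depth + f"{span.service}: {span.name}")
        for child in children[span.span_id]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


if __name__ == "__main__":
    T = "4bf92f3577b34da6a3ce929d0e0e4736"  # one shared trace ID
    # Spans arrive out of order from different services (illustrative IDs).
    spans = [
        Span("tkdb-server", "noise handshake", T, "c2", "c1"),
        Span("tkdb-server", "sqlite query", T, "c3", "c1"),
        Span("primary-api", "handle request", T, "a1", None),
        Span("tkdb-server", "verify", T, "c1", "b1"),
        Span("tkdb-client", "verify", T, "b1", "a1"),
        Span("primary-api", "other request", "f" * 32, "z9", None),
    ]
    print("\n".join(build_trace_tree(spans, T)))
```

Run as a script, it prints the API span at the root, the tkdb client span beneath it, and the tkdb server's handshake and query spans further down. Without the shared trace ID and parent links, the same records would be unrelated log lines to be matched up by timestamp.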

Trade-offs Fly.io names

  • "Really, really expensive" — in both ingestion cost and infrastructure.
  • "Cruds up our code" — instrumenting every call site is invasive.
  • Counterweight: "worth the money to pay someone else to manage tracing data" (JP).
  • Net judgment: "I was wrong" (Ptacek). The remaining 20% of tracing's value, which logs and metrics cannot provide, is load-bearing rather than a diminishing return.

Seen in

  • sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog: OTel as the protocol-portable boundary between agent instrumentation and lakehouse-resident storage. Databricks' 2026-05-22 launch ships OTLP/gRPC and REST direct ingest into Unity Catalog Delta tables via a managed serverless engine (Zerobus Ingest); the post frames OTel's load-bearing property as "using the OTel standard to separate instrumentation from storage". Two protocol surfaces are consumed: OTLP/gRPC for open-source collectors and framework SDKs (LangGraph, OpenAI, Anthropic), and REST for application-framework integration (MLflow). The architectural payoff is named explicitly: "Q: Can I use this for agents running outside of Databricks? A: Yes, the agent can be running anywhere. In fact the support assistant agent example that was used for this blog is deployed locally." This makes OTel the client-side decoupling primitive that lets agents in any environment (customer VPC, developer laptop, third-party cloud) emit to the same governed lakehouse store. The post's "single-sink" framing, "Existing OLTP-compatible collectors can point directly to this endpoint via gRPC, entirely bypassing intermediate message buses like Kafka", turns OTel into a drop-in, re-pointable wire protocol. Six storage tables are produced on the receiving side: see systems/uc-otel-trace-tables. Companion concept: concepts/instrumentation-storage-decoupling (this is what "OTel is the boundary" operationalises). Companion patterns: patterns/managed-otel-ingestion-direct-to-lakehouse (the canonical instance) and patterns/telemetry-to-lakehouse (the broader pattern). First wiki disclosure of OTel as the inbound wire protocol for agent-side spans (vs. the prior service-RPC and Kafka-record-header faces).

  • sources/2026-05-13-aws-streaming-cloudwatch-metrics-to-vpc-based-opentelemetry-collectors-using-lambda: canonical wiki disclosure of the OpenTelemetry collector's receiver / processor / exporter three-stage pipeline as the load-bearing internal abstraction that turns the collector into a vendor-neutral central hub. Verbatim: "The OpenTelemetry collector operates through three primary components that work together in a processing flow: Receivers accept data in specified formats (like Prometheus or OpenTelemetry Protocol (OTLP)) and translate it into OpenTelemetry's internal format; Processors manipulate and enrich the data as it flows through (filtering unnecessary data, batching for performance, transforming to mask sensitive information, or adding metadata like Kubernetes attributes); and Exporters send the processed data to destination backends such as Grafana Cloud, AWS X-Ray, Lightstep or Honeycomb." See concepts/opentelemetry-collector-three-stage-pipeline. First wiki naming of the OTel collector in the VPC-self-hosted shape: running as a container on EC2 instances in a customer VPC, fronted by an internal NLB, ingesting CloudWatch Metric Streams (via a Firehose + Lambda transform bridge) and fanning out to multiple observability backends. AWS's open-source OTel distribution (ADOT) is explicitly named as the getting-started on-ramp. The post's vendor-neutrality argument: "Future-proofing by avoiding vendor lock-in and enabling flexibility in choosing observability backends" + "The Apache 2.0 license is free and royalty-free." See patterns/cloudwatch-metric-stream-to-vpc-otel-collector.

  • sources/2025-03-27-flyio-operationalizing-macaroons — canonical wiki instance; Ptacek's "I was wrong" retraction.

  • sources/2025-02-12-flyio-the-exit-interview-jp-phillips — JP Phillips's "I'd have ragequit" — engineering-side corroboration.

  • sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms: OpenTelemetry context propagation via Kafka record headers, a concrete carrier disclosure for the streaming-boundary analogue of HTTP-header-based context propagation. Verbatim: "Using best practices, such as Open Telemetry tracing standard conventions and propagating the tracing using record headers, is particularly helpful as organizations adopt Open Telemetry for all their observability data." This extends the Fly.io application-RPC-boundary framing on this page to the streaming-broker boundary: every Kafka record carries its traceparent / tracestate in record headers, so a trace can span producer → broker → consumer chains the same way it spans HTTP request → HTTP response chains.
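
The receiver / processor / exporter flow quoted in the AWS entry can be modeled in a few lines. This is a toy Python model, not the collector itself (which is a Go binary configured through YAML pipelines): the sample format, the `SENSITIVE_KEYS` set, the attribute names, and the two in-memory "backends" are all hypothetical. It illustrates why the shape yields vendor neutrality: receivers normalize input once, processors run once, and every exporter gets the same processed batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

SENSITIVE_KEYS = {"api_key", "password"}  # hypothetical attributes to mask


@dataclass
class Record:
    """Stand-in for the collector's internal data model."""
    name: str
    value: float
    attributes: Dict[str, str] = field(default_factory=dict)


Processor = Callable[[List[Record]], List[Record]]


def receive(sample: dict) -> Record:
    """Receiver: translate a hypothetical {'metric', 'value', 'labels'} sample."""
    return Record(sample["metric"], float(sample["value"]), dict(sample.get("labels", {})))


def drop_debug(records: List[Record]) -> List[Record]:
    """Processor: filter out data nobody needs downstream."""
    return [r for r in records if not r.name.startswith("debug_")]


def mask_sensitive(records: List[Record]) -> List[Record]:
    """Processor: replace sensitive attribute values before export."""
    return [
        Record(r.name, r.value,
               {k: ("***" if k in SENSITIVE_KEYS else v) for k, v in r.attributes.items()})
        for r in records
    ]


def add_attribute(key: str, value: str) -> Processor:
    """Processor factory: enrich every record with a fixed attribute."""
    def processor(records: List[Record]) -> List[Record]:
        return [Record(r.name, r.value, {**r.attributes, key: value}) for r in records]
    return processor


class InMemoryExporter:
    """Exporter: stands in for a real backend by keeping what it was sent."""
    def __init__(self, backend: str) -> None:
        self.backend = backend
        self.received: List[Record] = []

    def __call__(self, batch: List[Record]) -> None:
        self.received.extend(batch)


class Pipeline:
    def __init__(self, receiver: Callable[[dict], Record],
                 processors: List[Processor],
                 exporters: List[Callable[[List[Record]], None]]) -> None:
        self.receiver = receiver
        self.processors = processors
        self.exporters = exporters

    def ingest(self, raw: List[dict]) -> List[Record]:
        batch = [self.receiver(s) for s in raw]  # 1. receive and translate
        for process in self.processors:          # 2. process, in order
            batch = process(batch)
        for export in self.exporters:            # 3. fan out to every backend
            export(list(batch))
        return batch


if __name__ == "__main__":
    grafana, honeycomb = InMemoryExporter("grafana-cloud"), InMemoryExporter("honeycomb")
    pipeline = Pipeline(receive,
                        [drop_debug, mask_sensitive, add_attribute("collector", "vpc-a")],
                        [grafana, honeycomb])
    pipeline.ingest([
        {"metric": "cpu_util", "value": "0.5", "labels": {"api_key": "s3cret", "host": "i-1"}},
        {"metric": "debug_loop", "value": 1},
    ])
    print(grafana.received == honeycomb.received, grafana.received)
```

In this model, adding or swapping a backend touches only the exporter list; instrumentation and processors stay as they are, which is the lock-in argument the AWS post makes.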
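
The Redpanda entry's record-header carrier can be sketched the same way. This minimal Python sketch assumes headers in the list-of-`(str, bytes)`-tuples shape that Python Kafka clients such as kafka-python accept; it only builds and reads the header list and never talks to a broker. The helper names are hypothetical, only the W3C `traceparent` field is parsed, and all other headers (including any `tracestate`) are passed through unchanged.

```python
import re
import secrets
from typing import List, Optional, Tuple

Headers = List[Tuple[str, bytes]]  # Kafka record headers: (key, raw bytes value)

# W3C traceparent, version 00: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
_TRACEPARENT = re.compile(rb"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject_traceparent(headers: Headers, trace_id: str, span_id: str,
                       sampled: bool = True) -> Headers:
    """Producer side: return new headers naming the producer span as parent."""
    value = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}".encode("ascii")
    kept = [(k, v) for k, v in headers if k != "traceparent"]  # drop any stale value
    return kept + [("traceparent", value)]


def extract_traceparent(headers: Headers) -> Optional[Tuple[str, str, bool]]:
    """Consumer side: return (trace_id, parent_span_id, sampled) or None."""
    for key, value in headers:
        if key != "traceparent":
            continue
        match = _TRACEPARENT.fullmatch(value)
        if match is None:
            return None
        trace_id, parent_id, flags = (g.decode("ascii") for g in match.groups())
        if trace_id == "0" * 32 or parent_id == "0" * 16:
            return None  # all-zero IDs are invalid
        return trace_id, parent_id, bool(int(flags, 16) & 0x01)
    return None


if __name__ == "__main__":
    trace_id, producer_span = secrets.token_hex(16), secrets.token_hex(8)
    headers = inject_traceparent([("content-type", b"application/json")],
                                 trace_id, producer_span)
    # ... the record is produced, stored by the broker, and later consumed ...
    parent = extract_traceparent(headers)
    consumer_span = secrets.token_hex(8)  # consumer span: same trace, new span ID
    print(parent, consumer_span)
```

Because the broker stores headers alongside the record, the context survives however long the record sits in the topic; the consumer's span names the producer's span as its parent, just as a server continues an HTTP caller's trace.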
Last updated · 542 distilled / 1,571 read