SYSTEM Cited by 5 sources
OpenTelemetry¶
OpenTelemetry (OTel; opentelemetry.io) is the open standard for instrumenting applications with distributed traces, metrics, and logs. It is the instrumentation-side complement to an observability backend like Honeycomb.
Why it shows up on the wiki¶
OTel is cited in the Fly.io corpus as the single most important observability investment Fly.io made, with reversals on prior skepticism from two different authors.
From Thomas Ptacek's 2025-03-27 post on tkdb:
"Most of that is down to OpenTelemetry and Honeycomb. From the moment a request hits our API server through the moment tkdb responds to it, oTel context propagation gives us a single narrative about what's happening. I was a skeptic about oTel. It's really, really expensive. And, not to put too fine a point on it, oTel really cruds up our code. Once, I was an '80% of the value of tracing, we can get from logs and metrics' person. But I was wrong." (Source: sources/2025-03-27-flyio-operationalizing-macaroons.)
From JP Phillips's 2025-02-12 exit interview:
"Without oTel, it'd be a disaster trying to troubleshoot the system. I'd have ragequit trying." (Source: sources/2025-02-12-flyio-the-exit-interview-jp-phillips.)
Load-bearing property: context propagation¶
The specific OTel feature Fly.io repeatedly names is context propagation — a trace ID and span context that travels with a request across process, service, and network boundaries, so that every span emitted by every service on the request path can be stitched into a single trace tree.
Fly.io's stack has at least these spans per request:
- Primary API (entry point, user-facing).
- tkdb client library (verification / sign / revoke).
- tkdb server (Noise handshake, SQLite query, response).
Without propagation, each service would produce its own orphan logs — diagnosing a verification failure would require hand-correlation by timestamp. With propagation, the whole lineage is one trace in Honeycomb.
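As a minimal sketch of the mechanism, the snippet below passes a W3C `traceparent` value (`00-<trace-id>-<parent-span-id>-<flags>`) from hop to hop, so every span shares one trace ID and records its parent. It uses only the standard library rather than the OpenTelemetry SDK. The `Span` class, the `start_span` and `inject` helpers, and the span names are hypothetical illustrations, not Fly.io's code. For simplicity every hop passes context through a header dict, even where a real in-process call would carry it in memory.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Span:
    name: str
    trace_id: str             # 32 hex chars, shared by every span in the trace
    span_id: str              # 16 hex chars, unique to this span
    parent_id: Optional[str]  # span_id of the caller, None for the root span


def start_span(name: str, headers: Dict[str, str]) -> Span:
    """Start a span, continuing the trace found in `headers` if there is one."""
    parent = headers.get("traceparent")
    if parent:
        _version, trace_id, parent_id, _flags = parent.split("-")
    else:
        # No incoming context: this span is the root of a new trace.
        trace_id, parent_id = secrets.token_hex(16), None
    return Span(name, trace_id, secrets.token_hex(8), parent_id)


def inject(span: Span, headers: Dict[str, str]) -> Dict[str, str]:
    """Write the span's context into outgoing headers (flags 01 = sampled)."""
    headers["traceparent"] = f"00-{span.trace_id}-{span.span_id}-01"
    return headers


# Hypothetical three-hop path mirroring the list above.
api = start_span("primary-api", {})
client = start_span("tkdb-client", inject(api, {}))
server = start_span("tkdb-server", inject(client, {}))

for span in (api, client, server):
    print(span.name, span.trace_id, span.span_id, span.parent_id)
```

Because each span carries the same `trace_id` plus its caller's `span_id`, a backend can rebuild the parent/child tree without correlating by timestamp.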
Trade-offs Fly.io names¶
- "Really, really expensive" — both in ingestion cost and infrastructure.
- "Cruds up our code" — instrumenting every call site is invasive.
- Counterweight: "worth the money to pay someone else to manage tracing data" (JP).
- Net judgment: "I was wrong" (Ptacek) — the 20% tracing adds over logs+metrics is load-bearing, not diminishing-returns.
Seen in¶
- sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog — OTel as the protocol-portable boundary between agent instrumentation and lakehouse-resident storage. Databricks' 2026-05-22 launch ships OTLP/gRPC + REST direct ingest into Unity Catalog Delta tables via a managed serverless engine (Zerobus Ingest) — the post argues OTel's load-bearing property as "using the OTel standard to separate instrumentation from storage". Two protocol surfaces consumed: OTLP/gRPC for open-source collectors and framework SDKs (LangGraph, OpenAI, Anthropic) and REST for application-framework integration (MLflow). The architectural payoff named explicitly: "Q: Can I use this for agents running outside of Databricks? A: Yes, the agent can be running anywhere. In fact the support assistant agent example that was used for this blog is deployed locally." This makes OTel the client-side decoupling primitive that lets agents in any environment (customer VPC, developer laptop, third-party cloud) emit to the same governed lakehouse store. The post's "single-sink" framing — "Existing OLTP-compatible collectors can point directly to this endpoint via gRPC, entirely bypassing intermediate message buses like Kafka" — turns OTel into a drop-in re-pointable wire protocol. Six storage tables produced on the receiving side: see systems/uc-otel-trace-tables. Companion concept: concepts/instrumentation-storage-decoupling (this is what "OTel is the boundary" operationalises). Companion patterns: patterns/managed-otel-ingestion-direct-to-lakehouse (the canonical instance), patterns/telemetry-to-lakehouse (the broader pattern). First wiki disclosure of OTel as the inbound wire protocol for agent-side spans (vs the prior service-RPC and Kafka-record-header faces).
- sources/2026-05-13-aws-streaming-cloudwatch-metrics-to-vpc-based-opentelemetry-collectors-using-lambda — canonical wiki disclosure of the OpenTelemetry collector's receiver / processor / exporter three-stage pipeline as the load-bearing internal abstraction that turns the collector into a vendor-neutral central hub. Verbatim: "The OpenTelemetry collector operates through three primary components that work together in a processing flow: Receivers accept data in specified formats (like Prometheus or OpenTelemetry Protocol (OTLP)) and translate it into OpenTelemetry's internal format; Processors manipulate and enrich the data as it flows through (filtering unnecessary data, batching for performance, transforming to mask sensitive information, or adding metadata like Kubernetes attributes); and Exporters send the processed data to destination backends such as Grafana Cloud, AWS X-Ray, Lightstep or Honeycomb." See concepts/opentelemetry-collector-three-stage-pipeline. First wiki naming of OTel collector in the VPC-self-hosted shape — running as a container on EC2 instances in a customer VPC, fronted by an internal NLB, ingesting CloudWatch Metric Streams (via Firehose + Lambda transform bridge) and fanning out to multiple observability backends. Cited exporter-destination examples: Grafana Cloud, AWS X-Ray, Lightstep, Honeycomb. AWS's open-source OTel distribution (ADOT) explicitly named as the get-started on-ramp. The post's vendor-neutrality argument: "Future-proofing by avoiding vendor lock-in and enabling flexibility in choosing observability backends" + "The Apache 2.0 license is free and royalty-free." See patterns/cloudwatch-metric-stream-to-vpc-otel-collector.
- sources/2025-03-27-flyio-operationalizing-macaroons — canonical wiki instance; Ptacek's "I was wrong" retraction.
- sources/2025-02-12-flyio-the-exit-interview-jp-phillips — JP Phillips's "I'd have ragequit" — engineering-side corroboration.
- sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms — OpenTelemetry context propagation via Kafka record headers — concrete carrier disclosure for the streaming-boundary analogue of HTTP-header-based context propagation. Verbatim: "Using best practices, such as Open Telemetry tracing standard conventions and propagating the tracing using record headers, is particularly helpful as organizations adopt Open Telemetry for all their observability data." Extends the Fly.io application-RPC-boundary framing on this page to the streaming-broker boundary: every Kafka record carries its traceparent / tracestate in record headers so a trace can span producer → broker → consumer chains the same way it spans HTTP request → HTTP response chains.
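The record-header carrier described in the Redpanda entry can be sketched without a broker. The example below models Kafka record headers as a list of `(key, bytes)` pairs and shows a producer writing `traceparent` (and optionally `tracestate`) into them, then a consumer reading the context back to parent its own span. It is a standard-library illustration under those assumptions, not the OpenTelemetry SDK or any Kafka client API. The helper names `inject_trace_headers` and `extract_trace_context` are hypothetical.

```python
import secrets
from typing import List, Optional, Tuple

# Record headers modeled as (key, value-bytes) pairs; an assumption for this sketch.
Headers = List[Tuple[str, bytes]]


def inject_trace_headers(headers: Headers, trace_id: str, span_id: str,
                         tracestate: str = "") -> Headers:
    """Return a copy of `headers` carrying the producer's trace context."""
    out = [(k, v) for k, v in headers if k not in ("traceparent", "tracestate")]
    out.append(("traceparent", f"00-{trace_id}-{span_id}-01".encode("ascii")))
    if tracestate:
        out.append(("tracestate", tracestate.encode("ascii")))
    return out


def extract_trace_context(headers: Headers) -> Optional[Tuple[str, str]]:
    """Return (trace_id, parent_span_id) from the headers, or None if absent."""
    for key, value in headers:
        if key == "traceparent":
            parts = value.decode("ascii").split("-")
            if len(parts) == 4 and len(parts[1]) == 32 and len(parts[2]) == 16:
                return parts[1], parts[2]
    return None


# Producer side: attach the current span's context to the outgoing record.
producer_trace_id = secrets.token_hex(16)
producer_span_id = secrets.token_hex(8)
record_headers = inject_trace_headers(
    [("content-type", b"application/json")], producer_trace_id, producer_span_id
)

# Consumer side: continue the same trace with a new child span.
context = extract_trace_context(record_headers)
assert context is not None
consumer_trace_id, consumer_parent_id = context
consumer_span_id = secrets.token_hex(8)
```

The consumer's span reuses the producer's trace ID and names the producer's span as its parent, which is what lets one trace cover the producer → broker → consumer chain.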
Related¶
- systems/honeycomb — the Fly.io-chosen OTel backend.
- systems/aws-distro-for-opentelemetry — AWS's open-source OTel distribution.
- systems/amazon-cloudwatch-metric-streams — push source that natively emits OTel-format metrics.
- systems/zerobus-ingest — managed OTel receiver writing direct to UC-managed Delta tables (Databricks 2026-05-22).
- systems/uc-otel-trace-tables — the six UC-managed Delta tables/views populated by Zerobus Ingest.
- systems/mlflow-otel-tracing — MLflow's OTel-tracing surface (autolog + decorator).
- systems/mlflow — broader ML lifecycle platform that consumes OTel traces in the Databricks 2026-05-22 path.
- systems/unity-catalog — governance substrate for the storage-side endpoint.
- systems/delta-lake — physical storage format on the storage-side endpoint.
- concepts/context-propagation-otel — the specific feature that's the wiki takeaway.
- concepts/opentelemetry-collector-three-stage-pipeline — the receiver / processor / exporter abstraction inside the collector.
- concepts/push-vs-pull-monitoring — the monitoring data-flow shape; OTel fits the push side.
- concepts/single-sink-telemetry-architecture — the architectural shape OTel-direct-ingest enables.
- concepts/instrumentation-storage-decoupling — what OTel achieves as a protocol-portable boundary.
- concepts/lakehouse-native-observability — broader storage-side posture.
- patterns/cloudwatch-metric-stream-to-vpc-otel-collector — composite VPC-private push-monitoring architecture.
- patterns/managed-otel-ingestion-direct-to-lakehouse — Databricks 2026-05-22 canonical pattern.
- patterns/telemetry-to-lakehouse — broader pattern this protocol enables.
- companies/flyio.
- companies/databricks.