PATTERN

Telemetry to Lakehouse

Telemetry to Lakehouse is the pattern of landing operational, tool, and agent telemetry directly into governed open-table-format tables (typically Delta Lake or Iceberg) rather than into an APM / observability-vendor sidecar, so that the telemetry becomes a first-class lakehouse dataset joinable with business data.

Mechanics

  • Clients emit OpenTelemetry metrics + traces (a standard protocol that avoids vendor lock-in).
  • A managed ingestion pipeline writes OTel data into Lakehouse-resident tables (Unity-Catalog-managed Delta tables in the Databricks instance).
  • Tables are governed under the same catalog / IAM / audit posture as the rest of the enterprise's data — one policy surface for telemetry + business data.
  • Analysts query telemetry with the same SQL + dashboard tooling as HR, finance, CRM data.
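The ingestion step above amounts to flattening OTel-style records into rows of a stable, governed table. A minimal sketch, assuming an illustrative span shape and column set (these are not the actual Unity Catalog schema):

```python
# Illustrative sketch: flatten an OTel-style span record into a flat row
# suitable for appending to a lakehouse table. Field and column names
# here are assumptions for illustration, not the real schema.

def span_to_row(span: dict) -> dict:
    """Flatten one OTel-style span dict into a flat table row."""
    attrs = span.get("attributes", {})
    return {
        "trace_id": span["trace_id"],
        "span_id": span["span_id"],
        "service": span.get("resource", {}).get("service.name", "unknown"),
        "name": span["name"],
        "start_unix_nano": span["start_unix_nano"],
        "duration_ms": (span["end_unix_nano"] - span["start_unix_nano"]) / 1e6,
        # Promote a few high-value attributes to stable, queryable columns.
        "user_id": attrs.get("user.id"),
        "model": attrs.get("llm.model"),
        "tokens": attrs.get("llm.tokens_total", 0),
    }

span = {
    "trace_id": "abc123", "span_id": "def456", "name": "chat.completion",
    "resource": {"service.name": "coding-agent"},
    "start_unix_nano": 1_700_000_000_000_000_000,
    "end_unix_nano": 1_700_000_000_850_000_000,
    "attributes": {"user.id": "u42", "llm.model": "gpt-x", "llm.tokens_total": 1234},
}
row = span_to_row(span)
print(row["duration_ms"], row["tokens"])  # 850.0 1234
```

Once telemetry lands as rows like this, the "query with the same SQL tooling" bullet follows for free.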

Canonical instance: Unity AI Gateway (2026-04-17)

Databricks' coding-agent post states the design explicitly: "With our OpenTelemetry ingestion, coding tool metrics and traces are automatically centralized to Unity Catalog-managed Delta tables." (Source: sources/2026-04-17-databricks-governing-coding-agent-sprawl-with-unity-ai-gateway.)

Three use cases named in the post:

  • Track adoption per org: "Join AI Gateway metrics with Workday to map GenAI adoption by department, region, or seniority, helping identify where to target enablement."
  • Quantify developer velocity: "A 20% increase in token usage per developer drove a 15% reduction in pull request cycle time, directly linking AI tool usage to increased developer velocity."
  • Proactive capacity planning: "Monitor users hitting rate limits to data-justify securing additional capacity or dedicated throughput before productivity is throttled."

Each of these requires joining gateway telemetry with a non-telemetry dataset (HR org chart, CI/CD PR metrics, capacity-cost tables). That join is the point of the pattern.
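That join can be sketched in miniature with SQLite standing in for the lakehouse; table and column names are hypothetical, not the actual gateway-metrics or Workday schemas:

```python
import sqlite3

# In-memory stand-ins for two lakehouse tables; schemas are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE gateway_metrics (user_id TEXT, tokens INTEGER);
CREATE TABLE hr_orgchart (user_id TEXT, department TEXT);
INSERT INTO gateway_metrics VALUES ('u1', 500), ('u2', 1500), ('u3', 0);
INSERT INTO hr_orgchart VALUES ('u1','Payments'), ('u2','Payments'), ('u3','Infra');
""")

# GenAI adoption by department: telemetry joined with HR data in one query.
rows = con.execute("""
    SELECT h.department, SUM(m.tokens) AS total_tokens
    FROM gateway_metrics m
    JOIN hr_orgchart h USING (user_id)
    GROUP BY h.department
    ORDER BY total_tokens DESC
""").fetchall()
print(rows)  # [('Payments', 2000), ('Infra', 0)]
```

In the pattern proper, both tables already live under the same catalog, so this is one SQL statement against governed data rather than an ETL project.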

Why Lakehouse instead of APM

  • APM vendors silo telemetry. Datadog / New Relic / Grafana Cloud store telemetry in their own backends; joining with Workday or Jira means ETL-ing data out of the APM, which breaks both the freshness and the governance story.
  • Business-value questions require business-data joins. "Are AI tools changing PR cycle time?" is a question about PR cycle time, not about LLM latency. APM can't answer it.
  • Governance is unified. Same RBAC, same retention policies, same audit log covers telemetry + revenue data + HR data. No "observability data is special" escape hatch.

Costs / caveats

  • Latency vs APM. Lakehouse query latency is seconds-to-minutes, not APM's sub-second. This pattern is for analytical telemetry questions, not real-time alerting, which typically still lives in an APM / Prometheus / Grafana sidecar.
  • Schema discipline required. Telemetry evolves fast; making it useful to analysts means stable columns / enums.
  • Vendor-lock-on-lakehouse tradeoff. Avoids APM-vendor lock-in; trades it for lakehouse-vendor lock-in. Open table formats (Delta / Iceberg) mitigate.
  • OTel as the wire-protocol boundary keeps the pattern Lakehouse-vendor-portable: Databricks today, could be Snowflake or Trino + Iceberg tomorrow.
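The schema-discipline caveat is usually handled with a validation gate at ingestion. A minimal sketch, assuming hypothetical column names and event-type enums:

```python
# Sketch of a schema gate at ingestion: reject rows that would drift the
# analyst-facing table. Column names and enum values are illustrative
# assumptions, not the real gateway schema.
REQUIRED_COLUMNS = {"trace_id", "user_id", "event_type", "tokens"}
ALLOWED_EVENT_TYPES = {"completion", "edit", "rate_limited"}

def validate_row(row: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the row is valid."""
    errors = []
    missing = REQUIRED_COLUMNS - row.keys()
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
    if row.get("event_type") not in ALLOWED_EVENT_TYPES:
        errors.append(f"unknown event_type: {row.get('event_type')!r}")
    if not isinstance(row.get("tokens"), int) or row["tokens"] < 0:
        errors.append("tokens must be a non-negative integer")
    return errors

good = {"trace_id": "t1", "user_id": "u1", "event_type": "completion", "tokens": 12}
bad = {"trace_id": "t2", "user_id": "u2", "event_type": "banana", "tokens": -1}
print(validate_row(good))       # []
print(len(validate_row(bad)))   # 2
```

Quarantining invalid rows instead of silently widening the table is what keeps the columns and enums stable enough for analysts.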

Relation to other patterns

Seen in
