PATTERN Cited by 1 source
Managed OTel ingestion direct to lakehouse
Managed OTel ingestion direct to lakehouse is the pattern of using a managed, serverless OTLP receiver (gRPC + REST) as the only hop between OTel-instrumented clients and a governed columnar-storage destination (Delta Lake / Iceberg). It removes intermediate brokers such as Kafka and shifts operational complexity to the platform layer.
Mechanics
clients (OTel SDKs / collectors)
│
│ OTLP/gRPC (open-source collectors)
│ REST (framework SDKs like MLflow)
│
▼
managed OTel receiver ◄── serverless, vendor-operated
│
▼
lakehouse Delta tables (UC / Polaris-governed)
│
▼
downstream consumers (SQL, dashboards, ETL, eval)
The pattern's defining shape:
- One ingestion endpoint speaks OTLP/gRPC and HTTP REST.
- No broker (Kafka / Pulsar / Kinesis) between client and storage.
- Managed, not customer-operated.
- Storage is a governed lakehouse table, not an APM-vendor backend.
- Schema is OTel-native (spans / logs / metrics) plus optional vendor extensions.
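To make the REST leg concrete, here is a minimal sketch of an OTLP/HTTP trace export using the OTLP JSON encoding: a `resourceSpans` → `scopeSpans` → `spans` envelope, hex-encoded trace and span IDs, integer enum values, and 64-bit nanosecond timestamps carried as strings, POSTed to a `/v1/traces` path. The endpoint URL, the bearer-token header, and the `build_export_request` helper are hypothetical; a managed receiver may require different auth headers or the protobuf encoding, and in practice an OTel SDK exporter builds this request for you.

```python
import json
import os
import urllib.request

# Hypothetical endpoint; substitute the vendor-documented OTLP/HTTP traces URL.
OTLP_TRACES_URL = "https://example-workspace.invalid/v1/traces"

SPAN_KIND_INTERNAL = 1  # OTLP enum value for an internal span


def _attr(key: str, value: str) -> dict:
    """One OTLP KeyValue holding a string AnyValue."""
    return {"key": key, "value": {"stringValue": value}}


def build_export_request(service: str, span_name: str, start_ns: int,
                         end_ns: int, token: str) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON trace export request."""
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [_attr("service.name", service)]},
            "scopeSpans": [{
                "scope": {"name": "manual-sketch"},
                "spans": [{
                    # OTLP JSON encodes trace/span IDs as hex strings, not base64.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": SPAN_KIND_INTERNAL,
                    # 64-bit integers are carried as decimal strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }
    return urllib.request.Request(
        OTLP_TRACES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; each managed receiver defines its own.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; an OTLP/gRPC collector carries the same span model over protobuf instead.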
Canonical instance: Databricks Zerobus → UC OTel Trace Tables (2026-05-22)
"Databricks supports ingesting OpenTelemetry (OTel) traces, logs, and metrics directly into Unity Catalog tables, using the OTel standard to separate instrumentation from storage. Databricks removes the operational complexity of traditional, multi-hop telemetry pipelines by providing a managed ingestion layer, transparently powered by Zerobus Ingest."
"With a 'single-sink' architecture, Zerobus Ingest simplifies observability by streaming data directly to the lakehouse. Existing [OTLP]-compatible collectors can point directly to this endpoint via gRPC, entirely bypassing intermediate message buses like Kafka. Zerobus Ingest acts as your high-throughput telemetry pipeline, handling ingestion and durability with zero infrastructure overhead."
— Source: sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog
Components:
- Receiver: Zerobus Ingest — managed, serverless, OTLP/gRPC + REST.
- Storage: UC OTel Trace Tables — six MLflow-derived UC Delta views.
- Instrumentation companion: MLflow OTel Tracing — the framework-side library.
- Default throughput: 200 QPS (higher limits via account-team escalation).
- Storage capacity: unbounded; tables use automatic liquid clustering.
- Per-experiment trace cap removed (a constraint of the prior MLflow architecture).
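A rough way to check a fleet against the 200 QPS default is to estimate export calls per second after client-side batching. This sketch assumes the limit counts export requests rather than individual spans, and that you can measure the average number of spans per export from your own exporters; neither assumption is confirmed by the source, and `required_qps` / `fits_default_limit` are hypothetical helpers.

```python
DEFAULT_QPS_LIMIT = 200  # default reported for the canonical instance


def required_qps(agents: int, traces_per_agent_per_s: float,
                 spans_per_trace: float, avg_spans_per_export: float) -> float:
    """Estimate export requests per second for a fleet of agents.

    Assumption: the receiver's QPS counts export calls, not spans, and
    avg_spans_per_export reflects the exporters' real batching behaviour.
    """
    if avg_spans_per_export <= 0:
        raise ValueError("avg_spans_per_export must be positive")
    spans_per_s = agents * traces_per_agent_per_s * spans_per_trace
    return spans_per_s / avg_spans_per_export


def fits_default_limit(qps: float, headroom: float = 0.5) -> bool:
    """True if qps stays under the limit while reserving `headroom` (0..1) for bursts."""
    return qps <= DEFAULT_QPS_LIMIT * (1.0 - headroom)
```

For example, 50 agents at 2 traces/s with 20 spans each and 100 spans per export need about 20 export calls per second, comfortably inside the default; 500 agents at 5 traces/s with 30 spans each need about 750 and would need escalation.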
When this pattern is the right shape
| Property | Why this pattern wins |
|---|---|
| Single canonical analytical destination | The lakehouse is already where analysts work, so observability data lands next to the rest of the analytics. |
| Lakehouse-resident governance is required | UC column masking, row filtering, RBAC, and audit logs apply automatically to traces. |
| Long retention is a requirement | Delta tables on object storage are an order of magnitude cheaper than SaaS APM retention. |
| Joining telemetry with business data is the question | Co-location in the lakehouse enables joins that APM backends can't perform. |
| Operational simplicity matters | One managed receiver + one storage system to think about. |
| OTel-instrumented agents already exist | Drop-in re-point; no re-instrumentation cost. |
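The "joining telemetry with business data" row is easiest to see as SQL. This sketch uses an in-memory SQLite database as a stand-in for lakehouse tables; the `spans` and `customers` tables, their columns, and the `customer_id` span attribute are hypothetical and do not reflect the UC OTel Trace Tables schema.

```python
import sqlite3


def error_counts_by_tier(conn: sqlite3.Connection) -> list:
    """Per customer tier: total spans and error spans, joining traces to business data."""
    return conn.execute(
        """
        SELECT c.tier,
               COUNT(*) AS spans,
               SUM(CASE WHEN s.status = 'ERROR' THEN 1 ELSE 0 END) AS errors
        FROM spans AS s
        JOIN customers AS c ON c.customer_id = s.customer_id
        GROUP BY c.tier
        ORDER BY c.tier
        """
    ).fetchall()


def demo() -> list:
    """Populate toy tables in memory and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE spans (span_id TEXT, name TEXT, status TEXT, customer_id INTEGER);
        CREATE TABLE customers (customer_id INTEGER, tier TEXT);
        INSERT INTO customers VALUES (1, 'enterprise'), (2, 'free');
        INSERT INTO spans VALUES
          ('a', 'tool.search', 'OK', 1),
          ('b', 'tool.search', 'ERROR', 1),
          ('c', 'llm.call', 'OK', 2);
        """
    )
    try:
        return error_counts_by_tier(conn)
    finally:
        conn.close()
```

In an APM backend the `customers` table typically lives elsewhere; in the lakehouse it is one `JOIN` away.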
When this pattern is the wrong shape
- Real-time alerting is the primary use case. Lakehouse query latency is seconds-to-minutes; APM / Prometheus is sub-second. Pair with an APM sidecar for alerting.
- Multi-destination fan-out is needed. A single sink writes to one destination and cannot also feed other backends from the same stream; broker-based architectures fan out cleanly.
- Burst absorption is critical. Brokers buffer; managed receivers may apply back-pressure that propagates to clients.
- Cross-organisation rendezvous. When telemetry from many independent producers must converge on one logical point, a broker is the natural fit; a lakehouse table is a destination, not a rendezvous point.
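When a managed receiver pushes back instead of buffering, the client has to absorb the burst. A minimal sketch of retry scheduling in the spirit of the OTLP/HTTP failure rules (retry only 429, 502, 503 and 504; honour a server-supplied `Retry-After`; otherwise back off exponentially with jitter). The function names, base delay and cap are assumptions; real OTel exporters ship their own retry policy.

```python
import random
from typing import Optional

# Status codes that OTLP/HTTP treats as retryable; other 4xx/5xx are not retried.
RETRYABLE_STATUS = {429, 502, 503, 504}


def is_retryable(status: int) -> bool:
    """True if an export that failed with this HTTP status may be retried."""
    return status in RETRYABLE_STATUS


def next_delay(attempt: int, retry_after_s: Optional[float] = None,
               base_s: float = 0.5, cap_s: float = 30.0,
               rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A server-supplied Retry-After wins; otherwise use "full jitter":
    a uniform draw from [0, min(cap_s, base_s * 2**attempt)].
    """
    if retry_after_s is not None:
        return retry_after_s
    rng = rng or random.Random()
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

The buffering a broker would provide moves into the client's export queue; with the SDK's batch span processor, spans are dropped once that queue is full.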
Composition with other patterns
- Specialisation of patterns/telemetry-to-lakehouse — Telemetry to Lakehouse says "observability data lives in lakehouse tables"; this pattern specifies how it gets there — the managed-OTel-receiver / no-broker shape.
- Inverse of patterns/streaming-broker-as-lakehouse-bronze-sink: that pattern uses Kafka or a similar broker as the bronze-tier sink before Delta. This pattern eliminates the broker.
- Composes with patterns/component-level-latency-from-otel-spans — once spans are in UC, per-tool latency dashboards become a SQL exercise.
- Composes with patterns/bootstrap-eval-dataset-from-production-traces — durable lakehouse storage is what makes prod traces materializable as eval inputs.
- Composes with patterns/inference-payload-table-for-audit — sibling pattern for full-payload model-call audit at the AI-gateway choke point. Both land observability data in UC; different granularity.
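Once spans sit in a table, per-component latency is a group-by over span names. This sketch works on plain dicts with OTLP-style `name`, `startTimeUnixNano` and `endTimeUnixNano` fields rather than the actual UC table columns, and uses a nearest-rank p95; both choices are illustrative, and `latency_by_component` is a hypothetical helper.

```python
from collections import defaultdict


def latency_by_component(spans: list) -> dict:
    """Group spans by name; report count, mean and nearest-rank p95 in milliseconds."""
    durations = defaultdict(list)
    for s in spans:
        # Timestamps may arrive as strings (OTLP JSON), so coerce to int first.
        ms = (int(s["endTimeUnixNano"]) - int(s["startTimeUnixNano"])) / 1e6
        durations[s["name"]].append(ms)

    report = {}
    for name, values in durations.items():
        values.sort()
        # Nearest-rank percentile: 1-based rank = ceil(0.95 * n), in integer arithmetic.
        rank = (95 * len(values) + 99) // 100
        report[name] = {
            "count": len(values),
            "mean_ms": sum(values) / len(values),
            "p95_ms": values[rank - 1],
        }
    return report
```

In the lakehouse the same aggregation would be a SQL `GROUP BY` over the span table feeding a dashboard.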
Caveats
- Lock-in moves to the lakehouse vendor. Avoiding APM-vendor lock-in trades it for lakehouse-vendor lock-in. Open table formats (Delta + Iceberg) mitigate this, but the managed receiver itself (Zerobus) is vendor-specific.
- Single-sink claim is architectural marketing. The Kafka-bypass argument is plausible but rarely benchmarked.
- Default throughput is modest. The canonical instance allows 200 QPS by default; high-traffic agent fleets may need account-team escalation.
- Latency from emit to query is not characterised. The post does not state SLO numbers for Zerobus ingest end-to-end.
- Sample-and-summarise alternatives may be cheaper when full trace retention isn't required. The pattern's economics assume the customer wants full lakehouse-resident retention; if they only need 1-week APM-style traces, traditional shapes may win on simplicity.
- Managed receiver internals are opaque. Durability semantics, partition strategy, and back-pressure behaviour are vendor-internal details.
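If full retention is not required, head sampling by trace ID is one of the cheaper alternatives the sample-and-summarise caveat alludes to. A minimal sketch, modelled loosely on OTel's trace-ID-ratio sampler: the decision depends only on the trace ID, so every span of a trace gets the same verdict. Comparing the low 64 bits against a rate-derived bound is an assumption for illustration, not a claim about any particular SDK's internals.

```python
_LOW_64_MASK = (1 << 64) - 1


def sampling_bound(rate: float) -> int:
    """Map a sampling rate in [0, 1] onto the 64-bit ID space."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    return round(rate * (_LOW_64_MASK + 1))


def should_sample(trace_id: int, rate: float) -> bool:
    """Keep the trace if its low 64 bits fall below the bound for `rate`."""
    return (trace_id & _LOW_64_MASK) < sampling_bound(rate)
```

Because the verdict is a pure function of the trace ID, independently running clients agree on which traces to keep without coordinating; the trade-off is that rare failures in dropped traces are lost.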
Seen in
- sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog — canonical instance: Zerobus Ingest + UC OTel Trace Tables. The "single-sink" term, the Kafka-bypass argument, the OTLP/gRPC + REST dual-protocol surface, and the "agent can be running anywhere" portability all reside here.
Related
- patterns/telemetry-to-lakehouse — generalised pattern.
- patterns/component-level-latency-from-otel-spans — composing query pattern.
- patterns/bootstrap-eval-dataset-from-production-traces — composing eval pattern.
- patterns/inference-payload-table-for-audit — sibling AI-gateway audit pattern.
- patterns/streaming-broker-as-lakehouse-bronze-sink — inverse pattern (broker-fronted).
- concepts/single-sink-telemetry-architecture — structural concept.
- concepts/instrumentation-storage-decoupling — client-side concept.
- concepts/lakehouse-native-observability — storage-side posture.
- systems/zerobus-ingest — canonical receiver.
- systems/uc-otel-trace-tables — canonical storage.
- systems/mlflow-otel-tracing — instrumentation companion.
- systems/opentelemetry — wire protocol.
- systems/unity-catalog — governance substrate.
- systems/delta-lake — storage format.
- companies/databricks