SYSTEM Cited by 2 sources
Zerobus Ingest¶
Zerobus Ingest is Databricks' managed serverless ingestion engine that receives OpenTelemetry (and other) telemetry traffic and writes it directly to Unity Catalog-managed Delta tables — explicitly named in the 2026-05-22 OTel-tracing launch as the "single-sink" substrate that "streams data directly to the lakehouse" and "entirely bypass[es] intermediate message buses like Kafka".
Definition (from the source)¶
"Databricks removes the operational complexity of traditional, multi-hop telemetry pipelines by providing a managed ingestion layer, transparently powered by Zerobus Ingest. Zerobus Ingest acts as a fully managed, serverless ingestion engine that natively supports standard OpenTelemetry protocols (OTLP) via gRPC for open-source collectors, while its REST API capabilities enable seamless integration with application frameworks like MLflow." — Source: sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog
Protocol surface¶
| Surface | Used by | Direction |
|---|---|---|
| OTLP / gRPC | Open-source OTel collectors, framework SDKs (LangGraph, OpenAI SDK, Anthropic SDK, etc.) | Direct point-to-endpoint, no intermediate proxy |
| REST API | Application frameworks like MLflow | Same endpoint surface, HTTP-friendly framing |
The dual-protocol surface means "any OTel-compatible client can export traces to this endpoint, including popular AI agent frameworks across many programming languages."
Single-sink architecture¶
The defining shape:
clients (OTel SDKs / collectors)
│
▼
Zerobus Ingest ◄── managed, serverless
│
▼
UC-managed Delta tables ◄── governed, durable, queryable
│
▼
downstream consumers (MLflow UI / SQL / Genie / ETL)
vs the conventional multi-hop telemetry pipeline:
The post names "intermediate message buses like Kafka" as the architecture being collapsed away. The structural arguments:
- Fewer hops: latency / failure-mode surface area shrinks; "handling ingestion and durability with zero infrastructure overhead".
- One schema boundary: clients emit OTel; Zerobus writes Delta. No intermediate format translation.
- Operational simplification: no broker to size, scale, secure, patch, monitor.
- Bypasses re-architecture: "Existing OLTP-compatible collectors can point directly to this endpoint via gRPC, entirely bypassing intermediate message buses like Kafka" — drop-in for teams that already have OTel pipelines and want to redirect the sink.
This is the canonical wiki instance of concepts/single-sink-telemetry-architecture and patterns/managed-otel-ingestion-direct-to-lakehouse.
Throughput / scale (disclosed)¶
- Starting throughput: 200 QPS ingestion limit per the 2026-05-22 FAQ.
- Storage: "There is no limit on storage."
- Higher throughput: available "by reaching out to your Databricks account team" (no public ceiling disclosed).
- MLflow trace cap: "Previous limits on traces per experiment are no longer applicable" — Zerobus + UC tables eliminate the per-experiment retention bound MLflow historically imposed.
What it writes (output schema surface)¶
Zerobus Ingest is the producer; the consumer schema is six MLflow-derived UC tables/views — see systems/uc-otel-trace-tables for the full enumeration:
<prefix>_otel_spans<prefix>_otel_logs<prefix>_otel_metrics<prefix>_otel_annotations<prefix>_trace_unified<prefix>_trace_metadata
Tables are auto-liquid-clustered post the latest product update. The source recommends a materialized view on top of the derived views "to maintain query performance" at "larger trace volumes".
Why it's load-bearing for the OTel-on-UC story¶
Without Zerobus, "OTel trace ingestion to Delta" would require either:
- A Spark Structured Streaming job consuming an OTel collector → writing Delta (operational complexity, latency, infrastructure to own).
- An intermediate Kafka topic between the collector and a Delta-Sink connector (the architecture the post explicitly bypasses).
Zerobus is the managed glue that makes the lakehouse a viable direct OTel destination at the engineering-economics that compete with SaaS observability vendors. It's the load-bearing system for the patterns/telemetry-to-lakehouse pattern when applied to high-cardinality, high-throughput agent traffic.
Relationship to other Databricks systems¶
- Producer for systems/uc-otel-trace-tables — the consumer schema surface.
- Wire-protocol-compatible with OpenTelemetry — accepts standard OTLP/gRPC.
- Used by MLflow's OTel tracing surface via REST API for the framework integration path.
- Sibling substrate to Inference Tables — both land observability data in UC Delta, but Inference Tables capture full request/response payloads at the Unity AI Gateway choke point (one row = one model call), while Zerobus + OTel tables capture agent-side execution spans (one row = one span in a trace). Different granularity, same governance substrate.
Internal architecture (disclosed June 2026)¶
The 2026-06-11 benchmark post (Source: sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest) reveals three key internal design decisions previously opaque:
Dynamic partitioning with stream-level ordering¶
Traditional message-bus architectures couple parallelism with ordering at the partition level. Zerobus Ingest decouples them:
- Ordering guarantee lives at the stream connection, not the partition. Each producer's gRPC connection is its own logical identity; data arrives in order for the lifetime of that connection regardless of which pod processes it.
- Hot routing: if a pod is running hot, new incoming streams route to a different pod. The producer is unaware; ordering is unaffected.
- True autoscaling: pods can be added on demand spikes and removed when demand drops. Existing streams drain gracefully; new streams stop routing to shrinking pods. This eliminates the "provision for peak, carry forever" anti-pattern of static Kafka partitions.
- This is the canonical wiki instance of patterns/stream-connection-as-ordering-unit.
Zeroparser: zero-copy protobuf decoder¶
Zerobus needs to decode arbitrary user-provided protobuf schemas at runtime (dynamic descriptors) — codegen is impossible. Standard reflection-based decoders are slow (object graph in memory, many small allocations).
Zeroparser bridges the gap: single-pass parsing with zero memory allocations, achieving ~1 GB/s protobuf parsing per CPU core with dynamic descriptors. Outperforms industry-standard codegen implementations in benchmarks.
- Built in Rust — lifetime system guarantees compile-time safety while keeping raw wire bytes under exclusive network ownership (zero copies).
- Open-source: github.com/databricks/zerobus-sdk/.../zeroparser.
- Canonical instance of patterns/zero-copy-protobuf-decoding and concepts/zero-copy-parsing.
Latency-optimized Write-Ahead Log¶
Zerobus implements a WAL for durability before lakehouse publish:
- Messages are durable before ack — the classic WAL commit-before-ack invariant.
- Acks are offset-based (highest committed offset on the stream), not per-record — an async ack loop via gRPC bidirectional streaming.
- Clients purge in-flight buffers up to the acknowledged offset.
- Delta Kernel Rust handles the core logic of writing from WAL to Delta tables.
- Canonical instance of patterns/wal-before-lakehouse-publish.
Benchmark results (NEOWISE dataset)¶
| Metric | Value |
|---|---|
| Sustained throughput | 12 GB/s (11.8 GB/s proto2 wire) to a single table |
| Row throughput | 12,000,000 rows/sec |
| Total rows ingested | 1.04 trillion |
| Duration | ~25 hours (incl. 1h ramp) |
| Concurrent streams | 2,048 (one per Locust worker pod) |
| Infrastructure | Kubernetes, 1.5 cores / 2 GiB per worker pod |
The benchmark used NASA's NEOWISE dataset (200 billion data points over 11 years) with Locust to emulate real-world fan-in patterns. A single powerful host cannot stress-test Zerobus because it would saturate its own bandwidth first — the service scales with concurrent stream count.
Caveats / what's not disclosed¶
- Pod-level durability semantics (sync vs async WAL fsync), replication topology, failure modes on pod crash mid-stream, and back-pressure behaviour on slow Delta writes are not fully disclosed (partially addressed by the 2026-06-11 WAL disclosure but without fsync-level detail).
- Single-sink claim is architectural marketing, not a benchmark vs Kafka-fronted equivalents.
- Multi-destination fan-out is not addressed — if traces need to land in both UC and a SaaS APM for real-time alerting, the post does not describe the topology.
- 200 QPS starting throughput (telemetry use-case) vs 12 GB/s benchmark throughput — the gap between the OTel-product starting quota and the architectural ceiling is large; likely different product tiers / account-level gating.
- No latency SLO for end-to-end (client emit → Delta queryable) is named beyond "queryable in seconds".
- Kafka comparison is structural, not benchmarked — architectural superiority is argued (no static partitions) but no head-to-head benchmark is presented.
Seen in¶
- sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog — first wiki disclosure; named explicitly as the "managed ingestion layer, transparently powered by Zerobus Ingest"; protocol surface (OTLP/gRPC + REST), single-sink shape, and Kafka-bypass architectural argument all attributed to this engine.
- sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest — deep architecture disclosure: dynamic partitioning internals (stream-connection-level ordering, hot routing, true autoscaling), Zeroparser zero-copy protobuf decoder (~1 GB/s/core, Rust, OSS), latency-optimized WAL with async ack loop, Delta Kernel Rust integration, and petabyte-scale benchmark (12 GB/s sustained, 1.04T rows, 2,048 streams, 24h).
Related¶
- systems/uc-otel-trace-tables — output schema surface.
- systems/mlflow-otel-tracing — REST-API consumer (MLflow integration path).
- systems/opentelemetry — wire protocol.
- systems/unity-catalog — governance substrate.
- systems/delta-lake — physical storage format.
- systems/mlflow — companion ML lifecycle platform.
- systems/inference-tables — sibling lakehouse-resident audit substrate.
- concepts/single-sink-telemetry-architecture — the structural shape this system instantiates.
- concepts/instrumentation-storage-decoupling — OTel-as-boundary; Zerobus is the storage-side endpoint.
- concepts/lakehouse-native-observability — the broader thesis this enables.
- patterns/managed-otel-ingestion-direct-to-lakehouse — the canonical pattern instance.
- patterns/telemetry-to-lakehouse — the broader pattern Zerobus operationalises.
- companies/databricks