SYSTEM Cited by 2 sources

Zerobus Ingest¶

Zerobus Ingest is Databricks' managed serverless ingestion engine. It receives OpenTelemetry (and other) telemetry traffic and writes it directly to Unity Catalog-managed Delta tables. The 2026-05-22 OTel-tracing launch names it explicitly as the "single-sink" substrate that "streams data directly to the lakehouse" and "entirely bypass[es] intermediate message buses like Kafka".

Definition (from the source)¶

"Databricks removes the operational complexity of traditional, multi-hop telemetry pipelines by providing a managed ingestion layer, transparently powered by Zerobus Ingest. Zerobus Ingest acts as a fully managed, serverless ingestion engine that natively supports standard OpenTelemetry protocols (OTLP) via gRPC for open-source collectors, while its REST API capabilities enable seamless integration with application frameworks like MLflow." — Source: sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog

Protocol surface¶

Surface	Used by	Connection path
OTLP / gRPC	Open-source OTel collectors, framework SDKs (LangGraph, OpenAI SDK, Anthropic SDK, etc.)	Direct point-to-endpoint, no intermediate proxy
REST API	Application frameworks like MLflow	Same endpoint surface, HTTP-friendly framing

The dual-protocol surface means "any OTel-compatible client can export traces to this endpoint, including popular AI agent frameworks across many programming languages."
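A minimal sketch of how an OTel-compatible client might be pointed at such an endpoint through the standard OpenTelemetry exporter environment variables. The helper name, the endpoint URL, and the bearer-token header are hypothetical placeholders; the source does not give the real endpoint format or authentication scheme.

```python
from urllib.parse import quote


def otlp_exporter_env(endpoint: str, token: str, protocol: str = "grpc") -> dict:
    """Build the standard OTLP exporter environment variables.

    The OTEL_EXPORTER_OTLP_* names come from the OpenTelemetry specification.
    The endpoint and the authorization header are assumptions for illustration.
    """
    allowed = {"grpc", "http/protobuf", "http/json"}
    if protocol not in allowed:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
        # Headers are comma-separated key=value pairs with URL-encoded values.
        "OTEL_EXPORTER_OTLP_HEADERS": "authorization=" + quote(f"Bearer {token}", safe=""),
    }


# Hypothetical endpoint and token, for illustration only.
env = otlp_exporter_env("https://ingest.example.invalid:443", "abc")
```

Because only the exporter settings change, the instrumentation code in the application stays untouched when the sink is redirected.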

Single-sink architecture¶

The defining shape:

clients (OTel SDKs / collectors)
        │
        ▼
   Zerobus Ingest  ◄── managed, serverless
        │
        ▼
   UC-managed Delta tables  ◄── governed, durable, queryable
        │
        ▼
   downstream consumers (MLflow UI / SQL / Genie / ETL)

vs the conventional multi-hop telemetry pipeline:

clients ──► Kafka / Pulsar / Kinesis ──► consumer ──► storage

The post names "intermediate message buses like Kafka" as the architecture being collapsed away. The structural arguments:

Fewer hops: latency / failure-mode surface area shrinks; "handling ingestion and durability with zero infrastructure overhead".
One schema boundary: clients emit OTel; Zerobus writes Delta. No intermediate format translation.
Operational simplification: no broker to size, scale, secure, patch, monitor.
Bypasses re-architecture: "Existing [OTLP]-compatible collectors can point directly to this endpoint via gRPC, entirely bypassing intermediate message buses like Kafka". This makes it a drop-in for teams that already have OTel pipelines and want to redirect the sink.

This is the canonical wiki instance of concepts/single-sink-telemetry-architecture and patterns/managed-otel-ingestion-direct-to-lakehouse.
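The "fewer hops" argument can be made concrete with a small worked example: in a serial pipeline every component must be up, so availabilities multiply. The 99.9% figure below is an illustrative assumption, not a number from the source, and the function name is hypothetical.

```python
from math import prod


def pipeline_availability(hop_availabilities):
    """Availability of a serial pipeline in which every hop must be up."""
    return prod(hop_availabilities)


# Illustrative numbers only: each component is assumed to be 99.9% available.
multi_hop = pipeline_availability([0.999, 0.999, 0.999])  # bus, consumer, storage
single_sink = pipeline_availability([0.999, 0.999])       # ingest, storage
```

Under these assumptions, removing one hop raises end-to-end availability from about 99.70% to about 99.80%; the real gap depends on the components involved.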

Throughput / scale (disclosed)¶

Starting throughput: 200 QPS ingestion limit per the 2026-05-22 FAQ.
Storage: "There is no limit on storage."
Higher throughput: available "by reaching out to your Databricks account team" (no public ceiling disclosed).
MLflow trace cap: "Previous limits on traces per experiment are no longer applicable" — Zerobus + UC tables eliminate the per-experiment retention bound MLflow historically imposed.
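One way a client could stay under the starting quota is a token-bucket limiter in front of its exporter. This is a sketch under stated assumptions: the source does not say whether the 200 QPS limit counts requests, batches, or something else, and the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Client-side limiter that admits at most `rate` requests per second on average."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `try_acquire()` before each export and buffer or drop telemetry when it returns `False`. The injectable `clock` is there so the behaviour can be tested without sleeping.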

What it writes (output schema surface)¶

Zerobus Ingest is the producer; the consumer schema is six MLflow-derived UC tables/views — see systems/uc-otel-trace-tables for the full enumeration:

<prefix>_otel_spans
<prefix>_otel_logs
<prefix>_otel_metrics
<prefix>_otel_annotations
<prefix>_trace_unified
<prefix>_trace_metadata

As of the latest product update, the tables use automatic liquid clustering. The source recommends a materialized view on top of the derived views "to maintain query performance" at "larger trace volumes".

Why it's load-bearing for the OTel-on-UC story¶

Without Zerobus, "OTel trace ingestion to Delta" would require either:

A Spark Structured Streaming job consuming an OTel collector → writing Delta (operational complexity, latency, infrastructure to own).
An intermediate Kafka topic between the collector and a Delta-Sink connector (the architecture the post explicitly bypasses).

Zerobus is the managed glue that makes the lakehouse a viable direct OTel destination, at engineering economics that can compete with SaaS observability vendors. It is the load-bearing system for the patterns/telemetry-to-lakehouse pattern when applied to high-cardinality, high-throughput agent traffic.

Relationship to other Databricks systems¶

Producer for systems/uc-otel-trace-tables — the consumer schema surface.
Wire-protocol-compatible with OpenTelemetry — accepts standard OTLP/gRPC.
Used by MLflow's OTel tracing surface via REST API for the framework integration path.
Sibling substrate to Inference Tables — both land observability data in UC Delta, but Inference Tables capture full request/response payloads at the Unity AI Gateway choke point (one row = one model call), while Zerobus + OTel tables capture agent-side execution spans (one row = one span in a trace). Different granularity, same governance substrate.

Internal architecture (disclosed June 2026)¶

The 2026-06-11 benchmark post (Source: sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest) reveals three key internal design decisions previously opaque:

Dynamic partitioning with stream-level ordering¶

Traditional message-bus architectures couple parallelism with ordering at the partition level. Zerobus Ingest decouples them:

Ordering guarantee lives at the stream connection, not the partition. Each producer's gRPC connection is its own logical identity; data arrives in order for the lifetime of that connection regardless of which pod processes it.
Hot routing: if a pod is running hot, new incoming streams route to a different pod. The producer is unaware; ordering is unaffected.
True autoscaling: pods can be added when demand spikes and removed when demand drops. Existing streams drain gracefully; new streams stop routing to shrinking pods. This eliminates the "provision for peak, carry forever" anti-pattern of static Kafka partitions.
This is the canonical wiki instance of patterns/stream-connection-as-ordering-unit.
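The routing behaviour described above can be sketched as a toy model: ordering is tracked per stream, while pod assignment is decided only when a stream opens. This is an illustration under assumptions, not the service's actual routing logic; the class names and the least-loaded placement rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    draining: bool = False
    streams: set = field(default_factory=set)


class StreamRouter:
    """Assigns each new stream to the least-loaded pod that is not draining."""

    def __init__(self, pods):
        self.pods = {p.name: p for p in pods}
        self.assignment = {}  # stream_id -> pod name
        self.next_seq = {}    # stream_id -> next per-stream sequence number

    def open_stream(self, stream_id: str) -> str:
        candidates = [p for p in self.pods.values() if not p.draining]
        pod = min(candidates, key=lambda p: (len(p.streams), p.name))
        pod.streams.add(stream_id)
        self.assignment[stream_id] = pod.name
        self.next_seq[stream_id] = 0
        return pod.name

    def send(self, stream_id: str, payload: bytes):
        # Order is tracked per stream, independent of which pod handles it.
        seq = self.next_seq[stream_id]
        self.next_seq[stream_id] = seq + 1
        return (self.assignment[stream_id], seq, payload)

    def add_pod(self, name: str) -> None:
        self.pods[name] = Pod(name)

    def drain(self, name: str) -> None:
        # Existing streams stay put; only new streams avoid this pod.
        self.pods[name].draining = True
```

In this model, scaling out is just `add_pod`, and scaling in is `drain` followed by waiting for the pod's existing streams to close. Neither step touches the sequence numbers of any stream.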

Zeroparser: zero-copy protobuf decoder¶

Zerobus must decode arbitrary user-provided protobuf schemas at runtime (dynamic descriptors), so ahead-of-time codegen is impossible. Standard reflection-based decoders are slow: they build an object graph in memory and make many small allocations.

Zeroparser bridges the gap: single-pass parsing with zero memory allocations, achieving ~1 GB/s protobuf parsing per CPU core with dynamic descriptors. The post reports that it outperforms industry-standard codegen implementations in benchmarks.

Built in Rust — lifetime system guarantees compile-time safety while keeping raw wire bytes under exclusive network ownership (zero copies).
Open-source: github.com/databricks/zerobus-sdk/.../zeroparser.
Canonical instance of patterns/zero-copy-protobuf-decoding and concepts/zero-copy-parsing.
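To illustrate the zero-copy idea only, the sketch below walks the protobuf wire format in a single pass and returns length-delimited payloads as `memoryview` slices into the input buffer rather than as copies. It is not Zeroparser: it is Python rather than Rust, it ignores descriptors entirely, and it makes no performance claim. The function names are hypothetical.

```python
def read_varint(buf: memoryview, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(data):
    """Yield (field_number, wire_type, value) in a single pass over `data`.

    Length-delimited and fixed-width values are memoryview slices into
    `data`, so payload bytes are never copied.
    """
    buf = memoryview(data)
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: bytes, string, sub-message
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated length-delimited field")
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Field 1 = varint 150, field 2 = the string "testing".
msg = bytes([0x08, 0x96, 0x01, 0x12, 0x07]) + b"testing"
fields = list(iter_fields(msg))
```

Here the slice for field 2 still refers to `msg` itself. A decoder driven by dynamic descriptors would use the field numbers to look up names and types at this point instead of generating code ahead of time.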

Latency-optimized Write-Ahead Log¶

Zerobus implements a WAL for durability before lakehouse publish:

Messages are durable before ack — the classic WAL commit-before-ack invariant.
Acks are offset-based (highest committed offset on the stream), not per-record — an async ack loop via gRPC bidirectional streaming.
Clients purge in-flight buffers up to the acknowledged offset.
Delta Kernel Rust handles the core logic of writing from WAL to Delta tables.
Canonical instance of patterns/wal-before-lakehouse-publish.
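The ack protocol above can be sketched with two small classes: a stand-in log whose append represents the durable commit, and a client buffer that purges everything at or below the acknowledged offset. This is a simplified single-process model under assumptions; the class names are hypothetical, and real durability (fsync, replication) is not modelled.

```python
from collections import deque


class Wal:
    """Stand-in for a durable log: a record counts as committed once appended."""

    def __init__(self):
        self.records = []

    def append(self, record: bytes) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the committed record


class ProducerBuffer:
    """Client side: keeps unacknowledged records so they can be resent."""

    def __init__(self):
        self.in_flight = deque()  # (offset, record), in send order
        self.next_offset = 0

    def send(self, record: bytes) -> int:
        offset = self.next_offset
        self.in_flight.append((offset, record))
        self.next_offset += 1
        return offset

    def on_ack(self, acked_offset: int) -> None:
        # One ack covers every record at or below the acknowledged offset.
        while self.in_flight and self.in_flight[0][0] <= acked_offset:
            self.in_flight.popleft()
```

Because an ack carries only the highest committed offset, a single message from the server can release many buffered records at once, which is what allows the ack loop to run asynchronously from the sends.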

Benchmark results (NEOWISE dataset)¶

Metric	Value
Sustained throughput	12 GB/s (11.8 GB/s proto2 wire) to a single table
Row throughput	12,000,000 rows/sec
Total rows ingested	1.04 trillion
Duration	~25 hours (incl. 1h ramp)
Concurrent streams	2,048 (one per Locust worker pod)
Infrastructure	Kubernetes, 1.5 cores / 2 GiB per worker pod

The benchmark used NASA's NEOWISE dataset (200 billion data points over 11 years) with Locust to emulate real-world fan-in patterns. A single powerful host cannot stress-test Zerobus because it would saturate its own bandwidth first — the service scales with concurrent stream count.
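As a sanity check, the figures in the table are mutually consistent. The arithmetic below assumes 24 hours at the sustained rate (the ~25 hours minus the 1 hour ramp), decimal gigabytes, and an even spread of load across streams; none of these assumptions is stated in the source.

```python
ROWS_PER_SEC = 12_000_000
BYTES_PER_SEC = 12e9            # 12 GB/s, decimal gigabytes assumed
STREAMS = 2_048
SUSTAINED_SECONDS = 24 * 3600   # assumes ~25 h total minus the 1 h ramp

total_rows = ROWS_PER_SEC * SUSTAINED_SECONDS      # 1,036,800,000,000 rows
total_bytes = BYTES_PER_SEC * SUSTAINED_SECONDS    # about 1.04e15 bytes
bytes_per_row = BYTES_PER_SEC / ROWS_PER_SEC       # 1000 bytes per row
mb_per_stream = BYTES_PER_SEC / STREAMS / 1e6      # about 5.86 MB/s per stream
rows_per_stream = ROWS_PER_SEC / STREAMS           # about 5,859 rows/s per stream
```

Under these assumptions the run moves roughly one petabyte and 1.04 trillion rows, with each stream carrying only a few megabytes per second. That matches the observation that the load comes from fan-in across many modest producers, not from one large one.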

Caveats / what's not disclosed¶

Pod-level durability semantics (sync vs async WAL fsync), replication topology, failure modes on pod crash mid-stream, and back-pressure behaviour on slow Delta writes are not fully disclosed (partially addressed by the 2026-06-11 WAL disclosure but without fsync-level detail).
Single-sink claim is architectural marketing, not a benchmark vs Kafka-fronted equivalents.
Multi-destination fan-out is not addressed — if traces need to land in both UC and a SaaS APM for real-time alerting, the post does not describe the topology.
200 QPS starting throughput (telemetry use-case) vs 12 GB/s benchmark throughput — the gap between the OTel-product starting quota and the architectural ceiling is large; likely different product tiers / account-level gating.
No latency SLO for end-to-end (client emit → Delta queryable) is named beyond "queryable in seconds".
Kafka comparison is structural, not benchmarked — architectural superiority is argued (no static partitions) but no head-to-head benchmark is presented.

Seen in¶

sources/2026-05-22-databricks-observability-any-agent-anywhere-otel-unity-catalog — first wiki disclosure; named explicitly as the "managed ingestion layer, transparently powered by Zerobus Ingest"; protocol surface (OTLP/gRPC + REST), single-sink shape, and Kafka-bypass architectural argument all attributed to this engine.
sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest: deep architecture disclosure covering dynamic partitioning internals (stream-connection-level ordering, hot routing, true autoscaling), Zeroparser zero-copy protobuf decoder (~1 GB/s/core, Rust, OSS), latency-optimized WAL with async ack loop, Delta Kernel Rust integration, and petabyte-scale benchmark (12 GB/s sustained, 1.04T rows, 2,048 streams, ~25h including a 1h ramp).

Related¶

systems/uc-otel-trace-tables — output schema surface.
systems/mlflow-otel-tracing — REST-API consumer (MLflow integration path).
systems/opentelemetry — wire protocol.
systems/unity-catalog — governance substrate.
systems/delta-lake — physical storage format.
systems/mlflow — companion ML lifecycle platform.
systems/inference-tables — sibling lakehouse-resident audit substrate.
concepts/single-sink-telemetry-architecture — the structural shape this system instantiates.
concepts/instrumentation-storage-decoupling — OTel-as-boundary; Zerobus is the storage-side endpoint.
concepts/lakehouse-native-observability — the broader thesis this enables.
patterns/managed-otel-ingestion-direct-to-lakehouse — the canonical pattern instance.
patterns/telemetry-to-lakehouse — the broader pattern Zerobus operationalises.
companies/databricks