Ingesting the Milky Way: Petabyte-Scale with Zerobus Ingest¶
Summary¶
Databricks discloses the internal architecture of Zerobus Ingest — their fully managed, serverless, push-based streaming ingestion service that writes directly into Delta tables governed by Unity Catalog. The post details three critical design decisions that enabled petabyte-scale ingestion: (1) dynamic partitioning with stream-connection-level ordering guarantees instead of static Kafka-style partition topology, (2) a custom zero-copy protobuf decoder ("Zeroparser") achieving ~1 GB/s per CPU core with dynamic descriptors, and (3) a latency-optimized write-ahead log for durability before lakehouse publish. The benchmark ingests NASA's NEOWISE dataset (200 billion data points / 1 PB) in under 24 hours at a sustained 12 GB/s to a single table from 2,048 concurrent streams.
Key takeaways¶
-
Stream-connection-level ordering replaces partition-level ordering. Zerobus decouples ordering from partitions — each producer's stream connection is the ordering unit. This eliminates the static partition problem where adding/removing partitions breaks consumer ordering guarantees (Source: §Autoscaling achieved through dynamic partitioning).
-
True autoscaling via hot routing. Because ordering is per-stream (not per-partition), pods can be added/removed dynamically. New streams route away from hot pods; draining streams finish gracefully on shrinking pods. The system achieves elastic compute utilization without the "provision for peak, carry forever" anti-pattern of Kafka (Source: §Hot routing and true autoscaling).
-
Zero-copy protobuf decoding at ~1 GB/s per core. Zeroparser is a custom Rust-based protobuf decoder that accepts dynamic descriptors at runtime (like reflection) but achieves codegen-level performance by doing single-pass parsing with zero memory allocations. Rust's lifetime system guarantees compile-time safety while keeping raw wire bytes under exclusive network ownership (Source: §Zero-copy high-performance data handling).
-
Zeroparser outperforms industry-standard codegen implementations despite being in the "dynamic" category — benchmarked against the NEOWISE schema on single-core parsing (Source: §Results show that Zeroparser…).
-
WAL provides durability guarantee with low-latency handoff. Zerobus implements a latency-optimized WAL; messages are durable before ack is sent to client. Rather than per-record acks, the server returns the highest committed offset on the stream — an async ack loop that keeps clients lean (Source: §Write Ahead Log).
-
gRPC bidirectional streaming enables the async ack pattern: one channel for sending messages, another for receiving offset acknowledgements. Clients purge in-flight buffers up to the acknowledged offset (Source: §This async design is key for clients…).
-
Delta Kernel Rust is used for the core logic of writing to Delta tables from the WAL (Source: §Delta Kernel Rust is then used for the core logic for writing to Delta).
-
Sustained 12 GB/s / 12M rows/sec to a single table for 24 hours from 2,048 concurrent Locust workers, ingesting 1.04 trillion records total. Each worker runs as a separate Kubernetes pod with 1.5 cores / 2 GiB memory (Source: §The results).
-
Fan-in pattern is the correct stress model. A single powerful host saturates its own bandwidth before pressuring Zerobus — the service scales with concurrent stream count, not single-client throughput. Locust on Kubernetes with 2,048 pods emulates real-world fan-in (Source: §Why Locust? The Fan-In Problem).
-
No infrastructure to provision. Create a table and push data. No partitions, no brokers, no connector pipelines. Data lands in the lakehouse queryable in seconds (Source: §Introduction).
Operational numbers¶
| Metric | Value |
|---|---|
| Sustained throughput (bytes) | 12 GB/s (11.8 GB/s proto2 wire) |
| Sustained throughput (rows) | 12,000,000 rows/sec |
| Total rows ingested | 1.04 trillion |
| Test duration | ~25 hours (incl. 1h ramp) |
| Concurrent streams | 2,048 |
| Worker spec | 1.5 cores / 2 GiB / 10 GiB ephemeral |
| Zeroparser throughput | ~1 GB/s per CPU core |
| Message format | Protocol Buffer 2 (proto2) binary |
| In-flight records per stream | 50,000 (max) |
| Spawn rate | 0.5 users/sec |
Architecture diagram (textual)¶
2,048 Locust workers (K8s pods)
│ gRPC bidirectional streams
▼
Zerobus Ingest (serverless, dynamic pod pool)
│ hot-routing: streams → pods (heuristic-based)
│ per-stream ordering guarantee
▼
Write-Ahead Log (latency-optimized)
│ async ack loop (highest committed offset)
▼
Delta Kernel Rust → Delta tables (Unity Catalog governed)
Supported formats¶
| Format | Use case | Performance |
|---|---|---|
| Protobuf | Generic, fast record-by-record | Fastest |
| Arrow | Fast batch ingestion | Fast |
| JSON | Batch or row-by-row; convenient | Slower than protobuf/Arrow |
Caveats¶
- Tier-3 source (Databricks Blog) — product/benchmark marketing mixed with real architecture. The 12 GB/s number is benchmark-optimal with 2,048 coordinated workers; real-world fan-in from heterogeneous producers may behave differently.
- Internals still partially opaque. Pod-level durability semantics (sync vs async WAL fsync), replication topology, failure modes on pod crash mid-stream, and back-pressure behaviour on slow Delta writes are not disclosed.
- No latency SLO disclosed. "Queryable in seconds" is the only end-to-end claim.
- Kafka comparison is structural, not benchmarked. The post argues architectural superiority (no static partitions) but does not benchmark against a Kafka cluster under equivalent load.
- Open-source scope is limited. Zeroparser is OSS (in the Zerobus SDK), but the service itself is proprietary managed infrastructure.
Source¶
- Original: https://www.databricks.com/blog/ingesting-milky-way-petabyte-scale-zerobus-ingest
- Raw markdown:
raw/databricks/2026-06-11-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest-b738650c.md
Related¶
- systems/zerobus-ingest — the system described
- systems/kafka — the architectural predecessor being displaced
- systems/delta-lake — storage target
- systems/delta-kernel — Rust library used for Delta write logic
- systems/unity-catalog — governance layer over target tables
- concepts/wal-write-ahead-logging — durability primitive used
- concepts/dynamic-partition-splitting — related concept (runtime partition remediation)
- patterns/stream-connection-as-ordering-unit — new pattern introduced
- patterns/zero-copy-protobuf-decoding — new pattern introduced
- concepts/zero-copy-parsing — concept instantiated by Zeroparser
- concepts/fan-in-ingestion — stress-testing model
- companies/databricks