PATTERN Cited by 1 source
Zero-copy protobuf decoding
Definition
Zero-copy protobuf decoding is the pattern of parsing Protocol Buffer messages by traversing the wire-format bytes in a single pass, without allocating intermediate objects or copying data out of the network buffer. It combines the flexibility of runtime reflection (dynamic descriptors, no compile-time codegen required) with the performance of generated code (zero allocations, no object-graph construction).
The problem it solves
Standard protobuf decoders force a choice:
| Approach | Pros | Cons |
|---|---|---|
| Code generation (codegen) | Fast, zero-overhead at runtime | Requires descriptors at compile time; cannot handle arbitrary user schemas at runtime |
| Runtime reflection | Fully dynamic, accepts any schema | Slow — builds object graph in memory, many small allocations |
For services that receive arbitrary user-defined schemas at runtime (e.g., a managed ingestion service accepting any producer's data format), codegen is not an option because the schemas are unknown at build time, and reflection is too slow at high throughput.
Mechanism (Zeroparser instantiation)
Databricks' Zeroparser bridges this gap:
- Single-pass parsing: traverses wire bytes exactly once.
- Zero memory allocations: no intermediate objects; parsed fields are exposed as references into the network-owned buffer.
- Rust lifetime system: a compile-time guarantee that the raw wire bytes stay under exclusive network ownership for as long as parsing refers to them, giving safety without runtime overhead.
- Dynamic descriptor support: schemas provided at runtime, yet performance matches or exceeds codegen.
Result: ~1 GB/s protobuf parsing per CPU core with complex schemas (NEOWISE benchmark: nested fields, repeated fields, mixed types).
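The properties listed above can be illustrated with a minimal sketch. This is not Zeroparser's code or API: `FieldReader`, `Value`, `Malformed`, and `visit_fields` are hypothetical names, the runtime descriptor is reduced to a slice of `(field number, name)` pairs, and group wire types are rejected for brevity. It shows a single pass over the standard protobuf wire format in which length-delimited payloads are returned as slices borrowed from the input buffer, with the lifetime `'a` tying each value to that buffer.

```rust
/// A decoded value. `Bytes` borrows from the input buffer instead of copying.
#[derive(Debug, PartialEq)]
pub enum Value<'a> {
    Varint(u64),
    Fixed64(u64),
    Fixed32(u32),
    Bytes(&'a [u8]), // string, bytes, nested message, or packed repeated field
}

/// Returned for truncated input or an unsupported wire type.
#[derive(Debug, PartialEq)]
pub struct Malformed;

/// Hypothetical single-pass reader over protobuf wire bytes.
pub struct FieldReader<'a> {
    rest: &'a [u8],
}

impl<'a> FieldReader<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        FieldReader { rest: buf }
    }

    /// Decode a base-128 varint (at most 10 bytes) and advance past it.
    fn varint(&mut self) -> Result<u64, Malformed> {
        let buf = self.rest;
        let mut out = 0u64;
        for (i, &byte) in buf.iter().enumerate().take(10) {
            out |= u64::from(byte & 0x7f) << (7 * i);
            if byte & 0x80 == 0 {
                self.rest = &buf[i + 1..];
                return Ok(out);
            }
        }
        Err(Malformed)
    }

    /// Hand out the next `n` bytes as a borrowed slice; nothing is copied.
    fn take(&mut self, n: u64) -> Result<&'a [u8], Malformed> {
        let buf = self.rest;
        if n > buf.len() as u64 {
            return Err(Malformed);
        }
        let (head, tail) = buf.split_at(n as usize);
        self.rest = tail;
        Ok(head)
    }

    /// Read the next (field number, value) pair, or `None` at end of input.
    pub fn next_field(&mut self) -> Result<Option<(u32, Value<'a>)>, Malformed> {
        if self.rest.is_empty() {
            return Ok(None);
        }
        let key = self.varint()?;
        let value = match key & 0x7 {
            0 => Value::Varint(self.varint()?),
            1 => {
                let mut raw = [0u8; 8];
                raw.copy_from_slice(self.take(8)?);
                Value::Fixed64(u64::from_le_bytes(raw))
            }
            2 => {
                let len = self.varint()?;
                Value::Bytes(self.take(len)?)
            }
            5 => {
                let mut raw = [0u8; 4];
                raw.copy_from_slice(self.take(4)?);
                Value::Fixed32(u32::from_le_bytes(raw))
            }
            _ => return Err(Malformed), // groups (3, 4) are out of scope here
        };
        Ok(Some(((key >> 3) as u32, value)))
    }
}

/// Walk a message once, resolving field numbers against a descriptor that is
/// only known at runtime. Fields missing from the descriptor are skipped.
pub fn visit_fields<'a, 'd, F>(
    buf: &'a [u8],
    descriptor: &'d [(u32, &'d str)],
    mut on_field: F,
) -> Result<(), Malformed>
where
    F: FnMut(&'d str, Value<'a>),
{
    let mut reader = FieldReader::new(buf);
    while let Some((number, value)) = reader.next_field()? {
        if let Some(&(_, name)) = descriptor.iter().find(|&&(n, _)| n == number) {
            on_field(name, value);
        }
    }
    Ok(())
}
```

A caller would pass the received buffer and a descriptor loaded at runtime to `visit_fields`, then handle each field in the callback. The callback style is a design choice for this sketch: it avoids building a result collection, so the parser itself performs no heap allocation.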
Trade-offs
| Advantage | Cost |
|---|---|
| Codegen-level throughput with runtime flexibility | Implementation complexity (the compile-time safety guarantee relies on a language with ownership semantics, such as Rust) |
| Zero allocation pressure / GC-free | Parsed data only valid while network buffer is live |
| Single-pass, cache-friendly | Cannot do random-access field lookup during parse |
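The buffer-lifetime cost in the table can be made concrete with a short sketch. The names `StrView`, `view_str`, and `retain` are hypothetical and not taken from Zeroparser, and the input is assumed to be a string payload preceded by a one-byte length (lengths under 128 encode as a single varint byte). A borrowed view cannot outlive the buffer it points into, so any value that must survive the buffer has to be copied explicitly.

```rust
/// Hypothetical zero-copy view of a string field inside a receive buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StrView<'buf>(pub &'buf str);

/// Borrow a string payload that is preceded by a one-byte length.
/// Returns `None` if the input is truncated or not valid UTF-8.
pub fn view_str<'buf>(buf: &'buf [u8]) -> Option<StrView<'buf>> {
    let (&len, rest) = buf.split_first()?;
    let bytes = rest.get(..usize::from(len))?;
    std::str::from_utf8(bytes).ok().map(StrView)
}

/// Copy only the data that has to outlive the buffer.
pub fn retain<'buf>(view: StrView<'buf>) -> String {
    view.0.to_owned()
}
```

Holding a `StrView` past the point where its buffer is dropped or reused is rejected by the borrow checker, which is the compile-time form of the "only valid while network buffer is live" cost; `retain` is the explicit opt-out, paid for with one allocation.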
Seen in
- systems/zerobus-ingest — Zeroparser, open-source at github.com/databricks/zerobus-sdk/.../zeroparser; outperforms industry-standard codegen implementations in benchmarks (Source: sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest)
Related
- concepts/zero-copy-parsing — the general concept
- systems/zerobus-ingest — canonical production deployment
- patterns/stream-connection-as-ordering-unit — sibling pattern in the same system