PATTERN Cited by 2 sources
Binary format for broker throughput¶
Pattern¶
When raw streaming-broker throughput is the constraint, prefer binary encodings (AVRO, Protobuf) over text encodings (JSON) for on-wire payloads. The wire-size reduction translates directly to broker-throughput uplift because the broker's per-record cost is partly fixed and partly proportional to bytes-on-wire; smaller records move the variable component down.
Canonical production quantification: ~20% throughput uplift from AVRO over JSON at 1 KB payload with randomised content (designed to neutralise compression effects), as disclosed in Redpanda's 14.5 GB/s Redpanda → Snowflake benchmark.
"Using a binary format like AVRO showed ~20% throughput improvement over a textual format like JSON." (Source: sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming)
Why it works¶
Two layered effects compound:
- Smaller uncompressed wire size. JSON has structural overhead — field names repeated per record, whitespace, delimiter characters — that binary encodings lack. AVRO records carry no inline field names; the schema is shipped separately via a schema registry.
- Higher effective compression ratio. Even when compression is enabled, binary encodings typically compress to smaller final sizes than the equivalent text payload. On randomised content (as in the benchmark — designed to defeat compression effects), the uncompressed-size delta dominates.
The benchmark's randomised-content methodology isolates the encoding-format effect from compression-effectiveness variation that would otherwise confound the comparison on real-world payloads.
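The field-name overhead is easy to see with a size comparison. A minimal sketch, assuming a hypothetical three-field record and using Python's `struct` as a stand-in for schema-driven binary encoding (AVRO framing and varint encoding omitted):

```python
import json
import struct

# Hypothetical record; field names and layout are illustrative, not from the benchmark.
record = {"user_id": 123456789, "ts": 1700000000123, "value": 42.5}

# JSON: field names, quotes, and delimiters ship with every record.
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Schema-driven binary (AVRO-style): the schema lives in the registry, so the
# wire payload is just the field values — here two int64s and a float64.
binary_bytes = struct.pack("<qqd", record["user_id"], record["ts"], record["value"])

print(len(json_bytes), len(binary_bytes))  # the binary record is a fraction of the JSON size
```

The gap widens as field names get longer relative to their values, which is why the effect shows up even before compression enters the picture.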
Composes with client-side compression¶
The ~20% uplift is a standalone effect of the encoding format. It composes with client-side compression — where the producer compresses the batch before wire transit, so the broker sees already-compressed bytes — to deliver further throughput gains.
In practice, production Kafka/Redpanda deployments typically pick both: binary encoding (AVRO or Protobuf) and client-side compression (lz4, zstd).
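The interaction with compression can be sketched on randomised content, mirroring the benchmark's methodology. A hedged illustration — the single-field record and base64 text encoding are assumptions, and zlib stands in for lz4/zstd:

```python
import base64
import json
import os
import zlib

# 1 KB randomised payload, as in the benchmark (field name illustrative).
payload = os.urandom(1024)

# A text format must carry raw bytes as text; base64 inflates them by ~33%.
json_record = json.dumps({"payload": base64.b64encode(payload).decode("ascii")}).encode("utf-8")

# A binary format carries the bytes as-is (framing overhead omitted for brevity).
binary_record = payload

# Random content is incompressible, so compression cannot claw the text
# inflation back: the uncompressed-size delta dominates — the effect the
# randomised-content methodology isolates.
print(len(json_record), len(zlib.compress(json_record)))
print(len(binary_record), len(zlib.compress(binary_record)))
```

On real-world payloads with repetitive structure both encodings compress further, but the binary batch starts smaller, so the two effects remain compositional.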
Trade-offs¶
- Schema registry dependency. AVRO records don't carry field names inline, so the consumer needs the schema to decode. This requires a schema registry or equivalent out-of-band schema-distribution mechanism.
- Debuggability. A JSON record is human-readable on the console; an AVRO record is not. Debug-time instrumentation (decoder tooling, schema lookups) is required.
- Schema-evolution discipline. Binary formats have stricter rules for schema changes (AVRO: name-preserving field adds with defaults; Protobuf: tag-number preservation). Text formats are more forgiving but break silently.
- Codec compatibility with destination. Not all sinks natively consume the broker's binary format — e.g., the Snowpipe-Streaming destination may decode AVRO in the ingest connector rather than pass-through. The throughput gain is on the producer → broker → consumer hops; destination-side decoding is additional work.
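AVRO's add-with-default rule can be illustrated with a hand-rolled resolver. This is a sketch of the schema-resolution behaviour, not the AVRO library API; the field names and dict-based schema shape are hypothetical:

```python
# Reader schema with a field added in v2. Per AVRO's evolution rules, the
# new field must carry a default so records written under v1 still decode.
READER_SCHEMA = {
    "fields": [
        {"name": "user_id"},
        {"name": "value"},
        {"name": "region", "default": "unknown"},  # added in v2
    ]
}

def decode_with_defaults(old_record: dict, schema: dict) -> dict:
    """Resolve a record written under an older schema against the reader schema."""
    out = {}
    for field in schema["fields"]:
        if field["name"] in old_record:
            out[field["name"]] = old_record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value and no default for {field['name']}")
    return out

# A v1 record (no "region") decodes cleanly; the default fills the gap.
print(decode_with_defaults({"user_id": 7, "value": 1.5}, READER_SCHEMA))
```

Removing a field or changing a name without a default breaks old readers at decode time — the "stricter rules" the trade-off refers to.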
Related findings in the same benchmark¶
The AVRO-over-JSON finding was one of four tuning insights from the 14.5 GB/s Redpanda → Snowflake run:
- ~20% uplift from AVRO — this pattern.
- Count-based batching over byte-size — less trigger-evaluation CPU overhead.
- build_parallelism tuned to (cores − small reserve) — Snowpipe-Streaming commit-path serialisation bottleneck.
- Channel-count scaling via channel_prefix × max_in_flight — exceeding the documented 10 GB/s per-table ceiling.
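The count-versus-bytes batching point can be sketched as two flush triggers. Function names are hypothetical; real producers amortise byte accounting with a running total, but still pay a per-record serialised-size measurement that a count check avoids:

```python
import json

def should_flush_by_count(batch: list, max_records: int = 1000) -> bool:
    # One integer comparison per evaluation — negligible trigger cost.
    return len(batch) >= max_records

def should_flush_by_bytes(batch: list, max_bytes: int = 1_048_576) -> bool:
    # Must know each record's serialised size — extra CPU on the hot path.
    return sum(len(json.dumps(r).encode("utf-8")) for r in batch) >= max_bytes
```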
Seen in¶
- sources/2025-10-02-redpanda-real-time-analytics-redpanda-snowflake-streaming — canonical wiki quantification of the AVRO-over-JSON throughput effect at ~20% on 1 KB randomised payloads, from Redpanda's 14.5 GB/s Snowflake benchmark.
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — Redpanda 9-tips checklist also recommends binary encodings for raw broker throughput; the 2025-10-02 benchmark quantifies the previously-unquantified tip.
Related¶
- systems/redpanda, systems/kafka — streaming brokers where wire-payload size determines raw throughput.
- concepts/compression-codec-tradeoff — the adjacent codec-choice axis; binary encoding and codec compression compose.
- concepts/schema-registry — the out-of-band schema-distribution mechanism that makes AVRO viable.
- concepts/effective-batch-size — smaller per-record wire size means more records per batch at fixed batch.size.
- patterns/batch-over-network-to-broker — the producer-side batching pattern this composes with.
- patterns/client-side-compression-over-broker-compression — further-downstream compression choice that composes with encoding format.